This is part one of a two-part blog series. In this post I talk about software packaging and conda. Head to part two to read about my work and experience during my internship at Anaconda.
Once again, attempts were made to clarify that conda is not Anaconda's nickname, this time through tweets and memes. Today, let us understand, once and for all (as if such an ideal were possible!), what conda is. Spoiler: it is an OS-agnostic package manager.
My name is Mahe. I am a senior year Computer Engineering student from Delhi, now a Software Engineer at Anaconda. I was very recently an intern here. During my internship I worked on a community-developed project called 'Grayskull'. In this blog post, I will talk about conda, conda-skeleton, and Grayskull. I will also discuss the disadvantages of tightly coupled projects/tools, and the advantages of embracing community innovations in open source ecosystems.
Concepts and Terminology
Before moving forward, let us quickly learn a few terms widely used in the conda packaging ecosystem.
- A Software Package is simply a working piece of code that does something. Software packages are installable so that people can benefit from the code written by others.
- Channels are online locations where these packages live and can be downloaded from. Channels are warehouses of packages.
- Conda is an OS-agnostic package and environment manager for Python packages and data science adjacent libraries. It allows you to manage the environments and dependencies of your packages and generate the needed context for your project to run successfully on a variety of machines.
- Conda-build is a set of commands and tools that lets you build your own conda packages.
- A Recipe is what you need to provide to create a package with conda-build: minimally a meta.yaml file that contains the packaging metadata and build instructions for that specific package.
You can learn more about conda recipes here.
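To make the idea of a recipe concrete, here is a minimal sketch in Python that renders a bare-bones meta.yaml. The helper `render_meta_yaml` is hypothetical and not part of conda-build, and the package name, version, and URL in the usage note are made-up placeholders. A real recipe usually carries more fields, such as a source checksum, tests, and an about section.

```python
def render_meta_yaml(name, version, source_url, run_requirements):
    """Return the text of a minimal conda recipe (meta.yaml).

    Illustrative only: the values are placeholders supplied by the caller,
    and a real recipe normally has more sections than shown here.
    """
    lines = [
        "package:",
        f"  name: {name}",
        f'  version: "{version}"',
        "",
        "source:",
        f"  url: {source_url}",
        "",
        "build:",
        "  number: 0",
        "  script: python -m pip install . -vv",
        "",
        "requirements:",
        "  host:",
        "    - python",
        "    - pip",
        "  run:",
        "    - python",
    ]
    # Append each runtime dependency as one more entry under `run:`.
    lines += [f"    - {req}" for req in run_requirements]
    return "\n".join(lines) + "\n"
```

Calling `render_meta_yaml("examplepkg", "1.0.0", "https://example.invalid/examplepkg-1.0.0.tar.gz", ["requests"])` returns the recipe text, which could then be written to a meta.yaml file inside a recipe directory.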
Writing Recipes
For someone who is new to the packaging world, writing package recipes can seem quite intimidating. Even people who are not new to it would agree that writing package recipes is often boring and tiresome, not to mention highly error-prone. Example recipes and templates help, sure, but most of us would rather have a concise recipe generated automatically. There is conda-skeleton, an automatic conda recipe generator provided with conda-build. conda-skeleton is a helpful tool indeed, but it falls short of being the perfect recipe generator for several reasons:
- It is slow in generating recipes.
- It cannot be deployed on systems without conda.
- It has a huge number of dependencies.
- The recipes it generates are not always concise.
These shortcomings in conda-skeleton led to the development of Grayskull in the conda-forge community by Marcelo Trevisani.
Grayskull - The Community-Developed conda Recipe Generator
Grayskull is an automatic conda recipe generator that generates concise conda recipes for Python packages available on PyPI and GitHub. It significantly improves upon conda-skeleton in terms of speed, conciseness of the recipes, packaging environment specificity, and memory usage. Grayskull has proved to be an extremely useful tool for the packaging ecosystem by generating very accurate recipes very quickly.
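As a rough illustration of how this looks in practice, the sketch below wraps the `grayskull pypi <package>` command in Python. The helper names `grayskull_command` and `generate_recipe` are my own for this example, and the sketch assumes Grayskull has been installed (for instance with pip) so that the `grayskull` executable is on the PATH.

```python
import shutil
import subprocess


def grayskull_command(package_name):
    """Build the command line that asks Grayskull for a PyPI recipe."""
    return ["grayskull", "pypi", package_name]


def generate_recipe(package_name):
    """Run Grayskull for one PyPI package; return True if it was launched.

    Assumes the `grayskull` executable is installed and on the PATH.
    """
    cmd = grayskull_command(package_name)
    if shutil.which(cmd[0]) is None:
        print("grayskull is not installed; try: pip install grayskull")
        return False
    # check=True raises CalledProcessError if Grayskull exits with an error.
    subprocess.run(cmd, check=True)
    return True
```

Calling `generate_recipe("requests")` would ask Grayskull to generate a recipe for the `requests` package from PyPI.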
Grayskull - An Improvement Upon conda-skeleton
Grayskull generates recipes that take into consideration the platform, Python version available, selectors, compilers (Fortran, C and C++), package constraints, license type, etc. It uses metadata available from multiple sources to create the best recipe possible.
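To give a feel for the kind of decision involved, here is a deliberately simplified sketch of a noarch check. It is not Grayskull's actual implementation. The function `supports_noarch_python` and its rule are assumptions made for illustration: a package is treated as a candidate for `noarch: python` only if it has no compiled extensions and none of its requirements carries a platform selector such as `# [win]`.

```python
def supports_noarch_python(has_compiled_extensions, requirements):
    """Guess whether a recipe could be marked `noarch: python`.

    Simplified assumption for illustration: a pure-Python package with no
    selector comment (for example `# [win]`) on any requirement qualifies.
    """
    if has_compiled_extensions:
        # Compiled code is platform-specific, so the package cannot be noarch.
        return False
    # A selector means the requirement list differs between platforms.
    return not any("# [" in req for req in requirements)
```

Under this rule, `supports_noarch_python(False, ["python", "requests"])` returns True, while a requirement such as `"pywin32  # [win]"` or the presence of a C extension makes it return False.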
The table below compares and contrasts the performance and mechanisms of Grayskull and conda-skeleton:
Grayskull | conda-skeleton |
---|---|
Detects when the recipe supports noarch:python | Does not detect noarch:python |
Always tries to detect compilers | Does not detect compilers |
Standalone application, can be deployed on systems without conda | Relies on conda |
Lightweight due to reduced dependencies | Huge number of dependencies due to reliance on conda |
pip installable | Not pip installable |
Creates a small, temporary virtual env to simulate the installation of the package using the source tarball | Creates a separate conda env and runs the solver, hence takes up a lot of time |
Generates concise recipes | Sometimes mixes up dependencies and generates unnecessarily bloated recipes |
Improving conda-skeleton is Tough
conda-skeleton, the recipe generator, is very tightly coupled with conda-build, the package builder. Because of this, changing any functionality in conda-skeleton is risky, as it could break something in conda-build itself. It is also worth noting that the conda-skeleton code is not very modular and has few comments. This can make onboarding to conda-skeleton a difficult and time-consuming task.
Grayskull, on the other hand, is a standalone tool. Its code is modular, which makes it easier to add new functionality or update existing functionality. Grayskull also has ample comments describing each function in the code, which makes it easier for new people to onboard and understand the codebase.
Embracing Community Innovation
Anaconda, the company behind conda and conda-skeleton, gracefully acknowledged the advantages of Grayskull over conda-skeleton. It has been supporting Grayskull and is making efforts to adopt it as the de facto conda recipe generator. This also falls in line with the conda project efforts.
During my internship at Anaconda I worked on Grayskull, adding more package origins to it and taking it a step closer to being a versatile conda recipe generator. In the follow-up post I talk about my work during the internship and my experience at Anaconda.