Mahe's Internship Experience at Anaconda

In part one of this two-part blog series, I wrote about conda and software packaging. I explained important terminology from the conda packaging ecosystem and discussed the features of the automatic recipe generator called Grayskull. In part two, I talk about my work during the internship at Anaconda, my experience here, and what I learned.

My Work During The Internship

Adding CRAN Support to Grayskull

Grayskull could already generate recipes for Python packages available on PyPI and GitHub. Given the popularity of the R language, CRAN was another useful package origin to support, so during this internship I worked on adding CRAN support to Grayskull. I studied the CRAN documentation and learnt how CRAN ships its packages and which sources are available for extracting the metadata of an R package. Through my research, I found that all R packages have a 'DESCRIPTION' file, which contains metadata about the package. I began to map information in the DESCRIPTION file to the information in a conda recipe, and I realized that a number of fields in the conda recipe for an R package can be populated directly from the DESCRIPTION file of that package. I was, therefore, able to generate R recipes through Grayskull. Of course, the DESCRIPTION file does not have all the information needed to write the entire recipe. Additional layers were added (and more are to be added) to fill in the missing information. Presently we support recipe generation only for simple R packages, i.e. packages that do not need system-specific compilation. In the next iteration, we will try to also support complex packages, whose recipes must include compiler information. See here.
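To make the mapping idea concrete, here is a minimal sketch of reading a DESCRIPTION file and filling a few recipe fields from it. It is not Grayskull's actual implementation: the function names, the sample package, and the chosen set of fields are assumptions made for illustration. It relies on the conda-forge convention of naming R packages `r-<lowercase name>`, and it ignores details such as packages bundled with R itself.

```python
import re


def parse_description(text):
    """Parse 'Key: value' lines; indented lines continue the previous field."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def split_deps(value):
    """Split a Depends/Imports value into bare names, dropping version constraints."""
    names = []
    for item in value.split(","):
        name = re.sub(r"\(.*?\)", "", item).strip()
        if name:
            names.append(name)
    return names


def to_recipe(fields):
    """Build a partial, recipe-like dict (hypothetical layout) from DESCRIPTION fields."""
    deps = split_deps(fields.get("Depends", "")) + split_deps(fields.get("Imports", ""))
    # "R" itself maps to r-base; every other dependency gets the r- prefix.
    reqs = ["r-base"] + sorted({"r-" + d.lower() for d in deps if d != "R"})
    return {
        "package": {
            "name": "r-" + fields["Package"].lower(),
            "version": fields["Version"],
        },
        "requirements": {"host": list(reqs), "run": list(reqs)},
        "about": {
            "summary": fields.get("Title", ""),
            "license": fields.get("License", ""),
            "home": fields.get("URL", ""),
        },
    }


# A made-up DESCRIPTION file, used only for this example.
SAMPLE = """Package: ExamplePkg
Version: 1.2.3
Title: An Example Package
Depends: R (>= 3.5.0)
Imports: jsonlite,
    rlang (>= 1.0.0)
License: MIT + file LICENSE
URL: https://example.org/examplepkg
"""

recipe = to_recipe(parse_description(SAMPLE))
print(recipe["package"])
print(recipe["requirements"]["run"])
```

Fields that the DESCRIPTION file cannot supply, such as the source URL and checksum, would still have to come from the additional layers mentioned above.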

Detecting Percentage Match of Licenses in Generated Recipes

Every conda package is shipped with a license. Some standard licenses, such as MIT, Apache, and BSD, are widely used. Sometimes people add these licenses to their projects but modify the license text according to their needs. Conda recipes require the license information of the package. It is therefore important that, when Grayskull detects the license of a package during recipe generation, the user is informed if the license has been modified and to what extent. The intent is to detect and report subtle changes to the original text (like one extra clause) that make it a new license altogether. Grayskull uses the rapidfuzz Python library to fuzzy match the package license against a list of standard licenses. I used the 'fuzz' module of this library to calculate the percentage match of the license and then display a warning to the user. This lets the user know if their included license deviates significantly from the standard version of the license.
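The percentage-match idea can be sketched as follows. This is an illustration under stated assumptions, not the code that ships in Grayskull: the helper names, the warning text, the threshold, and the shortened license excerpts are all hypothetical. The sketch uses `rapidfuzz.fuzz.ratio`, which returns a similarity between 0 and 100, and falls back to the standard library's `difflib` only so that it still runs where rapidfuzz is not installed (the two scorers can give slightly different numbers).

```python
import warnings

try:
    from rapidfuzz import fuzz

    def similarity(a, b):
        """Similarity between two strings as a percentage (0-100)."""
        return fuzz.ratio(a, b)

except ImportError:  # fallback so the sketch runs without rapidfuzz
    from difflib import SequenceMatcher

    def similarity(a, b):
        """Similarity between two strings as a percentage (0-100)."""
        return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


# Short excerpts standing in for full license texts (assumption for brevity).
STANDARD_LICENSES = {
    "MIT": (
        "Permission is hereby granted, free of charge, to any person "
        "obtaining a copy of this software and associated documentation "
        "files (the \"Software\"), to deal in the Software without restriction"
    ),
    "BSD-3-Clause": (
        "Redistribution and use in source and binary forms, with or without "
        "modification, are permitted provided that the following conditions "
        "are met"
    ),
}


def normalize(text):
    """Lowercase and collapse whitespace so layout differences do not count."""
    return " ".join(text.lower().split())


def best_license_match(license_text):
    """Return (name, percentage) of the closest standard license."""
    query = normalize(license_text)
    scores = {
        name: similarity(query, normalize(body))
        for name, body in STANDARD_LICENSES.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


def check_license(license_text, threshold=100.0):
    """Warn when the closest standard license is not an exact match."""
    name, score = best_license_match(license_text)
    if score < threshold:
        warnings.warn(
            f"License is a {score:.1f}% match for {name}; "
            "the text appears to have been modified."
        )
    return name, score


# One extra clause appended to the MIT excerpt.
modified = STANDARD_LICENSES["MIT"] + " This software may not be used commercially."
print(check_license(modified))
```

Normalizing case and whitespace before scoring keeps purely cosmetic differences from lowering the reported percentage.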

Initializing Grayskull Documentation

During one of the 'Hackdays' at Anaconda, I set up the initial documentation for Grayskull. As the work on Grayskull progresses, we will need proper documentation in place to keep track of the development. The documentation will also help new contributors to onboard with ease. The 'Hackdays' were a great motivation to do things fast!

Writing a Conda Enhancement Proposal

Finally, I wrote a CEP (Conda Enhancement Proposal) explaining why Grayskull is a great recipe generator and how we can make the migration from conda-skeleton to Grayskull possible. The CEP compares features between conda-skeleton and Grayskull and discusses what features need to be added to Grayskull to make it more versatile. The CEP serves as a single point for the community to interact and discuss the proposed changes and make valuable suggestions before a decision is made. You can check out the CEP here.

The Many Things That I Learned

Leadership and Initiative-Taking Skills

When my internship officially began, we did not yet have a fixed plan for what I was going to work on. Usually the internship project is decided based on the prior experience, expertise and interests of the intern. My mentor, Jannis Leidel, encouraged me to explore the various projects ongoing within Conda and see if something interested me. I explored. I was already more than a week into the internship and still couldn't decide what I wanted to work on. This made me anxious. I needed the project to be challenging enough to help me grow, but not so far above my current level of skill and knowledge that I might get overwhelmed and drop it midway. I needed it to sit at the sweet spot of being challenging while aligning with my previous experience and knowledge. I suggested that maybe I could work on adding more package origins to Grayskull, because that would take it a step closer to being a versatile conda recipe generator. Jannis welcomed my idea and built on it. Now we needed to decide which new package origin to add to Grayskull. There were several to choose from: PyProject, GitLab, CRAN, etc. I reached out to Cheng Lee, who I knew had a lot of experience working on conda packaging. I requested that we set up a meeting to help me decide what to work on. He kindly agreed, and after some discussion we decided that it would be a good idea to add CRAN support to Grayskull, since R is a widely used language and there are a number of R conda packages in the ecosystem. The problem, though, was that I had no prior experience working with R packages. Nor did we have any R experts on the team. But coming from conda-forge, I knew the power and resourcefulness of open source communities. I reached out to people in the conda-forge community who had experience working with R packages.
Björn Grüning, who wrote the R helper script for conda-forge (a script that runs over R recipes generated by conda-skeleton and modifies them to better suit conda-forge), was kind enough to talk with me, discuss my ideas for CRAN support in Grayskull and guide me whenever I experienced blockers. I also met with Filipe Fernandes and the developer of Grayskull, Marcelo Trevisani, to discuss with them my plans and ideas. They gave me their valuable insights and advice. Marcelo was also generous enough to agree to meet with me regularly during the term of my internship so that I could receive timely feedback on my progress and help whenever I needed it.

This internship project pushed me to go out of my way to learn and acquire the information I needed to move forward. It forced me to push past my own limits, take initiative and develop leadership qualities.

Thinking About and Planning Projects In a Sustainable Manner

I met with my mentor, Jannis Leidel, twice a week. Through our meetings over the course of three months, I learned many valuable lessons from him. One lesson that I would especially like to mention is 'thinking about projects sustainably'. Jannis would often ask me what I thought the future of Grayskull would be. Was I interested in continuing to work on it after the internship? Did I see other people working on it? Jannis insisted that Grayskull development work should continue beyond the internship and that we had to figure out how. He also insisted that it was unfair to expect the original developers (in this case Marcelo) to continue investing time and effort into Grayskull unpaid. We have to figure out ways to promote Grayskull so that its development continues in an organic and sustainable manner. I realized that programs such as Google Summer of Code and Outreachy are very useful for this purpose. They provide visibility to a project and thereby invite new contributors to it. We plan to register Grayskull in such open source programs in the future.

The Advantages of Daily Standups

My manager, Dan Meador, encourages us to write daily standups. A standup is where team members who work asynchronously (because of time zone differences) share with each other what they're currently working on, what they're planning to work on and whether they're experiencing any blockers. I realized that writing standups helped me clarify my thoughts and solidify my goals for the day. This is something I really struggled with -- breaking bigger tasks into smaller ones and sorting out what needs to be done first. However, through standups and other productivity hacks that my manager shared with me, I was able to learn this skill. I feel that breaking down tasks and deciding every morning (or the previous evening) exactly what you're going to work on that day can really enhance your productivity. I still get lazy sometimes and forget to plan ahead, and then my days are not as productive as I'd like them to be, and I feel super guilty for not being my best productive self. But one's gotta keep trying until good practices become habits.

Connecting With People

What I most enjoy doing in life is connecting with people; recognizing our shared human-ness despite our many apparent differences. And this internship provided me with plenty of opportunities to reach out to people, talk to them, discuss ideas, and develop friendships. I am grateful for the many new bonds I made with people within and outside Anaconda.
Jannis says that we must always remember that behind these computer screens there are real human beings, with feelings and egos and insecurities. And as long as we treat each other with empathy, we can successfully create sustainable communities where people feel that they belong and are cared for. I feel very strongly about this and I believe it is important to always keep in mind the human element during all our professional interactions. At the end of the day we're all fragile, vulnerable beings trying to achieve great goals together with all the strength and grace we can muster.

The last months have been fulfilling and exciting. I have learnt a lot and grown as a software engineer and as a person. I am truly grateful for this opportunity, for the mentorship I received and for all the new friendships that came my way and made my life more meaningful. Thank you, Anaconda. Thank you, The Spirit of Open Source.

Grayskull - The Community-Developed conda Recipe Generator

Grayskull is a community developed conda recipe generator that does away with a number of problems in conda-skeleton. Improving conda-skeleton is difficult due to its tight coupling with conda-build. Embracing the community developed Grayskull is easier and more sustainable.

Tue 19 July 2022
By Mahe Iram Khan


Anaconda Engineering Blog
