SBOMs at Anaconda

Last fall, Anaconda launched a collaboration with Microsoft to create Software Bills of Materials (SBOMs) for all packages in the “defaults” channels of our repository. We are excited to announce that we have achieved this goal, and to share why and how we did it.

What are SBOMs and what value do they provide?

After the SolarWinds supply chain hack was discovered in late 2020, the White House issued the Executive Order on Improving the Nation’s Cybersecurity (May 2021), which detailed new requirements to strengthen the security of the Federal Government’s software supply chain. SBOMs are a key part of this effort: each one functions as “a list of ingredients” that lets users verify the components, licensing, and provenance of software installed on their systems.

Anaconda’s SBOMs follow version 2.2.1 of the Software Package Data Exchange (SPDX) specification, which records checksum hash values for software down to the individual file level. When a new software vulnerability is discovered and made public (e.g., via the National Vulnerability Database, NVD), we can check whether our packages contain any vulnerable components, identify their hash values in our SBOMs, and use those values to verify whether these vulnerable components are installed on our users’ systems. The licensing and provenance information can also help our users (particularly enterprise customers) determine whether certain packages meet their governance standards. Finally, we cryptographically sign each SBOM document so that recipients can verify it and rule out tampering.
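To make the hash-matching idea concrete, here is a minimal sketch in Python. It is not sbomtool code: the function names are hypothetical, and it assumes the SHA256 values of the vulnerable files have already been pulled out of the relevant SBOMs into a plain set of hex strings.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_vulnerable_files(root, vulnerable_hashes):
    """Return sorted relative paths under root whose SHA256 is in vulnerable_hashes.

    vulnerable_hashes is assumed to be a set of hex digests taken from SBOMs
    (hypothetical input; how it is gathered is outside this sketch).
    """
    root = Path(root)
    matches = []
    for path in root.rglob("*"):
        # Only regular files are hashed; symlinks are skipped.
        if path.is_file() and not path.is_symlink():
            if sha256_of_file(path) in vulnerable_hashes:
                matches.append(path.relative_to(root).as_posix())
    return sorted(matches)
```

Pointing find_vulnerable_files at an installed environment directory would list any files whose contents match a known-vulnerable component, regardless of what the file is named.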

Building sbomtool

Anaconda has about 300,000 package artifacts (.tar.bz2 and .conda files) in its “defaults” channels, and we needed to build a tool that could create SBOMs for all of them. This is sbomtool. At its core, sbomtool is a CLI application written in Python that ingests conda packages and outputs SBOM documents that follow the SPDX specification. Early on, we discovered tools-python, a great Python package (built and maintained by the SPDX organization) that provided us with an easy-to-use API to build and validate SBOMs.

However, before we could start churning out SBOMs, there were a number of challenges we had to address. For example, we knew that license information has not always been consistently available in our packages. SPDX maintains a standard list of identifiers for common open source software licenses, and we incorporated a mechanism to apply this standard in our SBOMs and backfill corrections for packages that have incorrect or missing license information. We researched and made such corrections for over 1,800 packages so that their SBOMs comply with the SPDX license standard.
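A correction mechanism of this kind could look roughly like the sketch below. The tables and the function name are hypothetical and the entries are purely illustrative; the only things taken from SPDX are the license identifiers themselves and the NOASSERTION value that SPDX 2.2 uses when no license claim is made.

```python
# Hypothetical table mapping free-form license strings found in package
# metadata to SPDX license identifiers. Entries are illustrative only.
LICENSE_CORRECTIONS = {
    "bsd 3-clause": "BSD-3-Clause",
    "bsd-3-clause": "BSD-3-Clause",
    "mit license": "MIT",
    "mit": "MIT",
    "apache 2.0": "Apache-2.0",
    "apache-2.0": "Apache-2.0",
}

# Hypothetical per-package overrides for packages whose metadata is
# missing or known to be wrong ("example-package" is a made-up name).
PACKAGE_OVERRIDES = {
    "example-package": "MIT",
}


def spdx_license(package_name, raw_license):
    """Return an SPDX license identifier, or NOASSERTION if none is known."""
    # A researched, package-specific correction wins over the raw metadata.
    if package_name in PACKAGE_OVERRIDES:
        return PACKAGE_OVERRIDES[package_name]
    if raw_license:
        # Normalize case and whitespace before the lookup.
        key = " ".join(raw_license.lower().split())
        if key in LICENSE_CORRECTIONS:
            return LICENSE_CORRECTIONS[key]
    return "NOASSERTION"
```

Keeping the corrections in data rather than in code means each newly researched package only adds a table entry.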

There were other challenges we did not anticipate that led us down some interesting and productive paths:

  • The architecture of a conda package can normally be found under the subdir key in the index.json metadata file. But after reviewing a particularly big batch of SBOM failures related to this key, we discovered that in much older conda packages the architecture was listed under a platform key instead. A bit of code archaeology revealed that the subdir key was added in 2015 and was implemented to enable use of the first “noarch” builds of conda packages.

  • When we started using tools-python to build sbomtool, the only checksum algorithm it supported was SHA1, which is no longer considered secure. We wrote a patch and contributed a PR to the upstream project to enable use of more secure checksum algorithms like SHA256 (which we use in our SBOMs).

  • Some conda packages contain symlinks to system files on host environments or other files within the package itself. SPDX did not have a prescribed method for specifying symlinks when we began work on sbomtool, so we raised an issue and started a discussion that may soon lead to a policy update.
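The three issues above can be illustrated with a short sketch for .tar.bz2 packages. This is not sbomtool's implementation, and the function names are hypothetical. It assumes the metadata lives at info/index.json inside the archive, falls back from subdir to the raw platform value without trying to reconstruct a modern subdir string, and simply records a symlink's target instead of a checksum, since how SPDX should represent symlinks was still under discussion.

```python
import hashlib
import json
import tarfile


def read_index(package_path):
    """Load info/index.json from a .tar.bz2 conda package."""
    with tarfile.open(package_path, "r:bz2") as archive:
        with archive.extractfile("info/index.json") as handle:
            return json.load(handle)


def package_architecture(index):
    """Prefer the "subdir" key; fall back to "platform" for much older packages.

    Returns None when neither key is present.
    """
    return index.get("subdir") or index.get("platform")


def file_entries(package_path):
    """Return a list of (name, sha256_hex, link_target) tuples.

    Regular files get a SHA256 digest and no link target; symlinks get no
    digest and their recorded target. Other member types are skipped.
    """
    entries = []
    with tarfile.open(package_path, "r:bz2") as archive:
        for member in archive.getmembers():
            if member.issym():
                entries.append((member.name, None, member.linkname))
            elif member.isfile():
                data = archive.extractfile(member).read()
                entries.append((member.name, hashlib.sha256(data).hexdigest(), None))
    return entries
```

Reading file contents straight from the archive avoids extracting the package to disk, which also sidesteps symlinks that point at system files on the host.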

Deployment

On March 24, 2022, after months of building sbomtool, fixing bugs, and resolving metadata issues, we reached 100% SBOM coverage, and we continue to build SBOMs for the new packages that are uploaded daily. We are now focusing on automating sbomtool and integrating it into our package build pipeline, and we look forward to making SBOMs available as a new tiered service offering for our customers.