Packaging and Releasing Private Python Code (Pt.2)

In part one I covered how to define your internal Python packages and their dependencies. Part two covers how to build wheels and distribute them via wheelhouses.

Why Wheels

Python wheels are a new binary format replacing Python eggs. Unlike eggs, wheels are an installable format, but cannot be executed directly. Installing a package from a .whl file is significantly faster than installing from an sdist tarball, even for pure Python packages (Django, which is pure Python, installs 5 times faster from a wheel than from a source tarball). The difference is enormous for packages with non-Python source code that needs to be compiled. For example, a NumPy install is 60 times faster from a wheel than building it from source (and that’s tested on a fairly modern quad-core i7).

Building wheels

Building wheels is fairly straightforward: you can use the pip wheel command (which in turn calls python setup.py bdist_wheel). Specify the directory where the wheel will be written using the --wheel-dir (or -w) option. To avoid rebuilding wheels you already have, specify the same directory as the --find-links (or -f) directory. By default, building a wheel also builds wheels for all of its dependencies that don’t have wheels already. You can disable this behavior using the --no-deps option.

For a private package, the command line for building wheels might look like:

 
pip wheel --wheel-dir=~/wheels/ --find-links=~/wheels/ git+ssh://git@github.com/user/package@1.2.3 

This of course assumes a single package per repository. If you have multiple packages in the same repository, you can specify the subdirectory you’re building from as shown below:

 
pip wheel --wheel-dir=~/wheels/ --find-links=~/wheels/ \
git+ssh://git@github.com/user/library@1.2.3#subdirectory=package 
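
If you drive these builds from a script, the commands above can be assembled programmatically. The following is only a sketch: the helper names (vcs_requirement, pip_wheel_command, build_wheel) are made up for illustration, and it assumes pip is installed for the interpreter running the script.

```python
import subprocess
import sys


def vcs_requirement(repo_url, tag, subdirectory=None):
    """Build a pip VCS requirement such as git+ssh://host/user/repo@1.2.3."""
    requirement = f"git+{repo_url}@{tag}"
    if subdirectory:
        # Needed when several packages live in the same repository.
        requirement += f"#subdirectory={subdirectory}"
    return requirement


def pip_wheel_command(requirement, wheel_dir, with_deps=True):
    """Return the argument list for `pip wheel`.

    wheel_dir is used both as the output directory and as the
    --find-links source, so existing wheels are not rebuilt.
    """
    command = [
        sys.executable, "-m", "pip", "wheel",
        "--wheel-dir", wheel_dir,
        "--find-links", wheel_dir,
    ]
    if not with_deps:
        command.append("--no-deps")
    command.append(requirement)
    return command


def build_wheel(requirement, wheel_dir):
    """Run the build; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_wheel_command(requirement, wheel_dir))
```

Calling vcs_requirement("ssh://git@github.com/user/library", "1.2.3", "package") yields the same requirement string as the second command above.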

Making universal wheels for Python 2 and 3

If your package supports both Python 2 and 3, you can mark it as such by passing the --universal flag to the bdist_wheel command. The easiest way to achieve this is to add the following to setup.cfg:

  
[bdist_wheel]
universal=1

The resulting wheel will now be marked as py2.py3 and will work with both versions of Python regardless of the version used to build it.
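
To check which tags a wheel ended up with, you can read them straight from the file name, which follows the pattern name-version[-build]-python-abi-platform.whl. Below is a minimal sketch of such a check; parse_wheel_filename and is_universal are hypothetical helpers written for this post, not part of pip or wheel.

```python
from collections import namedtuple

WheelName = namedtuple(
    "WheelName", "distribution version build python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        # The optional build tag is absent.
        distribution, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(distribution, version, build, python, abi, platform)


def is_universal(filename):
    """True for pure Python wheels tagged for both Python 2 and 3."""
    name = parse_wheel_filename(filename)
    python_tags = name.python.split(".")
    return (
        "py2" in python_tags
        and "py3" in python_tags
        and name.abi == "none"
        and name.platform == "any"
    )
```

With this, is_universal("package-1.2.3-py2.py3-none-any.whl") is true, while a wheel tagged only py2 or built for a specific platform is rejected.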

Dealing with Python version specific dependencies

At times you need to specify a requirement which is only needed on a specific platform. Let’s use a real life example: let’s say your package needs Avro (it comes in two flavors, avro and avro-python3). The first is only compatible with Python 2, the latter only with Python 3.
The old-school approach would be to include “if” statements in your setup.py code:

  
import sys

if sys.version_info[0] < 3:
    install_requires.append('avro')
else:
    install_requires.append('avro-python3')

While this works when installing from source, it won’t work when building a universal wheel: the wheel will only carry the requirement for the Python version used to build it, which is not ideal. In theory this is where PEP 508 comes to the rescue. Amongst other goodies, it allows using environment markers directly in dependency specifiers, so this would become:

  
install_requires = [
 'avro; python_version<"3"', 'avro-python3; python_version>="3"',
]

Again, this will work fine when installing from sources. However, this additional information is lost when building a wheel. Within the wheel, it becomes just install_requires = ['avro', 'avro-python3'] and subsequently fails on install. Hopefully this will be resolved in upcoming versions of wheel.

Fortunately, there is a workaround that actually does produce working wheels. It’s hidden in PEP 426 and involves “empty” extras. Unusual as it might seem, this does actually work:

  
extras_require = {
 ':python_version<"3"': ['avro'], ':python_version>="3"': ['avro-python3'],
}

Installing wheels using a --find-links index

You can use --find-links with a local directory containing all the wheels, or you can use a URL to an index file containing links to all the wheels. The index file is a simple HTML file with links in the following format:

  
<a href="https://host/path/package-1.2.3-py2.py3-none-any.whl">
    package-1.2.3-py2.py3-none-any.whl
</a>

The link to the wheel may be relative, so dropping all your wheels in a directory on a web server and turning auto-index on will work. If you’re generating the HTML yourself, you can add extra features, for example a hash of the file appended to the link as a URL fragment (such as #sha256=...).
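
Generating such an index yourself takes only a few lines. The sketch below assumes all wheels sit in one local directory and appends a #sha256= fragment to each link so that pip can verify the download; the function names and the base_url parameter are illustrative, not an existing tool.

```python
import hashlib
import html
from pathlib import Path
from urllib.parse import quote


def sha256_of(path):
    """Hash the file in chunks so large wheels are not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def render_find_links_index(wheel_dir, base_url=""):
    """Return HTML with one link per wheel found in wheel_dir.

    base_url is prepended to each file name; leave it empty to get
    relative links.
    """
    links = []
    for wheel in sorted(Path(wheel_dir).glob("*.whl")):
        href = f"{base_url}{quote(wheel.name)}#sha256={sha256_of(wheel)}"
        links.append(
            f'<a href="{html.escape(href, quote=True)}">'
            f"{html.escape(wheel.name)}</a>"
        )
    body = "\n".join(links)
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>\n"
```

Writing the returned string to a file such as my-index.html next to the wheels gives you a page that can be passed to --find-links.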

Generating a simple index

PEP 503 defines a simple repository structure. At the top level there are directories, each named after a normalized package name. Normalizing means converting all letters to lowercase and replacing each run of “-”, “_” and “.” characters with a single “-” (dash). Each directory contains links to wheels (the format is identical to the find-links index in the section above). On a web server with auto-indexing, the following directory structure would work as a simple index:

 
package/
    package-1.2.2-py2.py3-none-any.whl
    package-1.2.3-py2.py3-none-any.whl
some-other-package/
    Some_Other_Package-4.5.6-py2-none-any.whl
    Some_Other_Package-4.5.7-py2-none-any.whl

As mentioned in the section above, HTML can be generated, to take advantage of additional features or to support layouts other than the example above.
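
The normalization rule and the directory layout can be sketched in a few lines. The regular expression is the one given in PEP 503; simple_index_layout is a hypothetical helper that only works out which directory each wheel belongs to, without touching the filesystem.

```python
import re
from collections import defaultdict


def normalize(name):
    """PEP 503: lowercase, and runs of '-', '_' and '.' become one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_index_layout(wheel_filenames):
    """Map each normalized project name to its sorted wheel file names."""
    layout = defaultdict(list)
    for filename in wheel_filenames:
        # In a wheel file name, the distribution name is everything
        # before the first dash.
        distribution = filename.split("-", 1)[0]
        layout[normalize(distribution)].append(filename)
    return {project: sorted(files) for project, files in layout.items()}
```

For the example above, Some_Other_Package-4.5.6-py2-none-any.whl lands in the some-other-package directory.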

Automating building new wheels

The best practice is to build wheels as soon as you release the software. There are two choices, either you trigger a build as part of the release process, or you can monitor repositories for new releases.

At Eventbrite we poll selected GitHub repositories for new tags that are valid PEP 440 versions. Getting a list of tags is one of the few operations that does not require cloning the git repository, which makes polling fast. The algorithm is:

  1. For each watched repository execute git ls-remote --tags git@github.com:user/repo.git
  2. From the above, filter out results ending with ^{}, which are redundant (they’re dereferenced tag objects)
  3. Get the tag itself, which is the part between refs/tags/ and the end of the line
  4. Verify that the tag is a valid version according to PEP 440 (easiest way to do that is by using packaging.version.Version(tag), which will raise InvalidVersion on non-conforming ones)
  5. Check if the version has already been processed; if it hasn’t, trigger a build.
  6. Build the wheel by executing pip wheel --wheel-dir /wheels/dir/ --find-links /wheels/dir/ git+ssh://git@github.com/user/repo.git@{tag} (setting the directory containing wheels as the --find-links source avoids rebuilding wheels that have already been built)
  7. Repeat step 6 for each kind of target machine (different OS versions or Python versions), as some binary wheels are not portable. For example, MySQL_python-1.2.5-cp27-cp27mu-linux_x86_64.whl built on Ubuntu 12.04 will not work on Ubuntu 16.04, because of different libmysqlclient.so versions. For universal pure Python wheels that might be unnecessary extra work, but even in these cases you might have additional dependencies, like the aforementioned MySQL-python library.
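
Steps 1 to 6 can be sketched roughly as follows. This is not our production code: the function names, the in-memory seen set and the two URL parameters are assumptions made for the example, and release_tags needs the third-party packaging library.

```python
import subprocess


def parse_tags(ls_remote_output):
    """Extract tag names from `git ls-remote --tags` output (steps 2-3)."""
    prefix = "refs/tags/"
    tags = []
    for line in ls_remote_output.splitlines():
        if not line.strip() or line.endswith("^{}"):
            # Skip blank lines and dereferenced entries of annotated tags.
            continue
        _sha, _, ref = line.partition("\t")
        if ref.startswith(prefix):
            tags.append(ref[len(prefix):])
    return tags


def release_tags(tags):
    """Keep only the tags that are valid PEP 440 versions (step 4)."""
    # Imported here so parse_tags() can be used without `packaging`.
    from packaging.version import InvalidVersion, Version

    valid = []
    for tag in tags:
        try:
            Version(tag)
        except InvalidVersion:
            continue
        valid.append(tag)
    return valid


def poll(ls_remote_url, pip_vcs_url, seen, wheel_dir):
    """Build wheels for release tags not yet in `seen` (steps 1, 5, 6).

    ls_remote_url: e.g. git@github.com:user/repo.git
    pip_vcs_url:   e.g. git+ssh://git@github.com/user/repo.git
    seen:          set of tags already processed; updated in place
    """
    output = subprocess.check_output(
        ["git", "ls-remote", "--tags", ls_remote_url],
        universal_newlines=True,
    )
    for tag in release_tags(parse_tags(output)):
        if tag in seen:
            continue
        subprocess.check_call([
            "pip", "wheel",
            "--wheel-dir", wheel_dir,
            "--find-links", wheel_dir,
            f"{pip_vcs_url}@{tag}",
        ])
        seen.add(tag)
```

In a real setup the set of processed tags would be persisted between runs, and poll would be executed once per target platform as described in step 7.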

Amazon S3 wheelhouse

Amazon S3 works well for distributing wheels (performance can be further improved by putting CloudFront in front of it). However, if you look at the readily available solutions for hosting a wheelhouse (a wheel repository) on S3, they assume no access control (the S3 bucket’s ACL set to public-read). This is not an option if you want to keep your code private. The standard workaround would be to use s3://{authId}:{authKey}@{bucket}/ style URLs. However, pip does not understand the s3 transport and does not support transport plugins; for links, it is limited to HTTP(S). We solved that by generating index files with presigned S3 URLs.

 
<a href="https://a-wheelhouse.s3.amazonaws.com/ubuntu/16.04/package-1.0.0-py2-none-any.whl?
AWSAccessKeyId=AKIAxxxx&Expires=1496286239&Signature=xxxx">package-1.0.0-py2-none-any.whl</a>
<a href="https://a-wheelhouse.s3.amazonaws.com/ubuntu/16.04/package-1.1.0-py2-none-any.whl?
AWSAccessKeyId=AKIAxxxx&Expires=1496286239&Signature=xxxx">package-1.1.0-py2-none-any.whl</a>
<a href="https://a-wheelhouse.s3.amazonaws.com/ubuntu/16.04/package-1.2.0-py2-none-any.whl?
AWSAccessKeyId=AKIAxxxx&Expires=1496286239&Signature=xxxx">package-1.2.0-py2-none-any.whl</a>

The additional challenge with the above approach is that these links expire, meaning the index must be regenerated periodically. On the other hand, this makes security management easier, as end-users do not store credentials which would have to be revoked. Also, should any URL leak, it will only be usable for a short time.
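
An index like the one above could be produced along these lines. This is a sketch under assumptions: it uses boto3's generate_presigned_url to sign each object, the function names and the one-hour default expiry are made up for the example, and AWS credentials are expected to be configured on the machine that generates the index.

```python
import html
import posixpath


def render_signed_index(keys, sign):
    """Render one link per S3 object key.

    `sign` is any callable that maps an object key to a presigned URL.
    """
    links = []
    for key in sorted(keys):
        filename = posixpath.basename(key)
        href = html.escape(sign(key), quote=True)
        links.append(f'<a href="{href}">{html.escape(filename)}</a>')
    return "\n".join(links) + "\n"


def s3_signer(bucket, expires_in=3600):
    """Return a signing callable backed by boto3 (third-party)."""
    # Imported here so the rendering function works without boto3.
    import boto3

    client = boto3.client("s3")

    def sign(key):
        return client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,
        )

    return sign
```

Something like render_signed_index(keys, s3_signer("a-wheelhouse")) would then be run on a schedule shorter than the expiry time, so the published index never contains stale links. Note that the ampersands in the query string are escaped as &amp; in the HTML, which is what an HTML parser expects inside an attribute.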

The index files themselves, which are just light HTML, are stored on our build servers, which provide necessary access control.

As an aside, signed URLs must match exactly, meaning that even small differences in encoding will cause signature mismatch. For example https://server/pkg-1!2017.1-py2-none-any.whl and https://server/pkg-1%212017.1-py2-none-any.whl are equivalent as URLs, but their signatures will be different.
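
The effect is easy to reproduce with the standard library. The snippet below only illustrates the encoding difference; it does not compute an actual S3 signature.

```python
from urllib.parse import quote, unquote

raw = "pkg-1!2017.1-py2-none-any.whl"
encoded = quote(raw)  # "!" is percent-encoded as %21

# Both spellings decode to the same file name...
same_file = unquote(encoded) == raw
# ...but they are different strings, so a signature computed over one
# spelling will not validate a request made with the other.
same_text = encoded == raw

print(encoded)    # pkg-1%212017.1-py2-none-any.whl
print(same_file)  # True
print(same_text)  # False
```

The practical consequence is that the link written to the index must be emitted exactly as it was signed, without any further quoting or unquoting.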

Using a wheelhouse

To use wheels from the wheelhouse as defined above with pip, use the --find-links (or -f) option. For example:

pip install --find-links=https://wheelhouse/my-index.html private-package==1.2.3  

To avoid having to set --find-links every time, you can either export it as an environment variable

 
export PIP_FIND_LINKS=https://wheelhouse/my-index.html 
pip install private-package==1.2.3

… or set it in pip config file

 
[global]
find-links =
   https://wheelhouse/my-index.html 

Alternatively, if you have a simple index, you can pass it to pip using the --extra-index-url option

 
pip install --extra-index-url=https://wheelhouse/simple/ private-package==1.2.3

# or

export PIP_EXTRA_INDEX_URL=https://wheelhouse/simple/
pip install private-package==1.2.3

Or set it in the config file

[global]
extra-index-url =
   https://wheelhouse/simple/

Pip package resolution order

When installing a new package, pip considers matching candidates in the following order:

  1. Already installed
  2. Wheels
  3. Sources

Unless the --upgrade (-U) option is used, an already installed package that matches the requirements will not be upgraded to the latest version. For sources and wheels, the highest matching version is used, with wheels taking priority when the highest version is available as both.

With the above priorities and versions being equal, pip considers package locations in the following order of priority:

  1. Local filesystem
  2. Indexes (including extra-indexes)
  3. Find-links
  4. Dependency-links

This means that wheels from the wheelhouse will always take priority over sources from the public index at the same version. It also means you can safely add a source index pointing to tarballs or GitHub repositories without worrying that it will ever be chosen over an already compiled wheel.
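
To make the ordering concrete, here is a toy model of the rules described in this section. It is emphatically not pip's implementation: the Candidate type, the location names and the use of plain tuples for versions are all assumptions made for illustration, and already installed packages are left out.

```python
from collections import namedtuple

# version is a tuple such as (1, 2, 3); location is a key of LOCATION_RANK.
Candidate = namedtuple("Candidate", "version is_wheel location")

# Lower rank wins; the order follows the list above.
LOCATION_RANK = {
    "local": 0,
    "index": 1,
    "find-links": 2,
    "dependency-links": 3,
}


def preference_key(candidate):
    """Highest version first, then wheels over sources, then location."""
    return (
        candidate.version,
        candidate.is_wheel,
        -LOCATION_RANK[candidate.location],
    )


def best_candidate(candidates):
    """Pick the candidate the rules above would prefer."""
    return max(candidates, key=preference_key)
```

In this model a wheel from find-links beats a source tarball of the same version from an index, while a newer source release still beats an older wheel.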

Caching with Devpi

Devpi is a caching proxy and server for Python packages, which is fairly easy to set up for mirroring. However, it has two significant limitations: it does not accept flat indexes used by find-links, and it can only mirror a hierarchical simple index. Also, it assumes that the mirror URL is a directory and builds URLs by appending package names to it, so a URL ending with file.html will not work. Neither of these limitations is a problem once you’re aware of it. Once you have a simple index, you can add it to devpi by executing:

 
devpi index -c wheelhouse type=mirror mirror_cache_expiry=300 \
mirror_url=https://wheelhouse/simple/

You can then use http://devpi.server:3142/username/wheelhouse/+simple/ as the simple index URL for the --extra-index-url option described in the “Using a wheelhouse” section above.

