In part one, I covered how to define your internal Python packages and their dependencies. Part two covers how to build wheels and distribute them via wheelhouses.
Why Wheels
Python wheels are a binary distribution format that replaces Python eggs. Unlike eggs, wheels are purely an installation format and cannot be executed directly. Installing a package from a .whl file is significantly faster than installing it from an sdist tarball, even for pure Python packages (Django, which is pure Python, installs 5 times faster from a wheel than from a source tarball). The difference is enormous for packages with non-Python source code that needs to be compiled. For example, NumPy installs 60 times faster from a wheel than when built from source (tested on a fairly modern quad-core i7).
Building wheels
Building wheels is fairly straightforward: you can use the pip wheel command (which in turn calls python setup.py bdist_wheel). Specify the directory where it will output the wheels using the --wheel-dir (or -w) option. To avoid re-building wheels you already have, specify the same directory as the --find-links (or -f) directory. By default, building a wheel also builds wheels for all of its dependencies that don’t have wheels already. You can disable this behavior using the --no-deps option.
For a private package, the command line for building wheels might look like this:
pip wheel --wheel-dir=~/wheels/ --find-links=~/wheels/ git+ssh://git@github.com/user/package@1.2.3
This of course assumes a single package per repository. If you have multiple packages in the same repository, you can specify the subdirectory you’re building from as shown below:
pip wheel --wheel-dir=~/wheels/ --find-links=~/wheels/ \
    git+ssh://git@github.com/user/library@1.2.3#subdirectory=package
Making universal wheels for Python 2 and 3
If your package supports both Python 2 and 3, you can mark it as such by passing the --universal flag to the bdist_wheel command. The easiest way to achieve this is by adding the following to setup.cfg:
[bdist_wheel]
universal=1
The resulting wheel will now be marked as py2.py3 and will work with both versions of Python, regardless of the version used to build it.
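To check which interpreters a built wheel claims to support, you can read the compatibility tags out of its file name. The sketch below assumes the file name layout from PEP 427 ({name}-{version}(-{build})?-{python}-{abi}-{platform}.whl); the WheelName type and parse_wheel_filename helper are hypothetical names for illustration, and the fields are split but not validated.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    """Components of a wheel file name (hypothetical helper type)."""
    name: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 parts (no validation)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    # Compressed tag sets such as "py2.py3" are dot-separated.
    return WheelName(
        name,
        version,
        build,
        tuple(python.split(".")),
        tuple(abi.split(".")),
        tuple(platform.split(".")),
    )


wheel = parse_wheel_filename("package-1.2.3-py2.py3-none-any.whl")
print(wheel.python_tags)  # ('py2', 'py3')
```

A universal wheel shows up with both py2 and py3 in its Python tags, plus none/any for the ABI and platform.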
Dealing with Python version specific dependencies
At times you need to specify a requirement that is only needed on a specific Python version or platform. Let’s use a real-life example: say your package needs Avro, which comes in two flavors, avro and avro-python3. The former is only compatible with Python 2, the latter only with Python 3.
The old-school approach would be to include “if” statements in your setup.py code:
import sys

install_requires = []
if sys.version_info < (3,):
    install_requires.append('avro')
else:
    install_requires.append('avro-python3')
While this works when installing from source, it won’t work when building a universal wheel: the wheel will only carry the requirement for the Python version used to build it, so it will be wrong for the other version. In theory, this is where PEP 508 comes to the rescue. Amongst other goodies, it allows environment markers directly in dependency specifiers, so this would become:
install_requires = [
    'avro; python_version<"3"',
    'avro-python3; python_version>="3"',
]
Again, this works fine when installing from source. However, this additional information is lost when building a wheel. Within the wheel, it becomes just install_requires = ['avro', 'avro-python3'], and installation subsequently fails. Hopefully this will be resolved in upcoming versions of wheel.
Fortunately, there is a workaround that does produce working wheels. It’s hidden in PEP 426 and involves “empty” extras. Unusual as it might seem, this works:
extras_require = {
    ':python_version<"3"': ['avro'],
    ':python_version>="3"': ['avro-python3'],
}
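The part after the colon in each key is a PEP 508 environment marker, and only the requirements whose marker matches the installing interpreter are pulled in. As a rough sketch of that selection (not how pip or setuptools implement it internally), the Marker class from the packaging library can evaluate such keys. This assumes packaging is installed; select_requirements is a hypothetical helper.

```python
from packaging.markers import Marker

extras_require = {
    ':python_version<"3"': ['avro'],
    ':python_version>="3"': ['avro-python3'],
}


def select_requirements(extras, environment=None):
    """Return requirements from marker-only ("empty") extras that apply.

    environment overrides values of the running interpreter, which is
    handy for checking what another Python version would install.
    """
    selected = []
    for key, requirements in extras.items():
        extra_name, _, marker = key.partition(":")
        # Named extras are only installed on request, and keys without a
        # marker are ignored in this sketch.
        if extra_name or not marker:
            continue
        if Marker(marker).evaluate(environment):
            selected.extend(requirements)
    return selected


print(select_requirements(extras_require, {"python_version": "2.7"}))  # ['avro']
```

Passing an explicit environment lets you check both branches from a single interpreter, which is exactly the situation a universal wheel build is in.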
Installing wheels using a --find-links index
You can use --find-links with a local directory containing all the wheels, or with a URL to an index file containing links to all the wheels. The index file is a simple HTML page with links in the following format:
<a href="https://host/path/package-1.2.3-py2.py3-none-any.whl"> package-1.2.3-py2.py3-none-any.whl </a>
Links to wheels may be relative, so dropping all your wheels in a directory on a web server with auto-indexing turned on will work. If you’re generating the HTML yourself, you can add extra features, for example a hash of each wheel appended to its link as a #sha256=<hex digest> fragment, which pip checks against the downloaded file.
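As an illustration, a small script can walk a local wheel directory and emit such a find-links page, adding a sha256 fragment to every link. This is a sketch assuming all wheels sit in one flat directory and are served either relative to the page or from a base URL you choose; build_find_links_index and sha256_of are hypothetical helpers.

```python
import hashlib
import html
from pathlib import Path
from urllib.parse import quote


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_find_links_index(wheel_dir: str, base_url: str = "") -> str:
    """Return an HTML page linking every *.whl in wheel_dir.

    Links are relative when base_url is empty; each one carries a
    #sha256=... fragment so pip can verify the downloaded file.
    """
    lines = ["<!DOCTYPE html>", "<html><body>"]
    for wheel in sorted(Path(wheel_dir).glob("*.whl")):
        href = f"{base_url}{quote(wheel.name)}#sha256={sha256_of(wheel)}"
        lines.append(
            f'<a href="{html.escape(href)}">{html.escape(wheel.name)}</a><br>'
        )
    lines.append("</body></html>")
    return "\n".join(lines)
```

Writing the result to, for example, index.html next to the wheels gives a page that pip install --find-links can point at.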
Generating a simple index
PEP 503 defines the simple repository structure. At the top level there are directories, each named after a normalized package name. Normalizing means lowercasing the name and replacing every run of “-”, “_” and “.” characters with a single “-” (dash). Each directory contains links to wheels (in the same format as the find-links index in the section above). On a web server with auto-indexing, the following directory structure would work as a simple index:
package/
    package-1.2.2-py2.py3-none-any.whl
    package-1.2.3-py2.py3-none-any.whl
some-other-package/
    Some_Other_Package-4.5.6-py2-none-any.whl
    Some_Other_Package-4.5.7-py2-none-any.whl
As mentioned in the section above, you can also generate the HTML yourself to take advantage of additional features or to support layouts other than the example above.
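PEP 503 gives the normalization rule as a regular expression, re.sub(r"[-_.]+", "-", name).lower(). The sketch below uses it to group a flat list of wheel file names into the per-project directories of a simple index. It relies on the wheel naming convention that dashes in the distribution name are replaced with underscores, so the project name is everything before the first dash; simple_index_layout is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """PEP 503 name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_index_layout(wheel_files: Iterable[str]) -> Dict[str, List[str]]:
    """Map each normalized project name to its sorted wheel file names."""
    layout: Dict[str, List[str]] = defaultdict(list)
    for filename in wheel_files:
        # The distribution name in a wheel file name contains no "-",
        # so it is everything before the first dash.
        project = filename.split("-", 1)[0]
        layout[normalize(project)].append(filename)
    return {project: sorted(files) for project, files in sorted(layout.items())}


print(normalize("Some_Other_Package"))  # some-other-package
```

Each key of the returned mapping becomes a directory, and each list holds the wheels to place (or link) inside it.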
Automating building new wheels
The best practice is to build wheels as soon as you release the software. There are two options: either trigger a build as part of the release process, or monitor repositories for new releases.
At Eventbrite, we poll selected GitHub repositories for new tags that are valid PEP 440 versions. Listing tags is one of the few operations that does not require cloning the git repository, which makes polling fast. The algorithm is:
1. For each watched repository, execute git ls-remote --tags git@github.com:user/repo.git
2. From the output, filter out results ending with ^{}, which are redundant (they are the dereferenced objects of annotated tags).
3. Get the tag itself, which is the part between refs/tags/ and the end of the line.
4. Verify that the tag is a valid version according to PEP 440 (the easiest way to do that is with packaging.version.Version(tag), which raises InvalidVersion on non-conforming versions).
5. Check whether the version has already been processed; if it hasn’t, trigger a build.
6. Build the wheel by executing pip wheel --wheel-dir /wheels/dir/ --find-links /wheels/dir/ git+ssh://git@github.com/user/repo.git@{tag} (setting the directory containing the wheels as the --find-links source avoids re-building wheels that have already been built).
7. Repeat step 6 for each kind of target machine (different OS versions or Python versions), as some binary wheels are not portable. For example, MySQL_python-1.2.5-cp27-cp27mu-linux_x86_64.whl built on Ubuntu 12.04 will not work on Ubuntu 16.04, because of different libmysqlclient.so versions. For universal pure Python wheels this might be unnecessary extra work, but even then you might have additional dependencies, like the MySQL-python library mentioned above.
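The tag-listing and filtering part of the algorithm above (everything before the build) can be sketched in Python as follows. This is an illustration, not Eventbrite’s implementation: it assumes the packaging library is installed, and list_remote_tags, new_release_tags and the processed set (standing in for whatever record of already-built versions you keep) are hypothetical.

```python
import subprocess
from typing import List, Set

from packaging.version import InvalidVersion, Version


def list_remote_tags(repo_url: str) -> str:
    """List tags without cloning (needs git and access to the repository)."""
    return subprocess.run(
        ["git", "ls-remote", "--tags", repo_url],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


def new_release_tags(ls_remote_output: str, processed: Set[str]) -> List[str]:
    """Return PEP 440 version tags that have not been built yet."""
    new_tags = []
    for line in ls_remote_output.splitlines():
        # Each line looks like "<sha>\trefs/tags/<tag>".
        _, _, ref = line.partition("\t")
        # Skip anything that is not a tag, and the peeled "^{}" duplicates.
        if not ref.startswith("refs/tags/") or ref.endswith("^{}"):
            continue
        tag = ref[len("refs/tags/"):]
        # Keep only tags that are valid PEP 440 versions.
        try:
            Version(tag)
        except InvalidVersion:
            continue
        # Only tags that have not been processed before trigger a build.
        if tag not in processed:
            new_tags.append(tag)
    return new_tags
```

A poller would call new_release_tags(list_remote_tags("git@github.com:user/repo.git"), processed) for each watched repository and run pip wheel for every tag it returns.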
Amazon S3 wheelhouse
Amazon S3 works well for distributing wheels (performance can be further improved by putting CloudFront in front of it). However, the readily available solutions for hosting a wheelhouse (a wheel repository) on S3 assume no access control (the S3 bucket’s ACL set to public-read). This is not an option if you want to keep your code private. The standard workaround would be to use s3://{authId}:{authKey}@{bucket}/ style URLs. However, pip does not understand the s3 transport and does not support transport plugins; when it comes to remote links, pip is limited to HTTP(S). We have solved that by generating index files with presigned S3 URLs:
<a href="https://a-wheelhouse.s3.amazonaws.com/ubuntu/16.04/package-1.0.0-py2-none-any.whl?AWSAccessKeyId=AKIAxxxx&Expires=1496286239&Signature=xxxx">package-1.0.0-py2-none-any.whl</a>
<a href="https://a-wheelhouse.s3.amazonaws.com/ubuntu/16.04/package-1.1.0-py2-none-any.whl?AWSAccessKeyId=AKIAxxxx&Expires=1496286239&Signature=xxxx">package-1.1.0-py2-none-any.whl</a>
<a href="https://a-wheelhouse.s3.amazonaws.com/ubuntu/16.04/package-1.2.0-py2-none-any.whl?AWSAccessKeyId=AKIAxxxx&Expires=1496286239&Signature=xxxx">package-1.2.0-py2-none-any.whl</a>
The additional challenge with the above approach is that these links expire, meaning the index must be regenerated periodically. On the other hand, this makes security management easier, as end-users do not store credentials which would have to be revoked. Also, should any URL leak, it will only be usable for a short time.
The index files themselves, which are just light HTML, are stored on our build servers, which provide necessary access control.
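A generator for such a page only needs a way to turn an S3 object key into a presigned URL. The sketch below takes that signer as a parameter so the HTML part stays independent of the AWS SDK; build_presigned_index is a hypothetical helper, and the key layout (one prefix per target OS) is only assumed from the example links above.

```python
import html
import posixpath
from typing import Callable, Iterable


def build_presigned_index(
    keys: Iterable[str],
    sign: Callable[[str], str],
) -> str:
    """Return find-links HTML with one presigned link per wheel key.

    sign maps an S3 object key to a time-limited URL, so the page has
    to be regenerated before those URLs expire.
    """
    links = []
    for key in sorted(keys):
        if not key.endswith(".whl"):
            continue
        url = sign(key)
        filename = posixpath.basename(key)
        # "&" in the query string is written as "&amp;"; HTML parsers
        # turn it back into "&" when reading the href.
        links.append(f'<a href="{html.escape(url)}">{html.escape(filename)}</a>')
    return "\n".join(links)
```

With boto3 (assuming it is installed and credentials are configured), the signer could be lambda key: s3.generate_presigned_url("get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=3600) for an s3 = boto3.client("s3") client and a bucket name of your choice. Note that boto3 may produce a different query-string format than the one shown in the example links above.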
As an aside, signed URLs must match exactly, meaning that even small differences in encoding will cause a signature mismatch. For example, https://server/pkg-1!2017.1-py2-none-any.whl and https://server/pkg-1%212017.1-py2-none-any.whl are equivalent as URLs, but their signatures will be different.
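Python’s urllib.parse.quote shows how easily the two spellings arise: by default it percent-encodes “!”, unless told to treat it as safe. The sketch below only demonstrates the difference; which spelling your signer expects depends on the signing code you use, so the practical takeaway is to encode each key in exactly one place and reuse that string both when signing and when writing the link.

```python
from urllib.parse import quote, unquote

filename = "pkg-1!2017.1-py2-none-any.whl"

# Two spellings of the same object path.
encoded = quote(filename)             # "!" is percent-encoded as "%21"
literal = quote(filename, safe="/!")  # "!" is left as-is

print(encoded)  # pkg-1%212017.1-py2-none-any.whl
print(literal)  # pkg-1!2017.1-py2-none-any.whl

# Both decode to the same file name, yet they are different strings,
# so a signature computed over one will not match the other.
print(unquote(encoded) == unquote(literal) == filename)  # True
print(encoded == literal)  # False
```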
Using a wheelhouse
To use wheels from the wheelhouse defined above with pip, use the --find-links (or -f) option. For example:
pip install --find-links=https://wheelhouse/my-index.html private-package==1.2.3
To avoid having to set --find-links every time, you can either export it as an environment variable …
export PIP_FIND_LINKS=https://wheelhouse/my-index.html
pip install private-package==1.2.3
… or set it in the pip config file:
[global]
find-links = https://wheelhouse/my-index.html
Alternatively, if you have a simple index, you can pass it to pip using the --extra-index-url option:
pip install --extra-index-url=https://wheelhouse/simple/ private-package==1.2.3
# or
export PIP_EXTRA_INDEX_URL=https://wheelhouse/simple/
pip install private-package==1.2.
Or set it in the config file:
[global]
extra-index-url = https://wheelhouse/simple/
Pip package resolution order
When installing a package, pip considers matching candidates in the following order:
- Already installed
- Wheels
- Sources
Unless the --upgrade (-U) option is used, an already installed package that satisfies the requirements will not be upgraded to the latest version. Among sources and wheels, the highest matching version is used; if that version is available both as a wheel and as a source distribution, the wheel takes priority.
With the above priorities and versions being equal, pip considers package locations in the following order:
- Local filesystem
- Indexes (including extra-indexes)
- Find-links
- Dependency-links
This means that wheels from the wheelhouse will always take priority over sources of the same version from the public index. It also means that you can safely add a source index pointing to tarballs or GitHub repositories without worrying that it will ever be chosen over an already compiled wheel.
Caching with Devpi
Devpi is a caching proxy and server for Python packages, and it is fairly easy to set up for mirroring. However, it has two significant limitations. First, it does not accept the flat indexes used by find-links; it can only mirror a hierarchical simple index. Second, it assumes that the mirror URL is a directory and builds URLs by appending package names to it, so a URL ending with file.html will not work. Neither limitation is a problem once you’re aware of it. If you have a simple index, you can add it to devpi by executing:
devpi index -c wheelhouse type=mirror mirror_cache_expiry=300 \
    mirror_url=https://wheelhouse/simple/
You can then use http://devpi.server:3142/username/wheelhouse/+simple/ as the simple index URL passed to --extra-index-url, as described in the “Using a wheelhouse” section above.
References and further reading
- Python Packaging – https://python-packaging.readthedocs.io/
- PEP 426 — Metadata for Python Software Packages 2.0 – https://www.python.org/dev/peps/pep-0426/
- PEP 440 — Version Identification and Dependency Specification – https://www.python.org/dev/peps/pep-0440/
- PEP 503 — Simple Repository API – https://www.python.org/dev/peps/pep-0503/
- PEP 508 — Dependency specification for Python Software Packages – https://www.python.org/dev/peps/pep-0508/