On packaging

Published December 14, 2008. Filed under: Python.

So currently there’s a bit of a to-do involving Debian’s Ruby packaging team and the Ruby “gem” system. This document does a good job of summarizing the issues from Debian’s perspective. And of course, the Ruby side of it is no less heated; an example is here.

A lot of this is the usual back-and-forth between (on the one side) application developers working in one particular language, who want to distribute their applications to the widest possible audience and so use an operating-system-agnostic but language-specific tool for doing so, and (on the other side) operating-system maintainers who want, as much as possible, to deal with one standardized, language-agnostic but platform-specific tool to distribute software and updates. And ordinarily those goals can get along fairly well as long as a few compromises are made, but in this specific case the elevated tension seems to be caused by the way the Ruby gem system works, specifically by tightly coupling application code to the use of the gem utility. Pretty much everything else on the Debian packagers’ list of problems seems like it could be resolved if this issue went away.

But I’m not here to try to solve that problem; I’m simply mentioning it because it’s an interesting parallel (and hence a good lead-in) to some long-standing complaints I’ve had about the way packaging is often done in Python. Those complaints came to my attention once again recently, when someone filed a bug report against django-registration mentioning that a custom management command in that application doesn’t work if you install via easy_install with the default options.

If you’d like to just get the executive summary, here it is: Please, for the love of Guido, stop using setuptools and easy_install, and use distutils and pip instead. If you’d like to know why, read on.

Also, please note that the following are simply my opinions; I have some experience to back them up, from both personal projects and my duties as Django’s release manager (and, hence, the person who makes the packages for Django), but my opinions are simply mine, and not those of any particular project or institution (for the record: Django doesn’t use setuptools anymore, but I wasn’t part of the decision to move away from it).

Why non-standard packaging tools exist for Python

Most of my problems with setuptools boil down to the same problem that seems to be at the heart of the Debian-vs.-Ruby fight: setuptools has an unfortunate habit of infecting bits of code which shouldn’t need to have any awareness of how the code, or its dependencies, are being packaged and distributed. As a starting point, consider how Python’s standard distribution system — distutils — works:

  1. You write your code.
  2. You write a script named setup.py which imports the setup function from distutils and specifies the packaging options you want.
  3. You run python setup.py sdist to generate a standard source package, or use other commands to build different package formats or upload the package to the Python Package Index.

Installing something that’s been packaged with distutils is easy; if it’s in a format specific to your operating system (distutils can generate a variety of OS-specific formats, including for example RPM packages for Red Hat systems or self-extracting installers for Windows) then you can simply install it the way you’d install any other package on that system. If you’ve got a source package, however, it’ll simply be a standard compressed archive (.tar.gz format) you can unpack to get the code and the setup.py script, and running python setup.py install will install it.

And of course the installation process is configurable in a variety of ways, as covered in the distutils documentation. So far, so good.

But there are two major shortcomings to distutils:

  1. It provides no way to specify dependencies between packages.
  2. It provides no way to emulate the experience on, for example, many Linux distributions where you type a command, feed it a package name, and the appropriate package (and dependencies) will be downloaded from a repository and installed for you.

To provide this functionality, many people turn to setuptools. And if it had done nothing except deal with these two issues, it would have been great; an easy dependency-management mechanism and a network-enabled installation system make lots of people’s lives easier. But setuptools didn’t stop there, and that’s where the real trouble begins.

Let Python be Python

Setuptools goes far beyond merely providing dependency management and an easy network-enabled install tool; it also adds a number of other features, and does quite a lot of work to support them. Most complaints boil down to the fact that these features, and their associated support:

  1. Cause Python to stop behaving the way Python is documented as behaving, and
  2. Create a bizarre parallel world of things which are only accessible from, and which only work when using, setuptools and its APIs.

The first of these is certainly bad, but the second is the one which really bothers me, and which closely parallels the problems with Ruby’s gem system.

To see an example, consider a Python feature that’s occasionally useful: if you have a zip file whose contents are files of Python code, you can place the zip file on your Python import path and import will just work for the code inside it; Python knows how to look inside a zip file and find the code, and you don’t need to do anything else special aside from making sure the file’s on the import path.
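As a minimal sketch of that feature, the snippet below builds a throwaway zip file containing one module and imports from it. The archive name and the module name (bundle.zip, zipped_greeting) are made up for the illustration; only the standard library is involved.

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip archive containing a single module.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr(
        "zipped_greeting.py",
        "def hello():\n    return 'hello from a zip file'\n",
    )

# Putting the archive itself on the import path is all that's required.
sys.path.insert(0, archive_path)

import zipped_greeting

message = zipped_greeting.hello()
print(message)

# The module's __file__ is a path that points inside the archive,
# not at a real file on disk.
print(zipped_greeting.__file__)
```

Note that nothing in the importing code mentions zip files at all; the import statement is the same one you would write for code sitting in an ordinary directory.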

But setuptools has latched onto this feature to create an entire zipped package format which includes not only Python code but also things like data files. Now, normally if you package an application which includes some data files, you can specify that they’re to be installed alongside the code and use standard Python techniques to figure out where your package is and where your data files are, and work with them from your code. With setuptools, however, you can’t do this, because setuptools puts your data files into the same zipped package, and from that point on you have to use functions in setuptools to access them.
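To make the difference concrete, here is a rough sketch of the usual "find my data file next to my code" technique, run once against an ordinary directory and once against a zip archive. The module names, the welcome.txt data file and the read_template function are all hypothetical, and the archive is a plain zip file standing in for a zipped package rather than something setuptools actually produced.

```python
import os
import sys
import tempfile
import zipfile

# A module that locates its data file relative to its own __file__,
# which is the standard technique for code installed on disk.
MODULE_SOURCE = (
    "import os\n"
    "def read_template():\n"
    "    here = os.path.dirname(os.path.abspath(__file__))\n"
    "    with open(os.path.join(here, 'welcome.txt')) as handle:\n"
    "        return handle.read()\n"
)

workdir = tempfile.mkdtemp()

# Case 1: module and data file side by side in an ordinary directory.
plain_dir = os.path.join(workdir, "plain")
os.makedirs(plain_dir)
with open(os.path.join(plain_dir, "plain_app.py"), "w") as handle:
    handle.write(MODULE_SOURCE)
with open(os.path.join(plain_dir, "welcome.txt"), "w") as handle:
    handle.write("Welcome!")

# Case 2: the same two files packed into a zip archive.
archive_path = os.path.join(workdir, "zipped.zip")
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("zipped_app.py", MODULE_SOURCE)
    archive.writestr("welcome.txt", "Welcome!")

sys.path[:0] = [plain_dir, archive_path]

import plain_app
import zipped_app

# On disk, the data file is found where the code expects it.
plain_result = plain_app.read_template()

# Inside the zip, importing works but the path built from __file__
# is not a real filesystem path, so open() fails.
try:
    zipped_app.read_template()
    zipped_result = "read succeeded"
except (IOError, OSError) as error:
    zipped_result = "read failed: %s" % type(error).__name__

print(plain_result)
print(zipped_result)
```

The import of the zipped module succeeds, which is what makes the problem easy to miss: the failure only shows up later, when the code goes looking for a file that is not actually on the filesystem.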

Oh, and did I mention that this is how setuptools does things by default? Anything you install via easy_install will get this treatment unless:

  1. You’ve explicitly told easy_install not to do this on a per-package basis, or
  2. You’ve explicitly configured setuptools to disable this “feature” globally, or
  3. The person who created the package set it up to force setuptools not to zip it.

Requiring one of the first two options to make Python applications work normally is bad enough, but the third is simply perverse: in order to create a package that setuptools won’t try to zip, you have to use setuptools to create the package. Which, in turn, means that only people who have setuptools installed can install your package. “You can opt out of our system by opting in to our system” is not an acceptable way to do things, in my opinion.

This is, incidentally, how setuptools managed to break django-registration. The fact that setuptools defaults to installing that zipped version of the package means Django’s standard mechanism for locating management commands stops working; since Django doesn’t use setuptools’ APIs to peer into zipped packages, it can’t see the custom management command bundled in django-registration. And, of course, most people who use easy_install don’t actually know that it behaves this way, since they just wanted, well, an easy way to install Python packages. So the bug reports end up coming to me, which makes me sad and angry.
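The failure mode can be sketched with a deliberately simplified, hypothetical find_commands function that discovers commands by listing a directory on disk. This is an illustration of the general directory-scanning approach, not a copy of Django's code, and the application and command names are invented.

```python
import os
import tempfile
import zipfile


def find_commands(management_dir):
    """Return command names found as .py files in management_dir/commands.

    Hypothetical helper: it relies on os.listdir, so it can only see
    commands that exist as real files in a real directory.
    """
    command_dir = os.path.join(management_dir, "commands")
    try:
        return sorted(
            name[:-3]
            for name in os.listdir(command_dir)
            if name.endswith(".py") and not name.startswith("_")
        )
    except OSError:
        # Directory missing or not listable: report no commands.
        return []


workdir = tempfile.mkdtemp()

# Layout 1: the application unpacked into ordinary directories.
on_disk = os.path.join(workdir, "unzipped", "example_app", "management")
os.makedirs(os.path.join(on_disk, "commands"))
for filename in ("__init__.py", "examplecommand.py"):
    with open(os.path.join(on_disk, "commands", filename), "w") as handle:
        handle.write("")

# Layout 2: the same files inside a single zipped package.
egg_path = os.path.join(workdir, "example_app.egg")
with zipfile.ZipFile(egg_path, "w") as archive:
    archive.writestr("example_app/management/commands/__init__.py", "")
    archive.writestr("example_app/management/commands/examplecommand.py", "")
in_zip = os.path.join(egg_path, "example_app", "management")

found_on_disk = find_commands(on_disk)
found_in_zip = find_commands(in_zip)

print(found_on_disk)
print(found_in_zip)
```

The command is present in both layouts, but a scan that goes through the filesystem only finds it in the unzipped one; in the zipped layout it silently comes back empty, which is why the symptom is a command that seems not to exist.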

And that’s really just the tip of the iceberg; setuptools and its associated frameworks, because of the features they try to support, end up slipping out of packaging concerns and into your application code in all sorts of oddball ways. There’s even an analog of the require_gem feature which gives Debian packagers headaches: setuptools lets you specify dependencies directly in application code to ensure that you’re importing precisely the version of a library that you want to import, and this only works when setuptools is also installed (and, from what I can tell, may only work if the package you want to import from was itself installed by setuptools).

The end result is that, once you start using setuptools, you’re gradually nudged further and further away from using standard Python APIs and techniques, and more and more into using things that only exist as part of setuptools. And just when you thought it couldn’t get worse, setuptools also encourages package creators to set up drive-by installations of setuptools, so that unsuspecting users end up with it installed whether they wanted it or not.

Me, I’m a fan of Python being Python, not some bizarre parasitic thing that tries to force itself on you and make you use its own APIs instead of Python’s. So I generally stay as far away from setuptools as I possibly can.

The alternative

Of course, this brings up a question: if setuptools and easy_install are bad, what can we use instead? For packaging, I still use (and in fact have always used) just plain old distutils. It’s simple (at least, it’s about as simple as a packaging system can be while still being useful), it’s standard, it works, and I use it for all my personal applications.

For actually installing and managing packages, I use pip. It’s by Ian Bicking, who’s smarter than any ten people have a right to be, and it gets an awful lot of things right. One of those things, and the one which, by itself, would make pip worthwhile, is actually noted as a shortcoming in its documentation:

It cannot install from eggs. It only installs from source.

Eggs, of course, are setuptools’ zipped package format. I’m really really OK with not installing from eggs, Ian. If I type:

easy_install django-registration

then, even though it’s a standard source-code package built with distutils, I still end up with a broken zipped package that can’t find its own management command. But if I type:

pip install django-registration

then I get something that actually works the way Python is supposed to work. I’m really OK with that.

Another thing pip gets right is that it doesn’t try to graft a dependency-specification system onto the setup.py script, and so doesn’t create a dependency from your setup.py script to pip. Instead, it lets you write a short file listing your requirements and point pip at that file; it’ll handle the rest.

And as if that wasn’t enough, pip (thanks to its requirement-file mechanism and a couple of other features) enables the holy grail of deployment: the repeatable, scriptable install. Seriously, pip is good stuff.
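A rough sketch of what "repeatable and scriptable" can look like: write the pinned requirements to a file, then build the pip command that installs from it. The package names and version numbers below are placeholders, the helper functions are hypothetical, and the sketch only constructs the command rather than running it.

```python
import os
import tempfile

# Placeholder pins: these package names and versions are made up.
PINNED_REQUIREMENTS = [
    "example-library==1.0",
    "another-example==2.3",
]


def write_requirements(path, requirements):
    """Write one pinned requirement per line, the format pip reads."""
    with open(path, "w") as handle:
        handle.write("\n".join(requirements) + "\n")


def install_command(requirements_path):
    """Build (but do not run) the pip invocation for a requirements file."""
    return ["pip", "install", "-r", requirements_path]


requirements_path = os.path.join(tempfile.mkdtemp(), "requirements.txt")
write_requirements(requirements_path, PINNED_REQUIREMENTS)

command = install_command(requirements_path)
print(" ".join(command))
```

Because the versions are pinned in a plain text file, the same file can be checked into version control and fed to pip on every machine; handing the command list to something like subprocess.check_call would perform the actual install.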

So my recommendation is that you run, not walk, over to pip, then forget about setuptools and easy_install. While you’re at it, check out virtualenv (also by Ian), which makes all sorts of previously-huge deployment and management headaches go away, and about which I plan to write much more in the future.