Why I like pip

Published December 15, 2008. Filed under: Python.

So yesterday I explained some of the reasons why I don’t like setuptools. In essence, my objections boil down to one idea: application packaging and application development should be orthogonal concerns. The way setuptools works, however, seems to tend inevitably toward coupling them to each other. I gave one example: the default behavior of installing zipped packages leads to a need to use specialized, non-standard, packaging-system-specific APIs to access things like data files. (An ironic twist: the man who so eloquently explained how Python is not Java has spent so much time and effort trying to implement Java packaging conventions in Python.) And I mentioned that, unfortunately, this is only one example of how using setuptools ends up changing application code to deal with packaging concerns.

By way of a lead-in, I mentioned that Debian and various Ruby folks are at each other’s throats right now over a very similar problem: the de facto Ruby packaging system (gem) behaves in much the same way, causing headaches for people who (like Debian) want to package and distribute Ruby code without relying on the gem system (since they already have a package-management framework which handles much more than just Ruby). This debate should stand as a clear example of a real-world problem created by the coupling of packaging systems and application code, and so should make it quite clear that this is a real issue with setuptools, too.

But toward the end of yesterday’s article I suggested pip as an alternative to the setuptools/pkg_resources/easy_install toolchain, and today I’d like to explain a bit more about why I prefer pip and some of the concrete benefits it offers.

pip installs packages

Building a bit on what I wrote yesterday, one of the main attractions of pip is the fact that it’s just an installation tool. It doesn’t really care whether you built your package with distutils or with setuptools or, in some cases, whether it’s even a package at all (since it can install from the URL of a Subversion repository if you ask it to). That’s a really big deal, because it means that pip does not change your packaging workflow in any way. You just make your package the way you’ve always made it, and then put it up on the Web somewhere (preferably listing it in the Python Package Index, but you don’t have to if you don’t want to) and people using pip can grab it and install it.

Contrast this with the setuptools way of doing things, which requires you to change your packaging workflow and introduce a dependency on setuptools, and which can still cause headaches if you aren’t using it to package your code but someone else is using it to install your code, requiring you to work around setuptools even when you aren’t actually using it.

Plus, if you do build your packages with setuptools, pip can read the dependencies and track them down for you; originally, I’d assumed this functionality came from PoachEggs, which seemed like it replicated a lot of setuptools’ parsing of requirements, but in his response to my article yesterday Ian clarified that pip just uses the setuptools APIs to do this. Which is saying something: glossing over setuptools’ warts to the point where you don’t even realize it’s being used is a pretty big deal.

Another practical issue with the setuptools way of life is that easy_install can easily create broken installations; it doesn’t do a whole lot of up-front checking to make sure it’ll actually be able to install both the requested package and the dependencies, and this can lead to problems if something in the dependency chain ends up uninstallable. Meanwhile, pip looks before it leaps, can bail out early if it’s not going to be able to install your package and will leave behind a useful log file explaining what went wrong.

This is good stuff. But it gets better.

Reproducible builds

The point where pip really shines, though, is in the ease of specifying and creating reproducible builds. If you’ve ever dealt with having to deploy the same code base across multiple machines, you know what a headache this can be, since a huge number of factors (operating system and version, pre-installed packages and versions, system package managers and configuration, etc., etc.) can change the results of your deployment process, sometimes in subtle and difficult-to-debug ways. With pip, this is not (so much of) a problem.

I mentioned pip requirements files yesterday as an alternative to the way setuptools specifies dependencies directly in setup.py, and that’s certainly one useful application of the feature, but you can take it much further: once you know which packages (and, just as important, which versions of which packages) you need, you can write them down in a simple, plain-text file, point pip at it, and it’ll install them.
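
To make the "simple, plain-text file" idea concrete, here is a minimal Python sketch of reading a list of exact pins. It is emphatically not pip's own parser: the function name parse_requirements and the example text are made up for illustration, and it only understands blank lines, # comments and name==version lines (pip itself accepts many more forms, such as repository URLs).

```python
def parse_requirements(text):
    """Parse a simplified requirements file into a {name: version} dict.

    Illustrative only: handles blank lines, '#' comments and exact
    'name==version' pins. pip's real format accepts far more than this.
    """
    pins = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep or not name.strip() or not version.strip():
            raise ValueError(f"not an exact pin: {raw_line!r}")
        pins[name.strip()] = version.strip()
    return pins


# Hypothetical file contents, written inline to keep the sketch self-contained.
example = """\
# packages this application needs
docutils==0.5
Django==1.0.2
"""

if __name__ == "__main__":
    for name, version in parse_requirements(example).items():
        print(f"{name} pinned to {version}")
```

The point is just that the format is trivial to read and write by hand: one package per line, with the version spelled out, so the same list produces the same install everywhere.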

Once you have reproducible builds of Python packages, a whole world of useful techniques opens up: you can just distribute a basic requirements file specifying your application and its dependencies, or you can distribute multiple files specifying various optional configurations, or… well, pretty much anything you like.

This is somewhat similar to what you can do with zc.buildout, but with a few major differences:

  1. pip only handles Python packages, while buildout can handle pretty much anything you want to throw at it.
  2. pip’s format for specifying what to install is, in my experience, quite a lot simpler for the common case (“here’s a list of packages I want”); buildout’s recipe system is a bit more complex up-front, but this does mean more complex setups are a bit easier to manage.
  3. buildout really really likes eggs. This is the main reason why I haven’t done more with it, but if you really like eggs you might want to give it a spin.

The “freezing” feature of pip also adds to the ease of use: given an already-working environment with all the packages you need, pip (via the command pip freeze) can spit out a requirements file for you (which you can then edit, of course, to remove any unnecessary entries), and you can use that as the basis for replicating your working build.
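
If you're curious what "freezing" amounts to, the rough idea can be sketched in a few lines of Python. This is not how pip freeze is implemented; it's an approximation that leans on the standard library's importlib.metadata module (which arrived in Python long after this was written), and the function name freeze_lines is my own.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return sorted 'name==version' lines for the distributions on sys.path.

    A rough approximation of the idea behind 'pip freeze', not its
    actual implementation.
    """
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata is missing or broken
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect this output to a file to get a starting-point requirements list.
    print("\n".join(freeze_lines()))
```

Whatever the output looks like on your machine, it's in the same name==version shape a requirements file uses, which is what makes the round trip from working environment to reproducible build so painless.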

Better development and deployment

The last piece of the puzzle, for me, is virtualenv; virtualenv is a tool for creating and working with isolated Python environments, and is basically the only way I work with Python these days.

On the development side, virtualenv makes it easy to try out Python software without screwing up anything you already have installed, or do parallel development of multiple versions of some piece of code. On the deployment side, virtualenv solves a major problem setuptools has tried to work around with some of its own features: how to handle a situation where different applications require different (and possibly conflicting) versions of the same library. The solution virtualenv supplies is an easy, lightweight and above all isolated environment for installing and using Python software, so that two applications with conflicting requirements can simply run in two different virtualenv environments.
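
To illustrate the isolation idea, here's a small sketch that creates two separate environments side by side. One loud caveat: it uses the venv module from the Python standard library as a stand-in, not virtualenv itself, and the names make_env, app-a and app-b are hypothetical. The principle is the same, though: each environment gets its own directory tree, so what you install into one is invisible to the other.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent, name):
    """Create an isolated environment at parent/name and return its path.

    Uses the standard library's venv module as a stand-in for virtualenv.
    """
    env_dir = Path(parent) / name
    # with_pip=False keeps creation fast and offline; nothing gets installed.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        # Two hypothetical applications with conflicting requirements.
        app_a = make_env(parent, "app-a")
        app_b = make_env(parent, "app-b")
        # Each environment carries its own configuration file and
        # its own place to install packages.
        for env in (app_a, app_b):
            print(env.name, (env / "pyvenv.cfg").is_file())
```

Two applications that need conflicting versions of a library would each get one of these directories, and neither would ever see the other's packages.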

And, naturally, pip integrates quite nicely with virtualenv; normally, when working in an active virtualenv, Python packaging/installation tools (pip included) will install into that virtualenv, but pip also lets you:

  1. Specify a virtualenv to install into (using the -E flag), and
  2. Create a new virtualenv and install into it.

The second one is really the killer feature, because it means you can set up a requirements file specifying a list of packages, and get pip to create a virtualenv for you and install the packages into it. To see how handy this can be, let’s take a simple example: suppose you have an application which runs fine against the current Django release (1.0.2), and you want to play around a bit with the Django development trunk to try out a new feature. You could create a requirements file specifying docutils (so the automatic Django documentation will work) and the Subversion URL for Django trunk:

docutils==0.5
-e svn+http://code.djangoproject.com/svn/django/trunk#egg=Django

Save this into a file named, say, django-requirements.txt. Then (assuming you have pip and virtualenv installed), run the following:

pip install -E django-trunk -r django-requirements.txt
source django-trunk/bin/activate

This will create a virtualenv named “django-trunk”, install docutils and an SVN checkout of Django trunk, then activate it so you can begin working with it immediately (to exit the virtualenv, simply close your shell or type deactivate). And, of course, you can also drop a copy of your application in there to hack on it and try out trunk-only Django features. This is pretty handy for local development, but imagine how much easier it could also make deployment: simply have pip and virtualenv preinstalled on your server, then upload a requirements file and let pip do its thing. You can even point mod_wsgi at a virtualenv to have it use the isolated environment for your applications.

The tip of the iceberg

And… well, there’s a heck of a lot more I could write here about pip (and about virtualenv, and about some other interesting tools), but I think this is a good start and hopefully I’ve at least got you interested enough to explore a bit on your own. And I hope I’ve managed to communicate some of the practical reasons why I’ve ditched easy_install for package installation; compared to what pip can do right now (not even considering what it might be able to do in the future), easy_install just doesn’t measure up enough to justify the headaches it can create.