On packaging

An entry published by James Bennett on December 14, 2008. Part of the category Python. 32 comments posted.

So currently there’s a bit of a to-do involving Debian’s Ruby packaging team and the Ruby “gem” system. This document does a good job of summarizing the issues from Debian’s perspective. And of course, the Ruby side of it is no less heated; an example is here.

A lot of this is the usual back-and-forth between (on the one side) application developers working in one particular language, who want to distribute their applications to the widest possible audience and so use an operating-system-agnostic but language-specific tool for doing so, and (on the other side) operating-system maintainers who want, as much as possible, to deal with one standardized, language-agnostic but platform-specific tool to distribute software and updates. And ordinarily those goals can get along fairly well as long as a few compromises are made, but in this specific case the elevated tension seems to be caused by the way the Ruby gem system works, specifically by tightly coupling application code to the use of the gem utility. Pretty much everything else on the Debian packagers’ list of problems seems like it could be resolved if this issue went away.

But I’m not here to try to solve that problem; I’m simply mentioning it because it’s an interesting parallel (and hence a good lead-in) to some long-standing complaints I’ve had about the way packaging is often done in Python. Those complaints recently came to my attention once again when someone filed a bug report against django-registration, mentioning that a custom management command in that application doesn’t work if you install via easy_install with the default options.

If you’d like to just get the executive summary, here it is: Please, for the love of Guido, stop using setuptools and easy_install, and use distutils and pip instead. If you’d like to know why, read on.

Also, please note that the following are simply my opinions; I have some experience to back them up, from both personal projects and my duties as Django’s release manager (and, hence, the person who makes the packages for Django), but my opinions are simply mine, and not those of any particular project or institution (for the record: Django doesn’t use setuptools anymore, but I wasn’t part of the decision to move away from it).

Why non-standard packaging tools exist for Python

Most of my problems with setuptools boil down to the same problem that seems to be at the heart of the Debian-vs.-Ruby fight: setuptools has an unfortunate habit of infecting bits of code which shouldn’t need to have any awareness of how the code, or its dependencies, are being packaged and distributed. As a starting point, consider how Python’s standard distribution system — distutils — works:

  1. You write your code.
  2. You write a script named setup.py which imports the setup function from distutils and specifies the packaging options you want.
  3. You run setup.py sdist to generate a standard source package, or use other commands to build different package formats or upload the package to the Python package index.

Installing something that’s been packaged with distutils is easy; if it’s in a format specific to your operating system (distutils can generate a variety of OS-specific formats, including for example RPM packages for Red Hat systems or self-extracting installers for Windows) then you can simply install normally. If you’ve got a source package, however, it’ll simply be a standard compressed archive (.tar.gz format) you can unpack to get the code and the setup.py script, and setup.py install will install it.

And of course the installation process is configurable in a variety of ways, as covered in the distutils documentation. So far, so good.

But there are two major shortcomings to distutils:

  1. It provides no way to specify dependencies between packages.
  2. It provides no way to emulate the experience on, for example, many Linux distributions where you type a command, feed it a package name, and the appropriate package (and dependencies) will be downloaded from a repository and installed for you.

To provide this functionality, many people turn to setuptools. And if it had done nothing except deal with these two issues, it would have been great; an easy dependency-management mechanism and a network-enabled installation system make lots of people’s lives easier. But setuptools didn’t stop there, and that’s where the real trouble begins.

Let Python be Python

Setuptools goes far beyond merely providing dependency management and an easy network-enabled install tool; it also adds a number of other features, and does quite a lot of work to support them. Most complaints boil down to the fact that these features, and their associated support:

  1. Cause Python to stop behaving the way Python is documented as behaving, and
  2. Create a bizarre parallel world of things which are only accessible from, and which only work when using, setuptools and its APIs.

The first of these is certainly bad, but the second is the one which really bothers me, and which closely parallels the problems with Ruby’s gem system.

To see an example, consider a Python feature that’s occasionally useful: if you have a zip file whose contents are files of Python code, you can place the zip file on your Python import path and import will just work for the code inside it; Python knows how to look inside a zip file and find the code, and you don’t need to do anything else special aside from making sure the file’s on the import path.
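
A minimal sketch of that zip-import behavior, using only the standard library. The archive name bundle.zip and the module name greeting are made up for illustration:

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip archive holding a single module.
# "bundle.zip" and "greeting" are hypothetical names for this sketch.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr(
        "greeting.py",
        "def hello():\n    return 'hello from inside the zip'\n",
    )

# Putting the archive itself on the import path is the only extra step.
sys.path.insert(0, archive_path)

import greeting

print(greeting.hello())
print(greeting.__file__)  # a path that runs through bundle.zip
```

Note that the module's __file__ ends up as a path that passes through the archive rather than naming a real file on disk, which matters for any code that tries to locate neighboring files.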

But setuptools has latched onto this feature to create an entire zipped package format which includes not only Python code but also things like data files. Now, normally if you package an application which includes some data files, you can specify that they’re to be installed alongside the code and use standard Python techniques to figure out where your package is and where your data files are, and work with them from your code. With setuptools, however, you can’t do this, because setuptools puts your data files into the same zipped package, and from that point on you have to use functions in setuptools to access them.
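
One common form of the "standard Python techniques" mentioned above is to build a path from the package's own __file__. The sketch below assumes that approach; the package name notes_app and the data file motd.txt are hypothetical, and the package is written to a temporary directory only so the example is self-contained:

```python
import importlib
import os
import sys
import tempfile

# Lay out a tiny package on disk: code plus one data file beside it.
# "notes_app" and "motd.txt" are made-up names for this sketch.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "notes_app")
os.mkdir(package_dir)

with open(os.path.join(package_dir, "motd.txt"), "w") as handle:
    handle.write("data file found\n")

with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write(
        "import os\n"
        "\n"
        "# Directory this package was installed into.\n"
        "PACKAGE_DIR = os.path.dirname(os.path.abspath(__file__))\n"
        "\n"
        "def read_motd():\n"
        "    # Open the data file that sits next to this module.\n"
        "    with open(os.path.join(PACKAGE_DIR, 'motd.txt')) as handle:\n"
        "        return handle.read()\n"
    )

# Make the package importable and use it like any installed package.
sys.path.insert(0, root)
importlib.invalidate_caches()
notes_app = importlib.import_module("notes_app")

print(notes_app.read_motd().strip())
```

This works because PACKAGE_DIR is a real directory. If the same package were imported from inside a zip archive, the path would run through the archive and the plain open() call would fail.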

Oh, and did I mention that this is how setuptools does things by default? Anything you install via easy_install will get this treatment unless:

  1. You’ve explicitly told easy_install not to do this on a per-package basis, or
  2. You’ve explicitly configured setuptools to disable this “feature” globally, or
  3. The person who created the package set it up to force setuptools not to zip it.

Requiring one of the first two options to make Python applications work normally is bad enough, but the third is simply perverse: in order to create a package that setuptools won’t try to zip, you have to use setuptools to create the package. Which, in turn, means that only people who have setuptools installed can install your package. “You can opt out of our system by opting in to our system” is not an acceptable way to do things, in my opinion.

This is, incidentally, how setuptools managed to break django-registration. The fact that setuptools defaults to installing that zipped version of the package means Django’s standard mechanism for locating management commands stops working; since Django doesn’t use setuptools’ APIs to peer into zipped packages, it can’t see the custom management command bundled in django-registration. And, of course, most people who use easy_install don’t actually know that it behaves this way, since they just wanted, well, an easy way to install Python packages. So the bug reports end up coming to me, which makes me sad and angry.
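
To make the failure concrete, here is a rough sketch that assumes command discovery works by listing a directory on disk. The find_commands function below is a hypothetical stand-in, not Django's actual implementation, and plain_app, zipped_app and cleanup are made-up names:

```python
import importlib
import os
import sys
import tempfile
import zipfile

# The same package layout is used for both copies.
FILES = {
    "__init__.py": "",
    "management/__init__.py": "",
    "management/commands/__init__.py": "",
    "management/commands/cleanup.py": "def run():\n    return 'cleaning up'\n",
}


def find_commands(package):
    """List command names by scanning the package's commands directory.

    Hypothetical stand-in for directory-based discovery.
    """
    commands_dir = os.path.join(
        os.path.dirname(package.__file__), "management", "commands"
    )
    try:
        names = os.listdir(commands_dir)
    except OSError:
        # The path runs through a zip archive: there is no directory to list.
        return []
    return sorted(
        name[:-3] for name in names
        if name.endswith(".py") and not name.startswith("_")
    )


workdir = tempfile.mkdtemp()

# Copy 1: an ordinary directory install.
for relative, source in FILES.items():
    target = os.path.join(workdir, "plain_app", *relative.split("/"))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as handle:
        handle.write(source)

# Copy 2: the same layout packed into a zip archive on the import path.
archive_path = os.path.join(workdir, "zipped_app.zip")
with zipfile.ZipFile(archive_path, "w") as archive:
    for relative, source in FILES.items():
        archive.writestr("zipped_app/" + relative, source)

sys.path[:0] = [workdir, archive_path]
importlib.invalidate_caches()

plain_app = importlib.import_module("plain_app")
zipped_app = importlib.import_module("zipped_app")

print(find_commands(plain_app))   # ['cleanup']
print(find_commands(zipped_app))  # []

# The zipped command is still importable; only the directory scan misses it.
cleanup = importlib.import_module("zipped_app.management.commands.cleanup")
print(cleanup.run())
```

The code inside the archive imports fine in both cases; what breaks is anything that expects the package to be a directory it can walk.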

And that’s really just the tip of the iceberg; setuptools and its associated frameworks, because of the features they try to support, end up slipping out of packaging concerns and into your application code in all sorts of oddball ways. There’s even an analog of the require_gem feature which gives Debian packagers headaches: setuptools lets you specify dependencies directly in application code to ensure that you’re importing precisely the version of a library that you want to import, and this only works when setuptools is also installed (and, from what I can tell, may only work if the package you want to import from was itself installed by setuptools).

The end result is that, once you start using setuptools, you’re gradually nudged further and further away from using standard Python APIs and techniques, and more and more into using things that only exist as part of setuptools. And just when you thought it couldn’t get worse, setuptools also encourages package creators to set up drive-by installations of setuptools, so that unsuspecting users end up with it installed whether they wanted it or not.

Me, I’m a fan of Python being Python, not some bizarre parasitic thing that tries to force itself on you and make you use its own APIs instead of Python’s. So I generally stay as far away from setuptools as I possibly can.

The alternative

Of course, this brings up a question: if setuptools and easy_install are bad, what can we use instead? For packaging, I still use (and in fact have always used) just plain old distutils. It’s simple (at least, it’s about as simple as a packaging system can be while still being useful), it’s standard, it works, and I use it for all my personal applications.

For actually installing and managing packages, I use pip. It’s by Ian Bicking, who’s smarter than any ten people have a right to be, and it gets an awful lot of things right. One of those things, and the one which, by itself, would make pip worthwhile, is actually noted as a shortcoming in its documentation:

It cannot install from eggs. It only installs from source.

Eggs, of course, are setuptools’ zipped package format. I’m really really OK with not installing from eggs, Ian. If I type:

easy_install django-registration

then even though it’s a standard source-code package built with distutils I still end up with a broken zipped package that can’t find its own management command. But if I type:

pip install django-registration

then I get something that actually works the way Python is supposed to work. I’m really OK with that.

Another thing pip gets right is that it doesn’t try to graft a dependency-specification system onto the setup.py script, and so doesn’t create a dependency from your setup.py script to pip. Instead, it lets you write a short file listing your requirements and point pip at that file; it’ll handle the rest.

And as if that wasn’t enough, pip — thanks to its requirement-file mechanism and a couple other features — enables the holy grail of deployment: the repeatable, scriptable install. Seriously, pip is good stuff.

So my recommendation is that you run, not walk, over to pip, then forget about setuptools and easy_install. While you’re at it, check out virtualenv (also by Ian), which makes all sorts of previously-huge deployment and management headaches go away, and about which I plan to write much more in the future.

On December 14, 2008, jodal said:

The Debian link is broken.

On December 14, 2008, Jesse Noller said:

Fantastic! I’ve always had a dark spot in my heart of the .egg style distribution.

On December 14, 2008, jurev said:

Also the command easy_install doesn’t say anything about it being python-specific. Not very polite.

On December 14, 2008, Paul Bonser said:

So what we need is something like pip to be included in the Python standard library so people don’t go looking for other solutions.

It seems like such an install system should probably be included in the standard library of any language that wants to help people avoid being horribly frustrated with figuring out how to install things.

On a side note, I suppose it would be horribly wrong if I installed pip by running “easy_install pip”, wouldn’t it?

On December 14, 2008, bob said:

+1

On December 14, 2008, Andy said:

Thanks for the detailed explanation. setuptools has always smelled funny to me and it’s nice to see someone lay out the specific reasons why.

On December 14, 2008, Noman said:

“operating-system maintainers who want, as much as possible, to deal with one standardized, language-agnostic but platform-specific tool to distribute software and updates”

That sounds like it’s an issue with dpkg/apt, but that’s not the case. The Debian folks are pretty clear on this point: “Rubygems packages are not compatible with the FHS”. FHS isn’t Debian-specific, or even Linux-specific. They even say “We do not oppose Ruby having a packaging system”.

I think the more open a system is, the more it needs to rely on open standards. Debian is completely free, and so sometimes comes across as rather anal about adopting open standards where they exist. In a fight between “open standards we’re already using” and “a language (in which no crucial system software is written)”, the former is going to win.

I think the moral of the story is: if you really “want to distribute [your] applications to the widest possible audience”, don’t make your packaging system incompatible with an open standard that one of the top Linux distros is using. :-)

On December 14, 2008, Bill Mill said:

I used setuptools to download and install pip before blowing away setuptools… does that make me a bad person? :)

On December 14, 2008, Will Liu said:

Thanks for this post. I’ve also been struggling for a better solution with respect to this issue.

What’s your opinion on making it so that when you do use easy_install, it installs into something like /usr/local (and hence tries to keep some semblance of sanity)? In any case, this seems like a good alternative and something I will definitely try.

@wliu

On December 14, 2008, ryan said:

When standard distutils allows for separately developed and released packages under the same namespace, then I will happily drop setuptools. But until then, flawed as it may be, setuptools is - in my mind - necessary.

The django team has made the decision to develop the whole of django in (what effectively amounts to) a single package, so the main problem I have with distutils does not apply to you.

Nonetheless, whichever approach is “correct,” if there is to be a “one true tool,” there are large enough communities using these different approaches that that tool must enable both methods. Distutils (as far as I can figure out) does not.

On December 14, 2008, Kazam said:

If a program requires another program called easy_install to install it, it is by definition NOT easy to install!

On December 14, 2008, David said:

Amen!

At work we are stuck using an OLD CRUSTY linux distro (maybe 500 workstations), no one has sudo, and eggs are the bane of our existence. An RPM would be more welcome, if you can believe it.

On December 14, 2008, Cory said:

Great post - an eye-opener. I never thought about the “opt out by opting in” aspect. I have always hated zipped eggs; even if nobody used the setuptools resource APIs, the zipfiles make debugging inside third-party software tedious.

However, I consider it a bit of a non-issue; as a package maintainer, you should opt in to setuptools, even if you hate it and want pip to “win”, because your true goal is to get your software into the maximum number of hands, right?

As a result of your post, I’m going to research pip and try to pip-enable the software I’m about to launch (Hypy). This software will also have a PPA on launchpad, so I’m hitting three package systems at once, hooray!

BTW, I’m #3 on the list of people who decided to easy_install pip. I notice that it doesn’t use the zipped format when I do that. :-)

On December 14, 2008, Florian said:

These are the reasons why you should use setuptools:

It’s preposterous to claim setuptools is some kind of parasitic leech. It’s a fabulous tool to help solve some of the most broken and incomplete things still unsolved by anybody else including pip and distutils.

But you know this, it’s not like you have no clue. So before you go blabbering about nonsense like stopping to use a tool that is really useful, and how much you hate that it messes with your (limited) mindset, how about you provide one that is better?

On December 14, 2008, Henrik Joreteg said:

As a relatively new-to-both-python-and-django developer, I assure you that the reason for using easy_install is simply ignorance.

I’ve been using it, but didn’t realize there was any problem with what it did or how it worked.

Us newbies simply need to know what to use. I second the opinion that it’d be nice to have an “official” solution that was part of Python.

Till then, thanks for the tip.

On December 14, 2008, John Millikin said:

Speaking as a regular and contented user of Setuptools, who also writes packages for Django, I have a few issues with your post.

First, you’re actually complaining about two distinct pieces of software: setuptools and pkg_resources. setuptools is a system for packaging and installing Python software on any supported OS. pkg_resources provides automatic plugin and data file search support. You may freely use setuptools to package software that doesn’t use any of pkg_resources‘s features, or use pkg_resources from within software packaged by distutils.

Now, normally if you package an application which includes some data files, you can specify that they’re to be installed alongside the code and use standard Python techniques to figure out where your package is and where your data files are, and work with them from your code. With setuptools, however, you can’t do this, because setuptools puts your data files into the same zipped package, and from that point on you have to use functions in setuptools to access them.

What “standard” techniques are those? The only method I know of to find packaged data files without pkg_resources is horrible: hard-coding the relationship between code and data files using the __file__ variable. In contrast, pkg_resources lets the data files be moved to another directory, stored in a compressed archive, dynamically loaded, etc. It’s a system by far superior to __file__ hacks.

This is, incidentally, how setuptools managed to break django-registration. The fact that setuptools defaults to installing that zipped version of the package means Django’s standard mechanism for locating management commands stops working; since Django doesn’t use setuptools’ APIs to peer into zipped packages, it can’t see the custom management command bundled in django-registration.

The reason Django can’t discover commands in compressed packages is that Django’s method for discovering new commands is pants. It makes several unwarranted assumptions about the layout of third-party packages and relies on module namespace magic.

And, of course, most people who use easy_install don’t actually know that it behaves this way, since they just wanted, well, an easy way to install Python packages. So the bug reports end up coming to me, which makes me sad and angry.

File a bug with Django, and forward bugs reported to you regarding this issue to it.

On December 14, 2008, John Millikin said:

(part 2, due to 3000 character limit)

There’s even an analog of the require_gem feature which gives Debian packagers headaches: setuptools lets you specify dependencies directly in application code to ensure that you’re importing precisely the version of a library that you want to import, and this only works when setuptools is also installed (and, from what I can tell, may only work if the package you want to import from was itself installed by setuptools).

How is this the fault of setuptools? If a user’s code had lines such as this:

import django
assert django.VERSION == (1, 0, 'final')

you’d call it the user’s fault, and rightly so. setuptools provides functions for automatic version checks, but it can’t verify that such checks are correct in each instance.

For packaging, I still use (and in fact have always used) just plain old distutils. It’s simple (at least, it’s about as simple as a packaging system can be while still being useful), it’s standard, it works, I use it for all my personal applications.

Distutils is fine if all you need to do is create a package; but then, so is tar. It doesn’t support any of setuptools’ advanced functionality, such as verifying prerequisites before installation or registering plugins.

For actually installing and managing packages, I use pip.

From the link:

It cannot install from eggs. It only installs from source. Maybe it doesn’t work on Windows. At least, the author doesn’t test on Windows often.

These two mean it’s effectively unusable in Windows. As much as I love using Linux, there are times when I need to install Python packages in Windows also. Frankly, if you exclude Windows, why would you need pip at all? Just use apt-get, ports, or whatever the packaging system of choice is for your UNIX system.

On December 14, 2008, Travis Jeffery said:

I like setuptools and easy_install; they seem to work well enough, you just need to know what you’re looking for. If I’m not mistaken, setuptools uses distutils?

And if you’re a developer setuptools brings ridiculously little overhead for developers to code something up and push it out.

On December 14, 2008, Baczek said:

IIRC eggs and setuptools got a, let’s call it, heated discussion on python-dev back when I was following it, for exactly the reasons you point out.

On December 14, 2008, Florian said:

There’s no “standard” way to do things around python and packages, plugins, entry points, versions, dependencies and data in packages.

Distutils alone is unsatisfactory, and to my knowledge the only “standard” that addresses at least some of these aspects with at least some level of adequacy is setuptools.

You can put the fingers in your ears as a distribution maintainer and sing all you want, and it’s not going to make that fact disappear.

On December 14, 2008, Peter Fein said:

I’ve been complaining about these problems (and others) with setuptools for years to anyone who will listen. Let me also add that Phillip Eby (PJE), the author of setuptools, is the worst maintainer of an OSS project I have ever had the displeasure of interacting with.

I agree, the only major missing features of plain old distutils are dependencies and auto-downloading. Doesn’t seem like they should be that hard to add. Doing so would make a great PyCon sprint; might I suggest lazy-install as a name?

Come on people, this is Python - we can do better than setuptools.

On December 14, 2008, Benjamin Schweizer said:

Some years ago, I listened to a lecture about systems administration. I cannot recall all bits, but one quote is important to me: “administration together with the distribution”. It means, if there is a way the distributor expects you to do something, do it this way.

From there, I think i) non-developers should stick with the distro’s package manager as long as possible; ii) developers should make it easy for distributors to re-distribute their code and iii) package manager developers should not write into system directories.

p.s.: I’ll check out pip, now;-)

On December 14, 2008, James Bennett said:

@John Millikin: pkg_resources ends up being required because of how setuptools does things by default; this means that the packaging code ends up being application code, and that’s a no-no. You can call them separate things if you like, but the fact is that they’re distributed from the same place and work together to serve the same goals.

Meanwhile, eggs, for all you want to say about their “superiority”, are basically PJE’s attempt to port Java JAR files to a language which was never designed for them and which doesn’t have support for them. If you often desperately need something like that then I guess you should use setuptools, eggs and the associated libraries, but I’m OK with, for example, systems like Django which try to simplify complex situations by using a few layout conventions to know where to find things and assuming that Python source code is Python source code.

On December 15, 2008, Ian Bicking said:

I’ve replied at length to this article here: http://blog.ianbicking.org/2008/12/14/a-few-corrections-to-on-packaging/ - specifically I’ll note that pip is very much built on Setuptools. And perhaps it speaks to the fact that Setuptools is mostly good, with just a few bad parts, that when using pip you might not even notice Setuptools. I’ve used pip to route around some of the pieces of Setuptools that most annoy people (like zip files).

On December 15, 2008, mark said:

Seriously - why should any developer bother to make things compatible with specific distributions? I am much more in sympathy with arguments presented by (admittedly radical) people like tuomov here. I have NO sympathy whatsoever with distributions who do only cosmetic attempts to improve something for their users. Distributions fight each other for petty and pointless gains.

Linux market has an astonishing what - 3%? Wow… maybe in 10 years it will be 4%. And the desktop market will be 100% in 10000 years.

It was their problem (of distributions) in the first case (in case i.e. “gem” works) and frankly people who like distributions should NEVER leave the chains of them. That is just asking for trouble. Or do you think it is a good way to use apt-get and also configure/make/make install?

I swear people will have problems that way.

On December 15, 2008, Florian said:

There are nice platform-independent ways (gems, eggs, jars) to distribute software which work on nearly all systems. And then there’s the angry 2% market share linux distributor crowd that do shouting matches trying to shout down software developers attempting to write software for most platforms including theirs.

Here’s a message for you distributors.

Keep angry, keep shouting, keep your partisan broken ways, keep putting your fingers in your ears and sing “lalala”, meanwhile we (developers) move somewhere else.

On December 15, 2008, Jared said:

Linux market has an astonishing what - 3%? Wow… maybe in 10 years it will be 4%. And the desktop market will be 100% in 10000 years.

Uh, have you noticed the market share for servers?

On December 16, 2008, Mike said:

@Florian

Keep angry … and sing “lalala”, meanwhile we (developers) move somewhere else.

I don’t know what kind of developers you belong to, but we, the real world developers, want a manageable, predictable and preferably simple way of deploying our applications.

Thanks to the distributors, we have such a way.

On December 16, 2008, Florian said:

@Mike

I don’t think so. For instance, cp some.app Applications, now see, that’s simple, predictable and manageable. Double click setup.exe, yeah, not perfect but it works. Wee, see, I just covered 98% of the desktop market, and then there’s linux.

Ok, now how do I get for instance apt-get myapp to work? Alright, first off, that’ll not cover other linux that use rpm, bugger. But ohwell, I need some very new version of ODE, that isn’t available in the repositories of ubuntu, and it doesn’t look like the maintainers are going to put it there anytime soon. And on it goes…

You see, I think that when apt-get works, it’s fabulous. As a developer I’m just not alright to spend 90% of my packaging time wrestling with 2% of the market share of desktops. There you have it, that’s why software developers do platform agnostic deployments that don’t care jack what you pesky distributors do. Surprised?

On December 17, 2008, Mike said:

@Florian

I see what you mean. But the thing is I don’t care about the desktop market at all. I don’t care about clicking or dragging or dropping anything on the user’s desktop. All I am worried about is how to push this freaking lump of bytes to several different servers and make 100% sure it will work on each of them. We are really from different worlds :-)

On December 24, 2008, Chris Webstar said:

Software comes only one way in the linux world: Source. Stuff like apt, binary distributions, etc, came along much later. And it does the same work setuptools does for python: easing things for those who don’t want to bother.

Windows, on the other side, was never meant to be used by people who understand that programs are actually written, people that might actually care about source code. Just go ahead and try to compile something on a windows desktop, and count the software packages you’ll have to install to make it happen.

There are worlds between windows users and linux users. Even between novice I-just-want-to-click linux users and regular-Joe windows users.

So why would there be a oh-magic way of uniformly installing stuff onto both of these platforms? It’s just not going to happen.

I for one find it vastly easier to setup python on a linux box (even without apt), than it is on windows. Anyone?

On January 8, 2009, Joe Gregorio said:

Wow, two years later and it doesn’t look like the situation has improved at all:

http://bitworking.org/news/Please_stop_using_setuptools__at_least_exclusively__for_now
