Let’s talk about WSGI

August 10, 2009 Django, Frameworks, Python

Recently Armin Ronacher (whose blog you should be reading if you do anything at all involving Python and the web) has published a couple of good articles poking at the current state of WSGI, the standard interface for Python web applications. Some of his comments dovetail nicely with concerns I’ve been trying to put into words for a while now, so I’m glad he’s posting on the subject and providing some context.

In short, I’ve come to have some rather severe misgivings about WSGI — both as currently constituted and as it’s likely to be in the future — which break down into three areas:

As an interface for Python applications to speak HTTP, WSGI leaves quite a lot to be desired.
As an interface for Python web developers to implement and work with, WSGI is antiquated and frustrating.
As an interface for composing useful functionality from the various Python web components, WSGI is far from sufficient.

So. Allow me to explain.

HTTP is hard, let’s go shopping!

My biggest gripe with WSGI from an HTTP standpoint is simply that it seems to have a track record of muddling or outright punting on some of the more interesting/complex features of HTTP. One simple example is the common optimization of compressing (via gzip) the outgoing HTTP response, which usually provides a significant boost in performance as seen by end users of web applications. WSGI seems at first to forbid applications from applying this optimization:

Note: applications and middleware must not apply any kind of Transfer-Encoding to their output, such as chunking or gzipping; as “hop-by-hop” operations, these encodings are the province of the actual web server/gateway.

But gzipping of the response body is indicated by the Content-Encoding header, which per the HTTP spec is not a “hop-by-hop” header. So is a WSGI application or middleware allowed to gzip an outgoing response body? I wish I knew. Django ships an optional middleware class which applies gzipping to suitable outgoing responses. The Pylons book provides a detailed example of how to write a WSGI middleware for gzipping and seems to encourage its use. And I know there are other implementations of this feature in the wild. But I have a sneaking suspicion that WSGI intends to forbid this feature to any part of the stack other than the server, which would greatly diminish its utility (applications and middlewares are far more likely to have access to information which lets them make reliable judgments about when to apply gzipping, while servers can only apply some relatively blind heuristics).
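To make the question concrete, here is a minimal sketch of the kind of middleware at issue, assuming Python 3 and the standard gzip module. The name GzipMiddleware is hypothetical, and this is not the Django or Pylons implementation: it buffers the whole body, ignores the write() callable, and does only a crude Accept-Encoding check. Note that it sets Content-Encoding and never touches Transfer-Encoding.

```python
import gzip


class GzipMiddleware:
    """Hypothetical WSGI middleware that gzips the response body."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold on to the status and headers so they can be adjusted later.
            captured["status"] = status
            captured["headers"] = list(headers)

        result = self.app(environ, capture)
        try:
            # Simplification: buffer the entire body, which rules out streaming.
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        # Crude check; a real implementation would parse q-values.
        if "gzip" in environ.get("HTTP_ACCEPT_ENCODING", ""):
            body = gzip.compress(body)
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers.append(("Content-Encoding", "gzip"))
            headers.append(("Vary", "Accept-Encoding"))
            headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]
```

Whether the spec actually permits this is exactly the open question; the sketch only shows that nothing in it requires a hop-by-hop header.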

Meanwhile, chunked transfer — which does involve an actual hop-by-hop header — is clearly forbidden, even though HTTP handles it just fine and it’s an integral part of certain techniques for long-polling (e.g., Comet) applications. As such, the spec basically derails the idea of building those types of applications within WSGI. When I mentioned this in passing in a thread on the Python web-sig list it was suggested that servers could simply look for certain signs that a response should be chunked, but that’s insufficient: for one thing, it requires the server to make guesses about what the application author meant to do. For another, it presupposes that any middlewares involved in the request/response cycle will implement the same heuristics and avoid attempts to consume the response body (since it’s likely that the response body will be some sort of iterable which can only be consumed once, or which poses a threat of prohibitive resource use or a gateway timeout if consumed all in one go).
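The middleware half of that objection can be sketched in a few lines, assuming Python 3. All of the names here (streaming_app, buffering_middleware, passthrough_middleware) are hypothetical. The application returns a generator meant to be sent piece by piece; one well-meaning middleware drains it in a single go, while the other forwards each chunk as it arrives.

```python
def streaming_app(environ, start_response):
    """Return a one-shot iterable, as a long-polling application might."""
    start_response("200 OK", [("Content-Type", "text/plain")])

    def events():
        for i in range(3):
            yield ("event %d\n" % i).encode("ascii")

    return events()


def buffering_middleware(app):
    """Consumes the whole body up front, defeating incremental delivery."""
    def wrapped(environ, start_response):
        body = b"".join(app(environ, start_response))  # drains the generator
        return [body]
    return wrapped


def passthrough_middleware(app):
    """Forwards each chunk as it is produced."""
    def wrapped(environ, start_response):
        for chunk in app(environ, start_response):
            yield chunk
    return wrapped
```

Nothing in the response tells buffering_middleware that the body was meant to be streamed, so every middleware in the chain would have to guess correctly on its own.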

If WSGI applications could use the Transfer-Encoding header we’d have an easy way to signal this to servers and middlewares, and although the implementation still wouldn’t be simple it would at least have the capability to be much more reliable. Django currently has several open tickets related to this very issue, and I wouldn’t be surprised if other libraries and frameworks face similar problems.

For one final example I’ll pick on a genuinely hard problem which I’ll be harping on again in a bit: character encoding. The WSGI spec impresses upon its readers (or upon this reader, at least) the overwhelming desire for everybody to just quiet down and use ISO-8859-1 instead of whatever character set is actually convenient. Although it does go so far as to mimic HTTP’s ability to use MIME-encoding for non-latin-1 characters, its requirements regarding string types then go on to place heavy burdens not only on Python implementations which have native Unicode strings, but also on applications which might want to do useful things like, say, implement a subclass of str which knows whether it needs to be HTML-escaped or not (which bit Django when we implemented auto-escaping; the “solution” is a throwaway upcast to str to satisfy the WSGI spec’s overzealous type checking).

Problems like these leave me with the feeling that WSGI simply isn’t up to the job of providing Python’s “one obvious way to do” HTTP.

Welcome to 1997

Meanwhile, as an interface for programmers to actually implement and work with in their applications, WSGI feels like nothing so much as a blast from the past, and is designed from the ground up to give this impression. The original standard for web programming was, of course, CGI, and its programming model looked like this:

The server sets up an environment containing variables which indicate various aspects of the request.
The server invokes the CGI program in that environment.
The server passes the request body (if any) to the program on standard input.
The program uses the CGI environment variables and the standard input (if any) to figure out what to do.
The program spits the status line and headers to standard output, followed by a separator, followed by the response body.
The server reads this and turns it into an HTTP response back to the client.
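As a rough illustration of the steps above, here is a sketch of a CGI program in Python 3. The function name run_cgi is hypothetical, and the environment and streams are passed in as parameters only so the sketch can be exercised without a web server. It also reads text rather than bytes from standard input, which a real script would not want to do.

```python
import os
import sys


def run_cgi(environ=os.environ, stdin=sys.stdin, stdout=sys.stdout):
    """Handle one request the CGI way: environment in, stdout out."""
    # The server describes the request through environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    length = int(environ.get("CONTENT_LENGTH") or 0)

    # The request body, if any, arrives on standard input.
    body = stdin.read(length) if length else ""

    # Status and headers go to standard output, then a blank line, then the body.
    stdout.write("Status: 200 OK\r\n")
    stdout.write("Content-Type: text/plain\r\n")
    stdout.write("\r\n")
    stdout.write("%s request with %d characters of body\n" % (method, len(body)))


if __name__ == "__main__":
    run_cgi()
```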

WSGI’s programming model, meanwhile, is as follows:

The server sets up a dictionary — called environ — containing keys which indicate various aspects of the request.
The server invokes the WSGI application (a Python callable), passing it the environ.
The server provides the application with a callable to use to signal the return status and headers.
The application uses the WSGI environ to figure out what to do.
The application spits the status line and headers out through the supplied callable, then begins yielding the response body.
The server reads this and turns it into an HTTP response back to the client.
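The same steps, sketched as a minimal WSGI application in Python 3. The call_app helper is hypothetical and stands in for a real server, and using the raw query string as a name is just a shortcut to keep the example small.

```python
def application(environ, start_response):
    """A minimal WSGI application: read the environ, signal headers, return the body."""
    # The environ dictionary plays the role of the CGI environment.
    name = environ.get("QUERY_STRING", "") or "world"
    body = ("Hello, %s!\n" % name).encode("utf-8")

    # Status and headers go out through the callable the server supplied.
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])

    # The response body is an iterable of bytestrings.
    return [body]


def call_app(app, environ):
    """Stand in for the server: invoke the app and collect what it emits."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling call_app(application, {"QUERY_STRING": "wsgi"}) hands back the status line, the header list and the joined body, which is all a server has to work with.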

The parallels here are deliberate: at heart, WSGI is CGI, and this is allegedly a good thing. But it means that WSGI inherits a collection of pathological edge cases, and lays them all squarely at the application author’s feet.

If there were a good, solid, standard implementation of WSGI request parsing that everybody used (say, in a module in the Python standard library), this wouldn’t be as big a problem. But the reference implementation of WSGI is incomplete, and the bits of the standard library which would obviously supplement it have issues (once again, Armin provides useful explanations).

In a way, WSGI has been a victim of its own success. It’s been so heavily promoted as an easy and standard way to implement Python web applications that many developers have simply jumped in head-first and rolled their own solutions, which often turn out to be incomplete or incorrect: the intersection of HTTP, web server quirks, CGI backwards-compatibility and the Python standard library is full of “fun” situations which make the development of solid, reliable WSGI stacks far more complex and subtle than the marketing materials would have you believe. The result is that while there are a few good (as in, mostly complete and free of major bugs) implementations, most people aren’t using them. And even if people tried to use them, dependencies between third-party Python packages are a whole ’nother world of pain.

But that’s really just the tip of the iceberg. WSGI’s insistence on the CGI programming model, for example, means that non-trivial WSGI stacks have to burn a lot of cycles doing useless work, since every application and every middleware in the chain has to do its own parsing of the environ when invoked. Parsing once and handing the result down the stack is not how WSGI is meant to work.
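A small sketch of that duplicated work, assuming Python 3 and urllib.parse.parse_qs. The names parse_query, language_middleware and PARSE_COUNT are hypothetical, and the counter exists only to make the repetition visible.

```python
from urllib.parse import parse_qs

# Counts how often the same query string gets parsed during one request.
PARSE_COUNT = {"n": 0}


def parse_query(environ):
    PARSE_COUNT["n"] += 1
    return parse_qs(environ.get("QUERY_STRING", ""))


def language_middleware(app):
    """Middleware that needs one query parameter, so it parses the query string."""
    def wrapped(environ, start_response):
        lang = parse_query(environ).get("lang", ["en"])[0]  # parse number one

        def add_language(status, headers, exc_info=None):
            extra = list(headers) + [("Content-Language", lang)]
            return start_response(status, extra, exc_info)

        return app(environ, add_language)
    return wrapped


def application(environ, start_response):
    """The application needs another parameter, so it parses the same string again."""
    page = parse_query(environ).get("page", ["1"])[0]  # parse number two
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("page %s" % page).encode("ascii")]
```

With only one middleware the query string is already parsed twice per request; each additional layer that inspects the request adds another pass.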

WSGI’s curious insistence on compatibility with CGI also means that, here in 2009, the Python web-development world still hasn’t been able to significantly improve on 1997’s application programming model. Various libraries and frameworks have implemented useful, normalized abstractions and simpler object-oriented APIs for HTTP requests and responses, but adherence to WSGI rules out meaningful reuse and interoperability, in the practical sense that arbitrary WSGI components can’t be relied on to have any knowledge of these abstractions or APIs. As a result, the only thing we can really count on is an interface that’s so low-level and complex that the first thing most people do is try to hide it under something easier to work with.

And, of course, character encoding rears its ugly head again. WSGI requires that (in Python 2.x) everything be of type str or StringType. In other words, bytestrings. I’ve heard that some popular libraries and frameworks do their best to handle encoding quirks and just let application authors deal with Unicode (translating to/from bytestrings at the boundaries), but once again it’s something that has to be handled repetitively and independently at each point in the processing chain. Current discussion on the future of WSGI seems to be favoring a continuation of this “feature” into Python 3.x, where it will grow from being incredibly annoying to being downright dangerous, since Python 3.x does not allow you to be as promiscuous about mixing Unicode and bytes as Python 2.x does. The result, if implemented, is likely to be lots of preventable type errors and lots of subtler, hard-to-diagnose problems resulting from the mismatch of a Unicode-based string type and incompatible byte-based WSGI environments.
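The Python 3.x behavior in question is easy to demonstrate. This is a minimal sketch with hypothetical function names: one function encodes text at the boundary, the other mixes text and bytes the way Python 2.x code often got away with.

```python
def render_heading(title):
    """Encode the text explicitly at the boundary before joining it with bytes."""
    return b"<h1>" + title.encode("utf-8") + b"</h1>"


def render_heading_careless(title):
    """Mix a text string into bytes directly.

    Python 2.x tolerated this for ASCII-only data; Python 3.x raises TypeError.
    """
    return b"<h1>" + title + b"</h1>"


def try_careless(title):
    """Report which exception, if any, the careless version raises."""
    try:
        render_heading_careless(title)
    except TypeError as exc:
        return type(exc).__name__
    return None
```

Every component that touches both the byte-based WSGI side and Unicode application data has to get this conversion right on its own.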

You got framework in my gateway interface!

Finally, WSGI simply cannot live up to what’s expected of it in terms of providing an interface which allows arbitrary Python web components to be interoperable enough to compose useful applications. While this is not technically a goal of WSGI — which aims simply to provide a way for Python applications to speak HTTP — it’s become something of a major focus in the last couple of years. Given that the same thing seems to be happening elsewhere (e.g., with Ruby and Rack), I suspect there’s a corollary to Zawinski’s Law at work here: every gateway interface expands until it looks sort of like a framework API.

The problem is that WSGI isn’t and never will be a framework API, and attempts to use it in place of one are pretty much doomed to eternal complication. One obvious issue is that the only way to add additional processing in WSGI is by introducing middlewares between the server and the application. This creates a sort of onion-skin model: a request passes from the server through one or more layers of middleware and finally arrives at the application, which emits a response that goes back through the middlewares and out to the server. This sounds like a workable model, but it really isn’t, and Django has learned that lesson the hard way. We have an onion-skin middleware system, and it’s resulted in things which are one logical unit of functionality being broken up into separate physical chunks of code. Otherwise you get into catch-22 situations where there’s no one order of middlewares that will do what you want (e.g., Middleware A needs to be invoked before Middleware B in both request and response processing).
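The ordering problem falls straight out of how wrapping works. Here is a minimal sketch with hypothetical names (make_middleware, run_stack) that records the order in which two middlewares see the request and the response.

```python
def make_middleware(name, log):
    """Build a middleware that records when it sees the request and the response."""
    def middleware(app):
        def wrapped(environ, start_response):
            log.append(name + ": request")
            result = app(environ, start_response)
            log.append(name + ": response")
            return result
        return wrapped
    return middleware


def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def run_stack():
    """Wrap the application as A(B(app)) and return the recorded order."""
    log = []
    a = make_middleware("A", log)
    b = make_middleware("B", log)
    stack = a(b(application))
    stack({}, lambda status, headers, exc_info=None: None)
    return log
```

run_stack() records A before B on the way in and B before A on the way out. Swapping the wrapping order flips both, so no arrangement puts A first in both directions.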

This is, I think, why some sort of “lifecycle” method API is a frequent request for WSGI. Having an officially-blessed way to write some code and insert it at a precise point in the processing chain would be incredibly useful and open up a lot of functionality that’s either difficult or impossible to obtain right now.

A deeper issue is that the only official way for servers, middlewares and applications to communicate with each other is by passing around the WSGI environ (on the incoming request) or the HTTP headers (on the outgoing response). This opens up a couple of nasty cans of worms:

If middlewares or applications pass information by mutating the environ or headers, we face a haphazard situation where different components can unknowingly trample on each other’s information by making successive changes to the same keys.
If middlewares or applications pass information by adding keys to the environ or adding new headers, we avoid the trampling issue but now have to tightly couple middlewares and applications in order to make sure everything knows which additional keys or headers to look for and what they mean.
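The first of those problems can be sketched as follows. Both middlewares and the "user" key are hypothetical; the point is only that neither component has any way of knowing the other exists.

```python
def auth_middleware(app):
    """Stores the authenticated username in the environ."""
    def wrapped(environ, start_response):
        environ["user"] = "alice"
        return app(environ, start_response)
    return wrapped


def profile_middleware(app):
    """Stores display preferences under the same key, unaware of auth_middleware."""
    def wrapped(environ, start_response):
        environ["user"] = {"theme": "dark"}  # silently replaces the username
        return app(environ, start_response)
    return wrapped


def application(environ, start_response):
    """Reports whatever ended up under the shared key."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [repr(environ["user"]).encode("utf-8")]


stack = auth_middleware(profile_middleware(application))
```

Renaming the keys to something like "auth.user" and "profile.user" avoids the collision, but then the application has to know both names and what each one means, which is the coupling described in the second point.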

Trying to go outside WSGI by setting up side channels doesn’t help with either of these problems. And the only other solution seems to be monkeypatching (for example, if a middleware wants to signal to some component that it should enter a non-default or debugging mode, it could reach in and directly tweak some bit of the relevant code), but that’s a “solution” that’s likely to be worse than the problem.

What to do?

Unfortunately, I don’t think any of these problems have simple solutions. An easier-said-than-done summary corresponding to my main bullet points might be:

Make WSGI a more thorough implementation of HTTP.
Develop a complete, correct and efficient implementation of the relevant bits and get it into the standard library.
Get smart people from all the frameworks in a room together and hammer out some standards, IETF-style, from things that are known to work in the real world.

Of course, I could be completely wrong about all of this, and if I am I’m sure someone will helpfully point that out or offer alternate ideas. So if you’ve got ’em, fire away :)