Django, gzip and WSGI

May 21, 2006 Django, Frameworks, Pedantics

One of the many things I like about Django is the range of available middleware you can use to do all sorts of interesting stuff. But one in particular has got me a little bit stumped.

Django ships a middleware component that gzips output when the client lists ‘gzip’ in its Accept-Encoding header, which both conserves bandwidth and lets pages download more quickly. Most popular web servers can do this themselves (Apache via mod_deflate, lighttpd via mod_compress, etc.), but it’s handy to have in Django in case you’re using a server that lacks this functionality, or a hosting provider who won’t enable the necessary module for you.
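To make the negotiation concrete, here is a minimal sketch in plain Python of the check and the compression step. This is not Django's actual code; the function name `client_accepts_gzip` and the sample body are made up for illustration, and the q-value handling is deliberately simplified.

```python
import gzip


def client_accepts_gzip(accept_encoding: str) -> bool:
    """Return True if 'gzip' is listed in an Accept-Encoding header value.

    Simplified on purpose: a 'gzip' entry with q=0 counts as refused,
    and any other q-value is ignored.
    """
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.replace(" ", "").lower()
        return params not in ("q=0", "q=0.0", "q=0.00", "q=0.000")
    return False


# A repetitive HTML-ish body (made-up sample data) compresses well.
body = b"<p>Hello, world!</p>\n" * 200
compressed = gzip.compress(body)

if client_accepts_gzip("gzip, deflate"):
    print(f"{len(body)} bytes before, {len(compressed)} bytes after gzip")
```

A real implementation would also want to honor the full q-value ranking, but this is enough to show where the bandwidth saving comes from.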

But there’s a problem with that. Maybe.

I’m running Django with lighttpd and FastCGI via flup, which implements a pure-Python WSGI-compliant FastCGI server; lighttpd talks to flup via FastCGI, and flup talks to Django via WSGI. The problem is this portion of the WSGI spec:

Note: applications and middleware must not apply any kind of Transfer-Encoding to their output, such as chunking or gzipping; as “hop-by-hop” operations, these encodings are the province of the actual web server/gateway.

Whoops, looks like I’m not allowed to gzip in Django. Which isn’t a great loss because, of course, lighttpd offers a module to do this, and so does flup — since flup is a WSGI server, it’s allowed to do that sort of thing.

But I say it looks like I’m not allowed to gzip in Django, because this isn’t entirely clear. The WSGI spec is using chunking and gzipping as examples, and saying that what’s really explicitly forbidden are “hop-by-hop” operations. There’s a reference to a different section of the WSGI spec, which attempts to clarify this:

However, because WSGI servers and applications do not communicate via HTTP, what RFC 2616 calls “hop-by-hop” headers do not apply to WSGI internal communications. WSGI applications must not generate any “hop-by-hop” headers [4], attempt to use HTTP features that would require them to generate such headers, or rely on the content of any incoming “hop-by-hop” headers in the environ dictionary. WSGI servers must handle any supported inbound “hop-by-hop” headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.

OK, so what’s really explicitly forbidden is for a WSGI application to set any “hop-by-hop” headers; only WSGI servers are allowed to do that. And what does HTTP consider to be a “hop-by-hop” header? Well, that’s spelled out in section 13.5.1 of RFC 2616:

The following HTTP/1.1 headers are hop-by-hop headers:

 - Connection
 - Keep-Alive
 - Proxy-Authenticate
 - Proxy-Authorization
 - TE
 - Trailers
 - Transfer-Encoding
 - Upgrade

OK, that’s consistent with WSGI saying that you can’t set Transfer-Encoding. But Django’s GZipMiddleware doesn’t set Transfer-Encoding; it only sets Content-Encoding. It’s compressing the response body (the entity), not encoding the whole message for transfer, so setting Transfer-Encoding would be inappropriate even if WSGI permitted it.
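For illustration, here is roughly what a gzipping WSGI middleware of this kind looks like. This is a hypothetical sketch, not Django's GZipMiddleware or any published package: the class `GzipContentEncoding`, the demo `hello_app`, and the `record` callback are all invented names, the Accept-Encoding check is a naive substring match, and the legacy `write()` callable is not supported.

```python
import gzip


class GzipContentEncoding:
    """Hypothetical WSGI middleware that gzips the body as a Content-Encoding."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status and headers back until the body is compressed.
            captured["status"] = status
            captured["headers"] = list(headers)

        result = self.app(environ, capture)
        try:
            body = b"".join(result)  # buffers the whole response
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        already_encoded = any(k.lower() == "content-encoding" for k, _ in headers)
        wants_gzip = "gzip" in environ.get("HTTP_ACCEPT_ENCODING", "").lower()
        if wants_gzip and not already_encoded:
            body = gzip.compress(body)
            # The old Content-Length no longer matches the compressed body.
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers += [
                ("Content-Encoding", "gzip"),
                ("Vary", "Accept-Encoding"),
                ("Content-Length", str(len(body))),
            ]
        start_response(captured["status"], headers)
        return [body]


def hello_app(environ, start_response):
    """Made-up demo application."""
    body = b"hello " * 500
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


seen = {}

def record(status, headers, exc_info=None):
    seen["status"], seen["headers"] = status, dict(headers)

chunks = GzipContentEncoding(hello_app)({"HTTP_ACCEPT_ENCODING": "gzip"}, record)
```

The only headers this sketch adds or rewrites are Content-Encoding, Vary and Content-Length; it never touches Transfer-Encoding.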

And Content-Encoding isn’t a “hop-by-hop” header.
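Python's standard library happens to encode the same RFC 2616 list in `wsgiref.util.is_hop_by_hop`, so the distinction is easy to check. A quick sketch:

```python
from wsgiref.util import is_hop_by_hop

# Compare two headers from the RFC's list with Content-Encoding.
for name in ("Transfer-Encoding", "Connection", "Content-Encoding"):
    print(f"{name}: hop-by-hop = {is_hop_by_hop(name)}")
```

This reports True for Transfer-Encoding and Connection, and False for Content-Encoding.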

So now I’m pretty confused. Malcolm and I talked this over the other night on IRC, but couldn’t come to any solid conclusion about whether it’s acceptable to use Django’s GZipMiddleware when Django is sitting behind a WSGI server. Googling didn’t turn up anything particularly useful, either. There are lots of WSGI middlewares out there which do gzipping, but the only discussion of this particular part of the WSGI spec I could find was this message on Web-SIG, where Phillip Eby seems to be more concerned with chunked transfer and says he’ll clarify the spec to mention that applications are not permitted to set “hop-by-hop” headers. As far as I can tell, that doesn’t really settle this particular situation.

Help?