Batteries sold separately

An entry published by James Bennett on April 8, 2008. Part of the categories Misc, Python and Usability. 23 comments posted.

At first glance, Google’s App Engine looks like a great way to build the next big web application; you get access to a massively scalable infrastructure, you get access to a huge existing authentication system, you get baked-in stats, you get all sorts of cool goodies.

Oh, and you get Python, which is a great language for writing web applications, and I’d be remiss if I didn’t take some pleasure in Django being available out of the box.

Personally I don’t really care one way or another about hosting code with Google, or letting a data store sit on their servers; the terms of service, if you read them, are surprisingly reasonable, and you don’t hand over any rights to peek at user data by hosting an application with Google, so that’s a non-issue.

But…

The overwhelming strength of Python is its libraries. Not just the standard modules that come with Python itself, but the whole ecosystem of third-party stuff that makes Python so incredibly useful for writing web applications. And that’s where, as far as I can tell, App Engine falls over. Unless I’m missing something, Google’s URL fetch module is the only way you’re allowed to talk to the rest of the Web, and that pretty much sinks the platform.

I understand that they need to sandbox things for safety, but cutting off the standard Python modules for doing URL retrieval and speaking HTTP throws out an unbelievably large amount of software that you’ll now either have to rewrite or fork:

The list just goes on and on; all this stuff needs to either be rewritten to use Google’s API, or needs to be forked and patched. And it seems you can just forget about anything that isn’t doing HTTP. And that’s just the tip of the iceberg: it looks like a simply vast amount of useful Python software is going to be verboten on App Engine.

I have a very hard time believing, if this is how it works, that it’ll really be useful for Python web developers. And if/when other languages are supported on the platform, they’ll have similar problems.

Don’t get me wrong: I’m not saying that a free-to-start-with, massively-scalable service for hosting web applications isn’t cool. I’m just not sure that’s what Google App Engine is; they’ve got the scalable bit and the hosting bit, but there’s a surprising lack of, well, “web” and “application” going on here.

On April 8, 2008, Mayuresh said:

Good to see a post from you after a long time …

On April 8, 2008, Ramin said:

A full list of what libraries are supported and what’s not is here.

The only reason I can think of for them to take out all the url/http/socket stuff is to stop AppEngine from being used as a platform for remote proxy attacks. But they could achieve the same thing by building a custom version of urllib/urllib2 with their throttling code down in the C code.

In any case I filed a ticket request. Anyone who feels like it can go add their support.

It’s all pretty early stage. Maybe they’ll make the change.

On April 8, 2008, Jones said:

Dead-On, as usual. I also fail to see what kind of applications we’re supposed to build without any contact to the outside world other than plain HTTP.

This is a whole new definition of a “golden ivory tower”. You get a set of amazingly powerful and scalable tools, but no sane way to integrate them with anything outside of your playground.

Sure, I could build my “next big thing” exclusively on Google infrastructure. But if I really were to build a “next big thing” then I’m not so sure I’d want to rely on a third party to that extent…

On April 8, 2008, Ian Bicking said:

For some reason they seem to have turned the desire to avoid the socket module into taking out the stdlib modules that use socket, instead of reimplementing them. I hope this is just a temporary thing, as I can’t see any reason the urllib and urllib2 modules wouldn’t work fine, and most of httplib should work too (even if the implementation of that will change). OTOH, if they don’t do it, I’m sure someone else will in short order.

On April 9, 2008, Ian Bicking said:

There’s a ticket for this: http://code.google.com/p/googleappengine/issues/detail?id=61

Readers can indicate interest in the ticket by starring it. (There are way too many noisy +1 comments: please don’t add more!)

On April 9, 2008, Cam MacRae said:

@Ian

I’m not sure it’s intended to be a temporary thing given the URL Fetch API - I hope the weight of the community will be enough to swing them.

On April 9, 2008, Luke Hoersten said:

Let’s say you want to do a ton of parallel computations for complex stats or something. Normally, because of the GIL, you’d have to use something like C or Erlang to do this. Is that possible with GAE?

On April 9, 2008, Steve McKay said:

@Luke

AppEngine forbids spawning threads or subprocesses, and requires that each request complete within “a few seconds”.

On April 9, 2008, Mike said:

I think the point of a lot of these restrictions is to let these services scale well across a series of cheap computers. Using sockets or accessing the filesystem would tie the execution of your code to one computer; this is probably also the rationale behind running a process for each individual request.

It’s unlikely I will be using App Engine, but it does make a large amount of sense; it seems to be aimed at building massively scalable applications with fewer infrastructure headaches. The problem I can see is that anyone creating a high-traffic rich internet application probably isn’t going to be delighted at the prospect of Google lock-in.

On April 9, 2008, Tom Davies said:

@Luke — I think Amazon’s EC2 would be a better fit for what you want to do.

On April 9, 2008, Alec said:

The limitations on outbound connections are probably also related to spam and DoS prevention. Imagine spammers using Google infrastructure to send their mails…

On April 9, 2008, Andres said:

Read the tutorial: it is possible to use ANY existing Python library, provided it does not use any of the standard libraries mentioned above. To use a 3rd party library, you just have to copy it to your application directory…

On April 9, 2008, Arnar said:

As far as I can tell, their HTTP API is dead simple, so porting a library for a web 2.0 service (e.g. for Flickr) should be relatively easy. One could probably also write a limited wrapper on top of their api that behaves (at least) partially like urllib.
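The kind of limited wrapper suggested here can be sketched roughly as follows. This is only an illustration under stated assumptions: `FetchResult`, `make_urlopen`, and `fake_fetch` are hypothetical names, and the injected `fetch` callable stands in for whatever the platform's fetch call looks like; it is not Google's actual API. The wrapper covers only a small slice of what urllib's `urlopen` response offers (`read`, `geturl`, `getcode`).

```python
import io
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class FetchResult:
    """Hypothetical stand-in for whatever a platform fetch call returns."""
    status_code: int
    content: bytes
    headers: Dict[str, str] = field(default_factory=dict)


# Assumed shape of the platform call: (url, payload, method, headers) -> FetchResult
FetchFn = Callable[[str, Optional[bytes], str, Dict[str, str]], FetchResult]


class UrlopenResponse(io.BytesIO):
    """File-like response exposing a small part of a urlopen() result."""

    def __init__(self, url: str, result: FetchResult):
        super().__init__(result.content)
        self.url = url
        self.status = result.status_code
        self.headers = dict(result.headers)

    def geturl(self) -> str:
        return self.url

    def getcode(self) -> int:
        return self.status


def make_urlopen(fetch: FetchFn):
    """Build a urlopen-like function on top of an injected fetch callable."""

    def urlopen(url: str, data: Optional[bytes] = None) -> UrlopenResponse:
        # As with urllib, supplying a request body turns the call into a POST.
        method = "POST" if data is not None else "GET"
        result = fetch(url, data, method, {})
        return UrlopenResponse(url, result)

    return urlopen


def fake_fetch(url, payload, method, headers):
    """Canned fetch used only to exercise the wrapper; no network access."""
    body = ("%s %s" % (method, url)).encode("ascii")
    return FetchResult(status_code=200, content=body,
                       headers={"Content-Type": "text/plain"})


urlopen = make_urlopen(fake_fetch)
response = urlopen("http://example.com/")
print(response.getcode(), response.read())
```

Passing the fetch function in, rather than importing it, keeps the sketch runnable anywhere and makes it clear which part would have to be replaced by the platform's real call.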

On April 9, 2008, David T. said:

Haha, if it were Ruby, I’d just overwrite net/http. Eat that, Python guys :P

On April 9, 2008, Chris Adams said:

Google’s web services have been updated - that should provide a good indication as to the amount of work involved switching to their HTTP library for most services. I’d be surprised if someone didn’t come up with a monkey-patch for most httplib usage, too.

On April 9, 2008, OJT said:

1. What part of “beta release” is not clear?
2. I guess they will release a urllib emulation module soon.

On April 9, 2008, James Bennett said:

Andres, I did read the tutorial. I also read the bit where anything that opens a socket is forbidden. That means standard Python networking modules are forbidden; the only way around it would be, as Ian suggested, to write modules which emulate the behavior but call on Google’s “approved” module under the hood.

OJT, nice to see that apparently beta releases are meant to be immune from criticism; I was wondering how long until somebody tried to throw that at me. Also, they did release a module that does URL fetching, it just has a different name from the standard Python library and exposes a completely incompatible API.

On April 9, 2008, OJT said:

I never said beta software is immune from criticism.

If your criticism was, for example, about the fact that Google uses a proprietary non-SQL database API and that this serves to lock-in users to their system then it would be very relevant. That’s a fundamental part of the system and is not likely to change as a result of feedback from beta testers.

But the url fetching API can easily be fixed in the next releases so spending too much time criticising such a trivial detail in a beta release seems rather pointless.

On April 9, 2008, James Bennett said:

“If your criticism was, for example, about the fact that Google uses a proprietary non-SQL database API and that this serves to lock-in users to their system then it would be very relevant.”

Except that’s not “lock-in”; that phrase implies there’s no way for me to get my data out if I’m using their system. It’d be incredibly easy to get the data out, and their query language is close enough to what I already use — the Django ORM — that I’m not hurting if I start an app on Google’s service and then migrate away.

“But the url fetching API can easily be fixed in the next releases so spending too much time criticising such a trivial detail in a beta release seems rather pointless.”

Trivial? This is a web application platform that can’t integrate with the overwhelming majority of useful software designed for use by web applications. That needed to be fixed before it ever went into a public preview, not in a nebulous “next release”.

On April 10, 2008, Joe Grossberg said:

“The overwhelming strength of Python is its libraries.”

Yikes; that’s quite a claim there — that it’s “the strength”, not “a strength”, and overwhelmingly so, at that.

I don’t think it’s the “better libraries” that won people over from Perl and Java.

On April 10, 2008, Ramin said:

Response from Guido on AppEngine urllib/urllib2 feature request:

Providing a urllib replacement implemented on top of urlfetch shouldn’t be particularly hard. If someone is willing to produce one, I’d be happy to review it and, if it passes muster, try to get it added.

For parts of urllib2 this will be harder; the Request/Response/Handler/OpenerDirector architecture there isn’t easily portable to the urlfetch API, which simply makes an RPC to another server that handles the entire request and returns the complete response. But I’m open for suggestions here.

However, I don’t have the time to do all the work myself — all I can offer is to review contributions and try to get them added. (Legal will have a say too; I expect that as long as it’s non-GPL open source, e.g. the Apache or BSD license, it shouldn’t be a problem.)

On April 10, 2008, James Bennett said:

@Joe: For web application development, yup, you bet your ass that library support is the killer feature for Python. No other language I’ve used (and I’ve used Perl professionally, and dabbled with Java) has the same wealth of useful, high-quality and above all well-documented web-oriented libraries. Nothing else compares, period.

On April 11, 2008, jorge vargas said:

@David T, please understand what you are commenting on before commenting. You CAN overwrite modules in Python; the problem is that the module doesn’t exist, as in the code was deleted and the Python that is running doesn’t have it.
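For readers unfamiliar with the mechanism being referred to, here is a minimal sketch of replacing (or supplying) a module at runtime in standard Python by registering an object in `sys.modules`. The module name `missing_httplib` and the helpers `install_stub_module` and `fake_get` are made up for illustration; whether this is permitted or sufficient inside App Engine's sandbox is not something this sketch claims.

```python
import sys
import types


def install_stub_module(name, **attributes):
    """Register a replacement module under `name` so later imports find it."""
    module = types.ModuleType(name)
    for key, value in attributes.items():
        setattr(module, key, value)
    # The import system consults sys.modules before searching the filesystem.
    sys.modules[name] = module
    return module


def fake_get(url):
    """Placeholder behavior; a real shim would call the platform's fetch API."""
    return "fetched " + url


# "missing_httplib" is a made-up name for a module that does not exist on disk.
install_stub_module("missing_httplib", get=fake_get)

import missing_httplib  # resolved from sys.modules, not from a file

print(missing_httplib.get("http://example.com/"))
```

The catch raised in the comment still applies: the stub only provides whatever behavior someone writes into it, so a missing standard module has to be reimplemented, not merely re-registered.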
