Batteries sold separately

At first glance, Google’s App Engine looks like a great way to build the next big web application; you get access to a massively scalable infrastructure, you get access to a huge existing authentication system, you get baked-in stats, you get all sorts of cool goodies.

Oh, and you get Python, which is a great language for writing web applications, and I’d be remiss if I didn’t take some pleasure in Django being available out of the box.

Personally I don’t really care one way or another about hosting code with Google, or letting a data store sit on their servers; the terms of service, if you read them, are surprisingly reasonable, and you don’t hand over any rights to peek at user data by hosting an application with Google, so that’s a non-issue.

But…

The overwhelming strength of Python is its libraries. Not just the standard modules that come with Python itself, but the whole ecosystem of third-party stuff that makes Python so incredibly useful for writing web applications. And that’s where, as far as I can tell, App Engine falls over. Unless I’m missing something, Google’s URL fetch module is the only way you’re allowed to talk to the rest of the Web, and that pretty much sinks the platform.

I understand that they need to sandbox things for safety, but cutting off the standard Python modules for doing URL retrieval and speaking HTTP throws out an unbelievably large amount of software that you’ll now either have to rewrite or fork:

  • Want to use Akismet to filter spam submissions? Better come up with a wrapper that uses Google’s fetch API.
  • Want to sync to popular services like Flickr or del.icio.us? Yup, gonna have to put that together yourself.
  • Want to use the API of the hot new Web 2.0 property? You guessed it: existing Python wrappers aren’t going to work.
  • Irony: want to use existing Python modules that talk to Google’s web services? Whoops.

The list just goes on and on; all this stuff needs to either be rewritten to use Google’s API, or needs to be forked and patched. And it seems you can just forget about anything that isn’t doing HTTP. And that’s just the tip of the iceberg: it looks like a simply vast amount of useful Python software is going to be verboten on App Engine.
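
To make the porting cost concrete, here is a minimal sketch of the kind of rewrite each of these libraries would need. It assumes a fetch function shaped roughly like App Engine's URL fetch call, taking `url`, `payload`, `method` and `headers` and returning an object with `status_code` and `content`. `SpamCheckClient`, its endpoint and its parameters are hypothetical (this is not Akismet's actual API), `fake_fetch` is an offline stand-in, and the code is written for modern Python rather than the runtime App Engine launched with.

```python
from collections import namedtuple
from urllib.parse import urlencode

# Stand-in for the response object a platform fetch call might return.
FetchResult = namedtuple("FetchResult", ["status_code", "content"])


class SpamCheckClient:
    """Hypothetical spam-filter client with a pluggable HTTP transport."""

    def __init__(self, api_key, fetch):
        self.api_key = api_key
        self.fetch = fetch  # any callable: fetch(url, payload, method, headers)

    def is_spam(self, author, text):
        payload = urlencode({"key": self.api_key, "author": author, "text": text})
        result = self.fetch(
            "https://spam.example.com/check",  # hypothetical endpoint
            payload=payload.encode("utf-8"),
            method="POST",
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
        if result.status_code != 200:
            raise RuntimeError("spam check failed: HTTP %d" % result.status_code)
        return result.content.strip() == b"true"


def fake_fetch(url, payload=None, method="GET", headers=None):
    """Offline stand-in so the sketch runs without any network access."""
    flagged = b"viagra" in (payload or b"")
    return FetchResult(200, b"true" if flagged else b"false")


client = SpamCheckClient("secret", fake_fetch)
print(client.is_spam("bob", "cheap viagra"))
print(client.is_spam("alice", "nice post"))
```

A library written this way only needs a different `fetch` callable to move between platforms. The complaint above is that most existing Python wrappers were not written this way: they import the standard networking modules directly, so each one has to be patched by hand.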

I have a very hard time believing, if this is how it works, that it’ll really be useful for Python web developers. And if/when other languages are supported on the platform, they’ll have similar problems.

Don’t get me wrong: I’m not saying that a free-to-start-with, massively-scalable service for hosting web applications isn’t cool. I’m just not sure that’s what Google App Engine is; they’ve got the scalable bit and the hosting bit, but there’s a surprising lack of, well, “web” and “application” going on here.

Comments

Mayuresh
April 8, 2008
#

Good to see a post from you after a long time …

Ramin
April 8, 2008
#

A full list of what libraries are supported and what’s not is here.

The only reason I can think of for them to take out all the url/http/socket stuff is to stop the AppEngine from being used as a platform for remote proxy attacks. But they could achieve the same thing by building a custom version of urllib/urllib2 with their throttling code down in the C code.

In any case I filed a ticket request. Anyone who feels like it can go add their support.

It’s all pretty early stage. Maybe they’ll make the change.

Jones
April 8, 2008
#

Dead-On, as usual. I also fail to see what kind of applications we’re supposed to build without any contact to the outside world other than plain HTTP.

This is a whole new definition of a “golden ivory tower”. You get a set of amazingly powerful and scalable tools, but no sane way to integrate them with anything outside of your playground.

Sure, I could build my “next big thing” exclusively on google infrastructure. But if I really were to build a “next big thing” then I’m not so sure I’d want to rely on a third party to that extent…

Ian Bicking
April 8, 2008
#

For some reason they seem to have turned the desire to avoid the socket module into taking out the stdlib modules that use socket, instead of reimplementing them. I hope this is just a temporary thing, as I can’t see any reason the urllib and urllib2 modules wouldn’t work fine, and most of httplib should work too (even if the implementation of that will change). OTOH, if they don’t do it, I’m sure someone else will in short order.

Ian Bicking
April 9, 2008
#

There’s a ticket for this: http://code.google.com/p/googleappengine/issues/detail?id=61

Readers can indicate interest in the ticket by starring it. (There are way too many noisy +1 comments: please don’t add more!)

Cam MacRae
April 9, 2008
#

@Ian

I’m not sure it’s intended to be a temporary thing given the URL Fetch API - I hope the weight of the community will be enough to swing them.

Luke Hoersten
April 9, 2008
#

Let’s say you want to do a ton of parallel computations for complex stats or something. Normally, because of the GIL, you’d have to use something like C or Erlang to do this. Is that possible with GAE?

Steve McKay
April 9, 2008
#

@Luke

AppEngine forbids spawning threads or subprocesses, and requires that each request complete within “a few seconds”.

Mike
April 9, 2008
#

I think the point of a lot of these restrictions is to allow these services to scale very well across a series of cheap computers. Using sockets or accessing the filesystem would tie the execution of your code to one computer; this is probably the rationale behind running a process for each individual request, too.

It’s unlikely I will be using the app engine, but it does make a large amount of sense: it seems to be aimed at building massively scalable applications with fewer infrastructure headaches. The problem I can see is that anyone creating a high-traffic rich internet application probably isn’t going to be delighted at the prospect of a Google lock-in.

Tom Davies
April 9, 2008
#

@Luke — I think Amazon’s EC2 would be a better fit for what you want to do.

Alec
April 9, 2008
#

The limitations on outbound connections are probably also related to spam and DoS prevention. Imagine spammers using Google infrastructure to send their mails…

Andres
April 9, 2008
#

Read the tutorial: it is possible to use ANY existing Python library, provided it does not use any of the standard libraries mentioned above. To use a 3rd party library, you just have to copy it to your application directory…

Arnar
April 9, 2008
#

As far as I can tell, their HTTP API is dead simple, so porting a library for a web 2.0 service (e.g. for Flickr) should be relatively easy. One could probably also write a limited wrapper on top of their api that behaves (at least) partially like urllib.
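
A minimal sketch of such a partial wrapper, assuming a fetch callable that takes `url`, `payload`, `method` and `headers` and returns an object with `status_code`, `headers` and `content` attributes. That shape is modelled loosely on App Engine's URL fetch call, but it is stubbed out here so the example runs offline. `make_urlopen`, `ShimResponse`, `FakeResult` and `fake_fetch` are illustrative names, not part of any real library, and the code targets modern Python.

```python
import io


class ShimResponse(io.BytesIO):
    """File-like response exposing a small subset of a urllib response."""

    def __init__(self, url, status, headers, body):
        super().__init__(body)
        self._url = url
        self._status = status
        self._headers = dict(headers)

    def geturl(self):
        return self._url

    def getcode(self):
        return self._status

    def info(self):
        return self._headers


def make_urlopen(fetch):
    """Build a urlopen()-style function on top of a platform fetch callable."""

    def urlopen(url, data=None):
        # Like urllib, treat the presence of a request body as a POST.
        method = "POST" if data is not None else "GET"
        result = fetch(url, payload=data, method=method, headers={})
        return ShimResponse(url, result.status_code, result.headers, result.content)

    return urlopen


class FakeResult:
    """Offline stand-in for whatever the real fetch call would return."""
    status_code = 200
    headers = {"content-type": "text/plain"}
    content = b"hello from the fetch backend"


def fake_fetch(url, payload=None, method="GET", headers=None):
    return FakeResult()


urlopen = make_urlopen(fake_fetch)
response = urlopen("http://example.com/")
print(response.getcode(), response.read())
```

This covers only the simplest call pattern. Anything that relies on redirects, streaming, custom openers or socket-level behaviour would need considerably more work than this.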

David T.
April 9, 2008
#

Haha, if it were Ruby, I’d just overwrite net/http. Eat that, python guys :P

Chris Adams
April 9, 2008
#

Google’s web services have been updated - that should provide a good indication as to the amount of work involved switching to their HTTP library for most services. I’d be surprised if someone didn’t come up with a monkey-patch for most httplib usage, too.

OJT
April 9, 2008
#

  1. What part of “beta release” is not clear?
  2. I guess they will release a urllib emulation module soon.

James Bennett
April 9, 2008
#

Andres, I did read the tutorial. I also read the bit where anything that opens a socket is forbidden. That means standard Python networking modules are forbidden; the only way around it would be, as Ian suggested, to write modules which emulate the behavior but call on Google’s “approved” module under the hood.

OJT, nice to see that apparently beta releases are meant to be immune from criticism; I was wondering how long until somebody tried to throw that at me. Also, they did release a module that does URL fetching, it just has a different name from the standard Python library and exposes a completely incompatible API.

OJT
April 9, 2008
#

I never said beta software is immune from criticism.

If your criticism was, for example, about the fact that Google uses a proprietary non-SQL database API and that this serves to lock-in users to their system then it would be very relevant. That’s a fundamental part of the system and is not likely to change as a result of feedback from beta testers.

But the url fetching API can easily be fixed in the next releases so spending too much time criticising such a trivial detail in a beta release seems rather pointless.

James Bennett
April 9, 2008
#

“If your criticism was, for example, about the fact that Google uses a proprietary non-SQL database API and that this serves to lock-in users to their system then it would be very relevant.”

Except that’s not “lock-in”; that phrase implies there’s no way for me to get my data out if I’m using their system. It’d be incredibly easy to get the data out, and their query language is close enough to what I already use — the Django ORM — that I’m not hurting if I start an app on Google’s service and then migrate away.

“But the url fetching API can easily be fixed in the next releases so spending too much time criticising such a trivial detail in a beta release seems rather pointless.”

Trivial? This is a web application platform that can’t integrate with the overwhelming majority of useful software designed for use by web applications. That needed to be fixed before it ever went into a public preview, not in a nebulous “next release”.

Joe Grossberg
April 10, 2008
#

“The overwhelming strength of Python is its libraries.”

Yikes; that’s quite a claim there — that it’s “the strength”, not “a strength”, and overwhelmingly so, at that.

I don’t think it’s the “better libraries” that won people over from Perl and Java.

Ramin
April 10, 2008
#

Response from Guido on AppEngine urllib/urllib2 feature request:

Providing a urllib replacement implemented on top of urlfetch shouldn’t be particularly hard. If someone is willing to produce one, I’d be happy to review it and, if it passes muster, try to get it added.

For parts of urllib2 this will be harder; the Request/Response/Handler/OpenerDirector architecture there isn’t easily portable to the urlfetch API, which simply makes an RPC to another server that handles the entire request and returns the complete response. But I’m open for suggestions here.

However, I don’t have the time to do all the work myself — all I can offer is to review contributions and try to get them added. (Legal will have a say too; I expect that as long as it’s non-GPL open source, e.g. the Apache or BSD license, it shouldn’t be a problem.)

James Bennett
April 10, 2008
#

@Joe: For web application development, yup, you bet your ass that library support is the killer feature for Python. No other language I’ve used (and I’ve used Perl professionally, and dabbled with Java) has the same wealth of useful, high-quality and above all well-documented web-oriented libraries. Nothing else compares, period.

jorge vargas
April 11, 2008
#

@David T, please understand what you are commenting on before commenting. You CAN overwrite modules in Python; the problem is that the module doesn’t exist, as in the code was deleted and the Python that is running doesn’t have it.
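
For readers unfamiliar with the mechanism, here is a minimal sketch of supplying a module that is not there, using only `sys.modules` and `types.ModuleType` from the standard library. The module name `legacy_http` and its `get` function are made up for illustration. A real shim would have to reproduce the missing module's API and call the platform's fetch service underneath, which is the hard part.

```python
import sys
import types

# "legacy_http" is a made-up module name used only for this illustration.
shim = types.ModuleType("legacy_http")


def get(url):
    """Pretend transport; a real shim would call the platform's fetch API."""
    return "fetched " + url


shim.get = get

# Registering the object makes later "import legacy_http" statements find it,
# even though no legacy_http.py file exists anywhere on the import path.
sys.modules["legacy_http"] = shim

import legacy_http

print(legacy_http.get("http://example.com/"))
```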
