Batteries sold separately
At first glance, Google’s App Engine looks like a great way to build the next big web application; you get access to a massively scalable infrastructure, you get access to a huge existing authentication system, you get baked-in stats, you get all sorts of cool goodies.
Oh, and you get Python, which is a great language for writing web applications, and I’d be remiss if I didn’t take some pleasure in Django being available out of the box.
Personally I don’t really care one way or another about hosting code with Google, or letting a data store sit on their servers; the terms of service, if you read them, are surprisingly reasonable, and you don’t hand over any rights to peek at user data by hosting an application with Google, so that’s a non-issue.
But…
The overwhelming strength of Python is its libraries. Not just the standard modules that come with Python itself, but the whole ecosystem of third-party stuff that makes Python so incredibly useful for writing web applications. And that’s where, as far as I can tell, App Engine falls over. Unless I’m missing something, Google’s URL fetch module is the only way you’re allowed to talk to the rest of the Web, and that pretty much sinks the platform.
I understand that they need to sandbox things for safety, but cutting off the standard Python modules for doing URL retrieval and speaking HTTP throws out an unbelievably large amount of software that you’ll now either have to rewrite or fork:
- Want to use Akismet to filter spam submissions? Better come up with a wrapper that uses Google’s fetch API.
- Want to sync to popular services like Flickr or del.icio.us? Yup, gonna have to put that together yourself.
- Want to use the API of the hot new Web 2.0 property? You guessed it: existing Python wrappers aren’t going to work.
- Irony: want to use existing Python modules that talk to Google’s web services? Whoops.
The list just goes on and on; all this stuff needs to either be rewritten to use Google’s API, or needs to be forked and patched. And it seems you can just forget about anything that isn’t doing HTTP. And that’s just the tip of the iceberg: it looks like a simply vast amount of useful Python software is going to be verboten on App Engine.
I have a very hard time believing, if this is how it works, that it’ll really be useful for Python web developers. And if/when other languages are supported on the platform, they’ll have similar problems.
Don’t get me wrong: I’m not saying that a free-to-start-with, massively-scalable service for hosting web applications isn’t cool. I’m just not sure that’s what Google App Engine is; they’ve got the scalable bit and the hosting bit, but there’s a surprising lack of, well, “web” and “application” going on here.
April 8, 2008
#
Good to see a post from you after a long time …
April 8, 2008
#
A full list of what libraries are supported and what’s not is here.
The only reason I can think of for them to take out all the url/http/socket stuff is to stop AppEngine from being used as a platform for remote proxy attacks. But they could achieve the same thing by building a custom version of urllib/urllib2 with their throttling code down at the C level.
In any case I filed a ticket request. Anyone who feels like it can go add their support.
It’s all pretty early stage. Maybe they’ll make the change.
April 8, 2008
#
Dead-On, as usual. I also fail to see what kind of applications we’re supposed to build without any contact to the outside world other than plain HTTP.
This is a whole new definition of a “golden ivory tower”. You get a set of amazingly powerful and scalable tools, but no sane way to integrate them with anything outside of your playground.
Sure, I could build my “next big thing” exclusively on Google infrastructure. But if I really were to build a “next big thing” then I’m not so sure I’d want to rely on a third party to that extent…
April 8, 2008
#
For some reason they seem to have turned the desire to avoid the socket module into taking out the stdlib modules that use socket, instead of reimplementing them. I hope this is just a temporary thing, as I can’t see any reason the urllib and urllib2 modules wouldn’t work fine, and most of httplib should work too (even if the implementation of that will change). OTOH, if they don’t do it, I’m sure someone else will in short order.
April 9, 2008
#
There’s a ticket for this: http://code.google.com/p/googleappengine/issues/detail?id=61
Readers can indicate interest in the ticket by starring it. (There are way too many noisy +1 comments: please don’t add more!)
April 9, 2008
#
@Ian
I’m not sure it’s intended to be a temporary thing given the URL Fetch API - I hope the weight of the community will be enough to swing them.
April 9, 2008
#
Let’s say you want to do a ton of parallel computations for complex stats or something. Normally, because of the GIL, you’d have to use something like C or Erlang to do this. Is that possible with GAE?
April 9, 2008
#
@Luke
AppEngine forbids spawning threads or subprocesses, and requires that each request complete within “a few seconds”.
April 9, 2008
#
I think the point of a lot of these restrictions is to allow these services to scale very well across a series of cheap computers. Using sockets or accessing the filesystem would tie the execution of your code to one computer. This is probably also the rationale behind running a process for each individual request.
It’s unlikely I will be using App Engine, but it does make a large amount of sense; it seems to be aimed at building massively scalable applications with fewer infrastructure headaches. The problem I can see is that anyone creating a high-traffic rich internet application probably isn’t going to be delighted at the prospect of Google lock-in.
April 9, 2008
#
@Luke — I think Amazon’s EC2 would be a better fit for what you want to do.
April 9, 2008
#
The limitations on outbound connections are probably also related to spam and DoS prevention. Imagine spammers using Google infrastructure to send their mails…
April 9, 2008
#
Read the tutorial: it is possible to use ANY existing Python library, provided it does not use any of the standard libraries mentioned above. To use a 3rd party library, you just have to copy it to your application directory…
April 9, 2008
#
As far as I can tell, their HTTP API is dead simple, so porting a library for a web 2.0 service (e.g. for Flickr) should be relatively easy. One could probably also write a limited wrapper on top of their api that behaves (at least) partially like urllib.
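A rough sketch of what such a limited wrapper could look like follows. It is written in present-day Python 3 and makes several assumptions: `fake_fetch` is a hypothetical stand-in for the platform's fetch call (its signature and the `content`, `status_code` and `headers` attributes of its result are assumed here, not taken from any documentation), and `make_urlopen` and `UrlopenResponse` are names invented for this illustration. It covers only the small part of the urllib interface that simple callers tend to use.

```python
import io
from dataclasses import dataclass, field


@dataclass
class FetchResult:
    """Assumed shape of the object a platform fetch call returns."""
    content: bytes
    status_code: int = 200
    headers: dict = field(default_factory=dict)


def fake_fetch(url, payload=None, method="GET", headers=None):
    """Hypothetical stand-in for the platform fetch function (no network access).

    It simply echoes the method and URL back as the response body.
    """
    body = ("%s %s" % (method, url)).encode("utf-8")
    return FetchResult(content=body, headers={"Content-Type": "text/plain"})


class UrlopenResponse(io.BytesIO):
    """File-like response with a few of the accessors urllib callers expect."""

    def __init__(self, url, result):
        super().__init__(result.content)
        self.url = url
        self.code = result.status_code
        self.headers = dict(result.headers)

    def geturl(self):
        return self.url

    def getcode(self):
        return self.code

    def info(self):
        return self.headers


def make_urlopen(fetch):
    """Build a urlopen-style function on top of the given fetch callable."""

    def urlopen(url, data=None):
        # Mirror urllib's convention: supplying data turns the request into a POST.
        method = "POST" if data is not None else "GET"
        result = fetch(url, payload=data, method=method, headers={})
        return UrlopenResponse(url, result)

    return urlopen


urlopen = make_urlopen(fake_fetch)
response = urlopen("http://example.com/feed")
print(response.getcode(), response.read())
```

Passing the fetch function in as an argument keeps the wrapper testable off the platform; on the platform, the real fetch call would be passed instead of `fake_fetch`, assuming its result exposes the same three attributes.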
April 9, 2008
#
Haha, if it were Ruby, I’d just overwrite net/http. Eat that, Python guys :P
April 9, 2008
#
Google’s web services have been updated - that should provide a good indication as to the amount of work involved switching to their HTTP library for most services. I’d be surprised if someone didn’t come up with a monkey-patch for most httplib usage, too.
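To illustrate the monkey-patch idea, here is a minimal sketch of registering a stand-in module under a name that cannot otherwise be imported. Everything named here is hypothetical: `missing_http_module` is a made-up module name, and `platform_urlopen` only returns a string where a real replacement would call the platform's fetch API. Whether the sandbox permits this kind of registration is an assumption, not something confirmed here.

```python
import sys
import types


def install_shim(module_name, **attributes):
    """Register a stand-in module so later imports of module_name find it."""
    shim = types.ModuleType(module_name)
    for name, value in attributes.items():
        setattr(shim, name, value)
    # The import system consults sys.modules before searching for files.
    sys.modules[module_name] = shim
    return shim


def platform_urlopen(url, data=None):
    """Hypothetical replacement; a real one would call the platform fetch API."""
    return "fetched %s" % url


# "missing_http_module" is a made-up name used only for this illustration.
install_shim("missing_http_module", urlopen=platform_urlopen)

# Third-party code written against the old module name now imports the shim.
import missing_http_module

print(missing_http_module.urlopen("http://example.com/"))
```

The shim has to be installed before any library that depends on the module is imported, and it only helps libraries that stick to the functions the shim actually provides.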
April 9, 2008
#
April 9, 2008
#
Andres, I did read the tutorial. I also read the bit where anything that opens a socket is forbidden. That means standard Python networking modules are forbidden; the only way around it would be, as Ian suggested, to write modules which emulate the behavior but call on Google’s “approved” module under the hood.
OJT, nice to see that apparently beta releases are meant to be immune from criticism; I was wondering how long until somebody tried to throw that at me. Also, they did release a module that does URL fetching, it just has a different name from the standard Python library and exposes a completely incompatible API.
April 9, 2008
#
I never said beta software is immune from criticism.
If your criticism was, for example, about the fact that Google uses a proprietary non-SQL database API and that this serves to lock-in users to their system then it would be very relevant. That’s a fundamental part of the system and is not likely to change as a result of feedback from beta testers.
But the url fetching API can easily be fixed in the next releases so spending too much time criticising such a trivial detail in a beta release seems rather pointless.
April 9, 2008
#
Except that’s not “lock-in”; that phrase implies there’s no way for me to get my data out if I’m using their system. It’d be incredibly easy to get the data out, and their query language is close enough to what I already use — the Django ORM — that I’m not hurting if I start an app on Google’s service and then migrate away.
Trivial? This is a web application platform that can’t integrate with the overwhelming majority of useful software designed for use by web applications. That needed to be fixed before it ever went into a public preview, not in a nebulous “next release”.
April 10, 2008
#
“The overwhelming strength of Python is its libraries.”
Yikes; that’s quite a claim there — that it’s “the strength”, not “a strength”, and overwhelmingly so, at that.
I don’t think it’s the “better libraries” that won people over from Perl and Java.
April 10, 2008
#
Response from Guido on AppEngine urllib/urllib2 feature request:
April 10, 2008
#
@Joe: For web application development, yup, you bet your ass that library support is the killer feature for Python. No other language I’ve used (and I’ve used Perl professionally, and dabbled with Java) has the same wealth of useful, high-quality and above all well-documented web-oriented libraries. Nothing else compares, period.
April 11, 2008
#
@David T, please understand what you are commenting on before commenting. You CAN overwrite modules in Python; the problem is that the module doesn’t exist, as in the code was deleted and the Python that is running doesn’t have it.