ORM Wars

Published: September 4, 2007. Filed under: Django, Frameworks.

Last week while I was still on blog hiatus, Adam Gomaa wrote up a pretty constructive summary of why he prefers SQLAlchemy over the default Django ORM, and documented how he made SQLAlchemy a little less painful to use by writing a set of helper functions before moving on to announcing that he’s writing his own declarative layer — borrowing somewhat from Django’s model syntax — on top of SQLAlchemy.

I went back and read a few of Adam’s other posts, and generally I like what I see; he manages to combine several important tendencies in a blogger:

  1. He’s honest.
  2. He manages honesty without being personally insulting.
  3. He doesn’t spare anybody, not even SQLAlchemy.

His big issue right now seems to be the state of ORM in the Python world: he likes Django’s model syntax but doesn’t like the limitations of our ORM, and likes SQLAlchemy’s power and flexibility but doesn’t seem to care for its syntax at all. It’s a quandary, and his solution, apparently, is to write a new declarative layer on top of SQLAlchemy.

I’ve got some ideas about that, and some ideas about where Django might be able to go, but unfortunately his comment form is a bit too small to contain it all, so I’ll ramble about it here where I’ve got essentially unlimited space :)

Framing the problem

The first thing to understand about ORMs is that they exist along at least one continuum. I’ve written before about some of the problems inherent in ORM (though in the context of implementing subclassing of models), and the underlying impedance mismatch is the source of this: when you write an ORM you’re bridging the gap between the relational model of your database — expressed through SQL — and the object-oriented model of your programming language. ORMs, then, have to include bits of both worlds, and in doing so have to make a decision about which bits to include and, more importantly, how much of each “side”. In practice, this means that most ORMs tend to lean more toward one side of the continuum than the other: some are decidedly more “SQL-ish” while others are more “OOP-y”. If you prefer design-pattern terminology, this means that some follow Data Mapper fairly closely, while others follow Active Record. SQLAlchemy is one of the former, Django’s ORM is one of the latter.
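To make the two ends of that continuum a little more concrete, here's a minimal sketch of both patterns using nothing but Python's standard sqlite3 module. It isn't how either Django or SQLAlchemy is implemented; the `entry` table and the `ActiveEntry`, `Entry` and `EntryMapper` classes are hypothetical names made up purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT)")


# Active Record style: the object itself knows how to persist itself.
class ActiveEntry:
    def __init__(self, title, id=None):
        self.id = id
        self.title = title

    def save(self):
        cur = conn.execute("INSERT INTO entry (title) VALUES (?)", (self.title,))
        self.id = cur.lastrowid


# Data Mapper style: a plain object, plus a separate mapper that owns the SQL.
class Entry:
    def __init__(self, title, id=None):
        self.id = id
        self.title = title


class EntryMapper:
    def __init__(self, connection):
        self.connection = connection

    def insert(self, entry):
        cur = self.connection.execute(
            "INSERT INTO entry (title) VALUES (?)", (entry.title,)
        )
        entry.id = cur.lastrowid

    def find(self, entry_id):
        row = self.connection.execute(
            "SELECT id, title FROM entry WHERE id = ?", (entry_id,)
        ).fetchone()
        return Entry(title=row[1], id=row[0]) if row else None


# Active Record: ask the object to save itself.
first = ActiveEntry("ORM Wars")
first.save()

# Data Mapper: hand the object to something else that does the saving.
mapper = EntryMapper(conn)
second = Entry("Framing the problem")
mapper.insert(second)
```

The Active Record version is shorter and reads more naturally; the Data Mapper version keeps the object ignorant of the database, which is what leaves room for the mapper to do arbitrarily clever things with SQL.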

The problem of designing an ideal ORM, then (or one that’s as close to ideal as practicable), is to pull in as much as possible from both sides: provide the power and flexibility that can only come from being close to SQL, while simultaneously providing the (relatively) clear and intuitive syntax of object-oriented programming languages. Adam’s proposed solution is to take SQLAlchemy — which is about as “close to the metal” as you’re going to get in a Python ORM — and build a declarative layer on top of it with clear OOP syntax. It would, in effect, be the best of both worlds.

The question, then, is: can it be done? I’m not entirely certain, but Elixir has made me quite a bit less pessimistic about it. And, more importantly, do we have to completely write off Django’s ORM and assume that SQLAlchemy, or something very similar to it, is the only way forward?

It does (or could do) more than you think

Personally I’m relatively optimistic about the ability to make the backend of Django’s ORM more powerful, and more capable of expressing complex and advanced SQL, than it currently is; I know there are some folks who think it’s not possible to take an Active Record-style ORM and build out its backend to have even a significant portion of the power of a Data Mapper style, but I’m not entirely convinced. To understand why, look back at the history of Django’s ORM.

Our ORM was originally designed and developed at a time when there weren’t multiple high-quality ORMs available in Python, and its syntax has evolved amazingly in the two years it’s been public (anybody who remembers having to declare fields explicitly, for example, will understand just how far it’s come). The declarative style of model class we have now was a huge leap forward, and it appeared pretty quickly after Django went public; in fact, it appeared so quickly that it was in by the time of the first packaged release.

Between the 0.91 and 0.95 releases the ORM was completely refactored, and now exists largely in two parts:

  1. The model syntax, which is responsible for providing a description of the database table in use, and
  2. The query system, which is responsible for actually getting data to and from the database.

The interesting thing is that the query system largely lives in a completely different set of code: except for actions directly related to a single instance of a model class (save(), delete(), next/previous helpers and fetching across relations to a specific object), you interact with the database through instances of two other classes: django.db.models.Manager and django.db.models.query.QuerySet. And most of the database-interacting bits which exist directly on a model instance actually call out into the query system: relations are handled by descriptors backed by various types of Manager instances, and the helpful next/previous methods are simply calls to the Manager (and Manager itself is — by default — largely a convenient wrapper around various ways to get and manipulate instances of QuerySet).
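A rough sketch of that split might look like the following. To be clear, this is a toy and not Django's actual code: the `QuerySet` and `Manager` classes here only borrow the names, and the `as_sql()` method and the `entry` table are assumptions made for the example. The point is only that the model class carries no SQL of its own; everything that builds a query lives elsewhere.

```python
class QuerySet:
    """Toy lazy query: collects filters and builds SQL only when asked."""

    def __init__(self, table, filters=None):
        self.table = table
        self.filters = list(filters or [])

    def filter(self, **lookups):
        # Return a new QuerySet and leave this one untouched, so queries chain.
        return QuerySet(self.table, self.filters + sorted(lookups.items()))

    def as_sql(self):
        # Column names are trusted here; only the values become parameters.
        sql = "SELECT * FROM %s" % self.table
        params = [value for _, value in self.filters]
        if self.filters:
            sql += " WHERE " + " AND ".join(
                "%s = ?" % column for column, _ in self.filters
            )
        return sql, params


class Manager:
    """Toy manager: a thin, convenient way to get hold of QuerySets."""

    def __init__(self, table):
        self.table = table

    def all(self):
        return QuerySet(self.table)

    def filter(self, **lookups):
        return self.all().filter(**lookups)


class Entry:
    # The model only describes itself; querying is delegated to the manager.
    objects = Manager("entry")

    def __init__(self, title):
        self.title = title


sql, params = Entry.objects.filter(title="ORM Wars").as_sql()
```

Because the SQL generation sits entirely inside the toy `QuerySet`, it could be swapped for something far more capable without touching the `Entry` class at all, which is the property the paragraph above is getting at.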

This doesn’t mean that Django is a Data Mapper-style ORM, or that it can do all of the things a Data Mapper ORM can do, but it does mean that — because the system which generates SQL is largely separate from the syntax that defines the model classes — there’s a lot of room for exploring interesting Data-Mapper-ish things in the Django ORM. And it’ll get quite a bit easier to do that in the not-too-distant future, thanks to two things happening right now:

  1. The move to the newforms library.
  2. A major refactoring of the query system.

The move to newforms is important because it’s being used as an opportunity to — as much as possible — clean up the Django admin and clear out some of the places where other components were coupled to it. It will still rely heavily on Django’s components, but some of the less desirable invasions of the admin into other parts of Django will be going away. That means no more inner Admin classes in models; instead we’ll have a relatively self-contained system for generating the admin interface, which means there’s no longer a need for crufty special-case admin support in the ORM. That’s big.

The query refactor is even bigger, because it’ll significantly clean up and organize the code which actually interacts with the database, opening the way to hook in all sorts of interesting SQL features which really haven’t been supportable in the Django ORM up to this point.

Let’s make a trade

I like the Django ORM for the fact that it makes the easy things easy. And I like it because a lot of hard things are at least possible, if not terribly pretty; every ORM that isn’t just a light wrapper around SQL needs to offer ways to “escape” and start plugging in bits of arbitrary SQL or even entire queries, and Django does that right now, which means you can sneak a lot of things into it if you’re willing to put in the work. I’d like for most or all of those things to get easier, and for some other things to move into the realm of the possible. It’ll never be as powerful as SQLAlchemy, but I think a lot of people are too quick to simply take a snapshot of it out of context, declare it “feeble” and write it off permanently.
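As a sketch of what such an escape hatch can look like, here's a small self-contained example on top of sqlite3. The `find_entries` helper, its `extra_where` and `extra_params` arguments, and the `entry` table are all hypothetical; this is the general shape of the idea, not Django's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO entry (title, hits) VALUES (?, ?)",
    [("ORM Wars", 120), ("Let's make a trade", 45), ("Framing the problem", 80)],
)


def find_entries(extra_where=None, extra_params=(), **lookups):
    """Hypothetical helper: equality lookups plus an optional raw SQL fragment."""
    columns = sorted(lookups)
    # Column names are trusted in this sketch; only values are parameterized.
    clauses = ["%s = ?" % column for column in columns]
    params = [lookups[column] for column in columns]
    if extra_where:
        # The escape hatch: the caller's SQL fragment is spliced in as-is.
        clauses.append("(%s)" % extra_where)
        params.extend(extra_params)
    sql = "SELECT title FROM entry"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


# The easy thing stays easy...
easy = find_entries(title="ORM Wars")

# ...and the thing the keyword syntax can't express is still possible.
hard = find_entries(
    extra_where="hits > ? AND length(title) > ?", extra_params=(50, 10)
)
```

The second call isn't pretty, but it works, and that's the trade: the friendly syntax covers the common cases and the raw fragment covers whatever it can't.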

I like what I’ve seen of the power and flexibility of SQLAlchemy: pretty much anything you can do with pure SQL you can do with SQLAlchemy, but at the cost of horrific syntax and a 400-page user’s manual. Declarative layers on top of it are a big help, but so far none of them has really hit the same sweet spot of syntax that Django has. I’d like to see that situation improve, both because SQLAlchemy is always going to be able to do things that a more “OOP-y” ORM will never quite be able to manage and because I wouldn’t wish its syntax on my worst enemy. But I don’t think a layer on top of it will ever be quite as nice as the syntax that’s possible if you accept a few limitations, and I think a lot of people are too quick to assume that an unbelievable amount of complexity can simply be hidden away behind a suitable interface.

So I’d like to see more cross-pollination of ideas and less dismissal on both sides: generally (from the discussions I’ve read and participated in), the thing holding back more comprehensive support for advanced SQL features in Django’s ORM has been the need to keep the syntax as clean and friendly as possible, and — going on what I’ve heard from some of the SQLAlchemy users I know — the biggest thing holding back cleaner syntax in declarative layers on top of SQLAlchemy is the need to keep as much raw power in the system as possible. These desires are symptomatic of where the two systems are perched on the ORM continuum, and I think it’s a good thing to have competing ORMs in different places on that continuum; designing an ORM is essentially a series of tradeoffs, and I like the idea of multiple systems choosing to make different trades. To paraphrase something I once said about Python web frameworks (and which was in turn a play off a comment by Malcolm Gladwell about Pepsi products): there is not and never will be a perfect Python ORM. There will only be different Python ORMs which are perfect for different people.

Django’s ORM is probably always going to make different tradeoffs, and target a different set of users, than SQLAlchemy, and vice-versa. That’s not a bad thing. That doesn’t mean that Django’s ORM can’t do or learn how to do a lot of extremely interesting and powerful things. And it doesn’t — to me, at least — mean that anyone who wants to improve the state of Python ORM has to write it off and go build layers on top of SQLAlchemy.

There’s one other thing I’ve liked in what I’ve read so far of Adam’s blog entries: I find myself agreeing with him an awful lot (hooray echo chamber!). Not all the time (I’m perfectly happy with dynamic keyword arguments for lookups, for example), but definitely a significant percentage of the time. I’m just not as convinced as he is that Django’s ORM is a dead end, and I wish he and some more folks like him would take a closer look before they make a decision as to where their development efforts can be used :)