ORM Wars

An entry published by James Bennett on September 4, 2007. Part of the categories Django and Frameworks. 13 comments posted.

Last week while I was still on blog hiatus, Adam Gomaa wrote up a pretty constructive summary of why he prefers SQLAlchemy over the default Django ORM, and documented how he made SQLAlchemy a little less painful to use by writing a set of helper functions before moving on to announcing that he’s writing his own declarative layer — borrowing somewhat from Django’s model syntax — on top of SQLAlchemy.

I went back and read a few of Adam’s other posts, and generally I like what I see; he manages to combine several important tendencies in a blogger:

  1. He’s honest.
  2. He manages honesty without being personally insulting.
  3. He doesn’t spare anybody, not even SQLAlchemy.

His big issue right now seems to be the state of ORM in the Python world: he likes Django’s model syntax but doesn’t like the limitations of our ORM, and likes SQLAlchemy’s power and flexibility but doesn’t seem to care for its syntax at all. It’s a quandary, and his solution, apparently, is to write a new declarative layer on top of SQLAlchemy.

I’ve got some ideas about that, and some ideas about where Django might be able to go, but unfortunately his comment form is a bit too small to contain it all, so I’ll ramble about it here where I’ve got essentially unlimited space :)

Framing the problem

The first thing to understand about ORMs is that they exist along at least one continuum. I’ve written before about some of the problems inherent in ORM (though in the context of implementing subclassing of models), and the underlying impedance mismatch is the source of this: when you write an ORM you’re bridging the gap between the relational model of your database — expressed through SQL — and the object-oriented model of your programming language. ORMs, then, have to include bits of both worlds, and in doing so have to make a decision about which bits to include and, more importantly, how much of each “side”. In practice, this means that most ORMs tend to lean more toward one side of the continuum than the other: some are decidedly more “SQL-ish” while others are more “OOP-y”. If you prefer design-pattern terminology, this means that some follow Data Mapper fairly closely, while others follow Active Record. SQLAlchemy is one of the former, Django’s ORM is one of the latter.
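To make the two ends of that continuum concrete, here is a minimal sketch of the same table handled both ways, using only the standard library's sqlite3 module. Every class and method name in it (ActiveRecordEntry, PlainEntry, EntryMapper) is a hypothetical stand-in invented for illustration; none of it is the actual API of Django or SQLAlchemy.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT)")


class ActiveRecordEntry:
    """Active Record style: the object carries its own persistence logic."""

    def __init__(self, title, id=None):
        self.id = id
        self.title = title

    def save(self):
        cur = conn.execute("INSERT INTO entry (title) VALUES (?)", (self.title,))
        self.id = cur.lastrowid

    @classmethod
    def get(cls, id):
        row = conn.execute(
            "SELECT id, title FROM entry WHERE id = ?", (id,)
        ).fetchone()
        return cls(title=row[1], id=row[0])


@dataclass
class PlainEntry:
    """Data Mapper style: a plain object that knows nothing about the database."""

    title: str
    id: Optional[int] = None


class EntryMapper:
    """A separate object that moves PlainEntry instances to and from the table."""

    def __init__(self, connection):
        self.connection = connection

    def insert(self, entry):
        cur = self.connection.execute(
            "INSERT INTO entry (title) VALUES (?)", (entry.title,)
        )
        entry.id = cur.lastrowid

    def find(self, id):
        row = self.connection.execute(
            "SELECT id, title FROM entry WHERE id = ?", (id,)
        ).fetchone()
        return PlainEntry(title=row[1], id=row[0])


# Active Record: ask the object to save itself.
first = ActiveRecordEntry("ORM Wars")
first.save()

# Data Mapper: hand the object to a mapper that knows the schema.
mapper = EntryMapper(conn)
second = PlainEntry("Framing the problem")
mapper.insert(second)
```

The first style keeps the common case short because the class and the table are assumed to line up. The second keeps the mapping in its own object, which is more to write but leaves room for the mapping and the class to diverge.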

The problem of designing an ideal ORM, then (or one that’s as close to ideal as practicable), is to pull in as much as possible from both sides: provide the power and flexibility that can only come from being close to SQL, while simultaneously providing the (relatively) clear and intuitive syntax of object-oriented programming languages. Adam’s proposed solution is to take SQLAlchemy — which is about as “close to the metal” as you’re going to get in a Python ORM — and build a declarative layer on top of it with clear OOP syntax. It would, in effect, be the best of both worlds.

The question, then, is: can it be done? I’m not entirely certain. But Elixir has made me quite a bit less pessimistic about it. And, more importantly, do we have to completely write off Django’s ORM and assume that SQLAlchemy, or something very similar to it, is the only way forward?

It does (or could do) more than you think

Personally I’m relatively optimistic about the ability to make the backend of Django’s ORM more powerful, and more capable of expressing complex and advanced SQL, than it currently is; I know there are some folks who think it’s not possible to take an Active Record-style ORM and build out its backend to have even a significant portion of the power of a Data Mapper style, but I’m not entirely convinced. To understand why, look back at the history of Django’s ORM.

Our ORM was originally designed and developed at a time when there weren’t multiple high-quality ORMs available in Python, and its syntax has evolved amazingly in the two years it’s been public (anybody who remembers having to declare fields explicitly, for example, will understand just how far it’s come). The declarative style of model class we have now was a huge leap forward, and it appeared pretty quickly after Django went public; in fact, it appeared so quickly that it was in by the time of the first packaged release.

Between the 0.91 and 0.95 releases the ORM was completely refactored, and now exists largely in two parts:

  1. The model syntax, which is responsible for providing a description of the database table in use, and
  2. The query system, which is responsible for actually getting data to and from the database.

The interesting thing is that the query system largely lives in a completely different set of code: except for actions directly related to a single instance of a model class (save(), delete(), next/previous helpers and fetching across relations to a specific object), you interact with the database through instances of two other classes: django.db.models.Manager and django.db.models.query.QuerySet. And most of the database-interacting bits which exist directly on a model instance actually call out into the query system: relations are handled by descriptors backed by various types of Manager instances, and the helpful next/previous methods are simply calls to the Manager (and Manager itself is — by default — largely a convenient wrapper around various ways to get and manipulate instances of QuerySet).
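The split described above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not Django's real implementation: the Manager and QuerySet classes below only borrow the names, and as_sql and new_queryset are hypothetical helpers made up for the example.

```python
class QuerySet:
    """Accumulates filters lazily; SQL is only produced when asked for."""

    def __init__(self, table, filters=None):
        self.table = table
        self.filters = dict(filters or {})

    def filter(self, **kwargs):
        # Return a new QuerySet so a chain never mutates an earlier link.
        return QuerySet(self.table, {**self.filters, **kwargs})

    def as_sql(self):
        # Hypothetical helper: render the accumulated state as SQL + params.
        sql = f"SELECT * FROM {self.table}"
        if self.filters:
            where = " AND ".join(f"{column} = ?" for column in self.filters)
            sql += f" WHERE {where}"
        return sql, tuple(self.filters.values())


class Manager:
    """A thin convenience wrapper that hands out QuerySet instances."""

    def __init__(self, table):
        self.table = table

    def new_queryset(self):
        return QuerySet(self.table)

    def all(self):
        return self.new_queryset()

    def filter(self, **kwargs):
        return self.new_queryset().filter(**kwargs)


class Entry:
    """The model class only describes data; querying goes through `objects`."""

    objects = Manager("entry")

    def __init__(self, title):
        self.title = title


base = Entry.objects.filter(author="james")
narrowed = base.filter(status="live")
sql, params = narrowed.as_sql()
```

Because the model class never builds SQL itself, the QuerySet in a design like this could be swapped for something far more capable without touching the model syntax at all.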

This doesn’t mean that Django is a Data Mapper-style ORM, or that it can do all of the things a Data Mapper ORM can do, but it does mean that — because the system which generates SQL is largely separate from the syntax that defines the model classes — there’s a lot of room for exploring interesting Data-Mapper-ish things in the Django ORM. And it’ll get quite a bit easier to do that in the not-too-distant future, thanks to two things happening right now:

  1. The move to the newforms library.
  2. A major refactoring of the query system.

The move to newforms is important because it’s being used as an opportunity to — as much as possible — clean up the Django admin and clear out some of the places where other components were coupled to it. It will still rely heavily on Django’s components, but some of the less desirable invasions of the admin into other parts of Django will be going away. That means no more inner Admin classes in models; instead we’ll have a relatively self-contained system for generating the admin interface, which means there’s no longer a need for crufty special-case admin support in the ORM. That’s big.

The query refactor is even bigger, because it’ll significantly clean up and organize the code which actually interacts with the database, opening the way to hook in all sorts of interesting SQL features which really haven’t been supportable in the Django ORM up to this point.

Let’s make a trade

I like the Django ORM for the fact that it makes the easy things easy. And I like it because a lot of hard things are at least possible, if not terribly pretty; every ORM that isn’t just a light wrapper around SQL needs to offer ways to “escape” and start plugging in bits of arbitrary SQL or even entire queries, and Django does that right now, which means you can sneak a lot of things into it if you’re willing to put in the work. I’d like for most or all of those things to get easier, and for some other things to move into the realm of the possible. It’ll never be as powerful as SQLAlchemy, but I think a lot of people are too quick to simply take a snapshot of it out of context, declare it “feeble” and write it off permanently.
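As a rough illustration of what such an “escape” can look like, the sketch below layers two hatches over a generated query: one that accepts a hand-written WHERE fragment, and one that runs an entire query as written. It uses only the standard library's sqlite3 module; select_entries and raw are hypothetical helpers invented for this example and are not part of Django's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, comments INTEGER)"
)
conn.executemany(
    "INSERT INTO entry (title, comments) VALUES (?, ?)",
    [("ORM Wars", 13), ("Quiet post", 0)],
)


def select_entries(where_sql=None, params=()):
    """Generated query, with an optional hand-written WHERE fragment."""
    sql = "SELECT id, title FROM entry"
    if where_sql:
        # First escape hatch: splice in arbitrary SQL supplied by the caller.
        sql += " WHERE " + where_sql
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()


def raw(sql, params=()):
    """Second escape hatch: run an entire query exactly as written."""
    return conn.execute(sql, params).fetchall()


# The easy thing stays easy.
everything = select_entries()

# A harder thing is possible by supplying a fragment of SQL.
busy = select_entries("comments > ?", (10,))

# Anything else falls through to a complete hand-written query.
total_comments = raw("SELECT SUM(comments) FROM entry")[0][0]
```

Values still travel as bound parameters in both hatches, so dropping down a level does not have to mean giving up safe quoting.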

I like what I’ve seen of the power and flexibility of SQLAlchemy: pretty much anything you can do with pure SQL you can do with SQLAlchemy, but at the cost of horrific syntax and a 400-page user’s manual. Declarative layers on top of it are a big help, but so far none of them has really hit the same sweet spot of syntax that Django has. I’d like to see that situation improve, both because SQLAlchemy is always going to be able to do things that a more “OOP-y” ORM will never quite be able to manage and because I wouldn’t wish its syntax on my worst enemy. But I don’t think it’ll ever be quite as nice as the syntax that’s possible if you accept a few limitations, and I think a lot of people are too quick to assume that an unbelievable amount of complexity can simply be hidden away behind a suitable interface.

So I’d like to see more cross-pollination of ideas and less dismissal on both sides: generally (from the discussions I’ve read and participated in), the thing holding back more comprehensive support for advanced SQL features in Django’s ORM has been the need to keep the syntax as clean and friendly as possible, and — going on what I’ve heard from some of the SQLAlchemy users I know — the biggest thing holding back cleaner syntax in declarative layers on top of SQLAlchemy is the need to keep as much raw power in the system as possible. These desires are symptomatic of where the two systems are perched on the ORM continuum, and I think it’s a good thing to have competing ORMs in different places on that continuum; designing an ORM is essentially a series of tradeoffs, and I like the idea of multiple systems choosing to make different trades. To paraphrase something I once said about Python web frameworks (and which was in turn a play off a comment by Malcolm Gladwell about Pepsi products): there is not and never will be a perfect Python ORM. There will only be different Python ORMs which are perfect for different people.

Django’s ORM is probably always going to make different tradeoffs, and target a different set of users, than SQLAlchemy, and vice-versa. That’s not a bad thing. That doesn’t mean that Django’s ORM can’t do or learn how to do a lot of extremely interesting and powerful things. And it doesn’t — to me, at least — mean that anyone who wants to improve the state of Python ORM has to write it off and go build layers on top of SQLAlchemy.

There’s one other thing I’ve liked in what I’ve read so far of Adam’s blog entries: I find myself agreeing with him an awful lot (hooray echo chamber!). Not all the time (I’m perfectly happy with dynamic keyword arguments for lookups, for example), but definitely a significant percentage of the time. I’m just not as convinced as he is that Django’s ORM is a dead end, and I wish he and some more folks like him would take a closer look before they make a decision as to where their development efforts can be used :)

On September 5, 2007, Rick Lawson said:

Very interesting read. Coming from a Java background there is a similar tug of war between SQLMaps and Hibernate. I definitely say err on the side of simplicity, use your ORM for when it saves you time for the simple stuff. Drop back to SQL for the hard stuff which no ORM will ever be able to get right. I think the Django ORM hits pretty close to the sweet spot. Let’s face it, if you can’t write SQL you don’t need to be in this business anyway.

On September 5, 2007, jpd said:

Yep, Django ORM definitely hits the sweet spot for me. Most comfortable syntax that I’ve seen in an ORM yet. With the recent work a lot of that gnarly coupling is going away, so it’s looking pretty neat and clean(ish) under the hood now.

I am however frustrated by the lack of movement on the model inheritance front. I know these are difficult problems, but even seeing the proposed “models.AbstractModel” type of inheritance implemented would make my life a lot easier, and my models a lot less tortured. I am really hopeful that with the query refactor wrapping up, we’ll see something along those lines land in trunk soonish.

On September 5, 2007, Jason said:

You are henceforth banned from using the term ‘close to the metal’ ever again, due to grievous abuse of the term. :) How far removed from ‘the metal’ are we when we use the SQLAlchemy ORM? It is an abstraction layer for SQLAlchemy’s DB toolkit, which is an abstraction layer for SQL, which is an abstraction layer for querying a database, which itself is an abstraction layer for storing data.

Anyways, more to the point of the article: I don’t understand the complaints about SQLAlchemy’s syntax. You set up the mapping once, and then more or less forget that it exists, and just work with your objects.

Also, the only ‘advantage’ of the ActiveRecord pattern is that you don’t have to think about your DB schema, but instead can focus on just the ORM’s modeling concepts. This isn’t really an advantage at all, though - it requires the same amount of knowledge or more, but just shifts it away from a universal standard to whatever the author of the library thought would be easier for their use cases. If you’re going to invest in learning a concept, invest in the concept with the bigger payoff. SQL will be relevant as long as relational DBs are, and there will be fewer walls to climb down the road, when you need to do something the AR-ORM author didn’t anticipate.

On September 5, 2007, Henrik said:

The only thing I really miss in the Django ORM on the modelling side is proper handling of primary keys. No, autoincrementing numbers are not an acceptable requirement. Multi-field primary keys are supported in SQL for a reason.

So far Hibernate is the only ORM I have seen that I don’t find myself having to code around to make database handling efficient.

Hopefully the Django ORM will gradually gain a decent richness. It certainly is a nice starting point semantically.

On September 5, 2007, Amirouche B. said:

mistyeped “types”, in the text.

I don’t understand something. If it’s possible to keep the Django model syntax and still use the power of SQLAlchemy, why not use it? Sooner or later you will need advanced SQL and will prefer SQLAlchemy’s goodness. I embrace the TurboGears 2 strategy of “standing on the shoulders of giants”, getting the best from the power of bazaar/network development, but I still prefer Django for its usability, even if you have to dig sometimes (http://gulopine.gamemusic.org/2007/09/django-and-linux-desktop.html). SQLAlchemy is made for web developers but also for web framework developers; the different interfaces (Elixir…) are the proof, and I think that’s the best way to get the most from it. Of course you may not want to introduce any dependencies in the framework, but being able to easily switch some parts is good.

On September 5, 2007, James Bennett said:

Jason: outside of using the raw DB API, SQLAlchemy is about as close to the pure SQL as you’re going to get in Python. Hence the analogy holds. The problem with the syntax, to me, is that there’s so much of it, and it’s so prevalent; with other ORMs a simple use case is simple and doesn’t have to involve the more advanced features, but with SQLAlchemy the full complexity of the system is laid out in front of you no matter what you’re doing. In other words, the way SQLAlchemy is laid out makes the “easy things” (common cases) harder than they need to be.

Amirouche: you’ll need to back up what you’re saying; all you’ve asserted, basically, is “SQLAlchemy is better because SQLAlchemy is better”, and that’s the fallacy of begging the question. Provide an explanation of why you don’t think the Django ORM could be made suitably powerful without having to be replaced by SQLAlchemy.

On September 5, 2007, Cam said:

I think Django ORM is definitely getting closer to where it needs to be. While I was initially excited by the SQLAlchemy branch, the over-complexity is quite a turn-off, even if it’s hidden under Django’s cute declarative syntax. That aside, I reckon model inheritance will be absolutely critical going forward.

On September 5, 2007, Jason said:

‘Close to the metal’ isn’t a good analogy here, because it’s already an analogy in common use that means something makes use of features specific to the hardware it’s being run on. Anyways, I mostly meant it as a joke.

As for SQLAlchemy making easy things harder than they need to be: I dunno, easy things are pretty easy in SQLAlchemy. It may require a bit more code to get up and going, but that’s not as much difficult as it is tedious, and that overhead more than pays for itself pretty quickly in any kind of real system.

On September 6, 2007, Fuzztrek said:

I really appreciate Django’s ORM. There are a few things that I feel are missing from it; however, I hope that they will be fixed or implemented (or at least made possible) in the much-anticipated query.py refactor. Specifically, I’ve run into inconsistencies with related fields (e.g. trying to order one model by a field in a foreign key’s table) and conflicts with certain queries (particularly .select_related() and .extra()).

I can only comment on my experiences with the active record pattern, having never used SQLAlchemy or anything like it. In Django, what impresses me about the ORM is how easy it is to get the results from a database in a “useful” form. This depends on the type of query, of course, but in general model objects are useful for me, and I think for most web developers if you’re following a pattern that includes “models” (such as you are with Django). I think writing SQL should always be an option, and I agree that it should not be the aim of the Django ORM to replace this. What I would find really, really beneficial, though, is some way of taking the results from custom SQL and getting a QuerySet, or even a list, of models out of them. I’m not aware of any way to do this currently, and I’m sure it’s no small task. This is pretty much the only thing that deters me from writing SQL by hand. If possible, I’ll gladly include a subquery in .extra(), but that doesn’t work for all problems.

I know a lot of people are after aggregation support, and I hear that’s coming too. I’m not sold on it yet, but just because it’s in doesn’t mean I have to use it. For my uses (statistics and the like) aggregates by way of custom SQL suffice and actually make more sense (at least at the moment; perhaps if there were an aggregate model field my opinion would change…)

Anyway, once again I must congratulate Django developers and everyone who contributes to the project. I think things are on the right track, and I’m eagerly awaiting all of the good things to come this month!

On September 7, 2007, akaihola said:

Fuzztrek longed for “some way of taking the results from custom SQL and getting a QuerySet, or even a list, of models out of them.”

Google for “django djselect” and see if that provides what you’re looking for.

I had to extend Joel’s work for my purposes to handle one-to-many cases better, but it’s already very useful without that, too.

On September 7, 2007, Amirouche B. said:

“Provide an explanation of why you don’t think the Django ORM could be made suitably powerful without having to be replaced by SQLAlchemy. Cam”

I’m not saying that the Django ORM cannot be made powerful; that would be wrong, as far as my limited experience can tell. What I want to say is “why reinvent the wheel”, and “compare Django’s strategy to that of TurboGears 2.0”, which tries to take the “best” bits from the Python world and build a web framework out of them. TG2.0 will work with clearly defined, separate libraries that each target one issue (ORM, templates…), and which get more input and more developer effort because they can be used outside the little kingdom of one project. It looks like what Django is trying to do is harder. So far, it’s the one that best suits my needs, so there’s little room for criticism. To my mind, Adam’s work is worthwhile.

On September 7, 2007, James Bennett said:

Well, if you want to ask why people reinvent the wheel, why did people keep writing new Python frameworks once Django was already public? You’ll notice we’ve never gone to Kevin or Mark or Ben Bangert (of Pylons) and said, “why don’t you just incorporate Django’s components” ;)

On September 8, 2007, Amirouche B. said:

I hope I’m not being too harsh and that my words are not misunderstood (bad English). “why don’t you just incorporate Django’s components” That’s why the Pylons people (I think) made clearly separate components for each “library”: routes, webhelpers, so that people can easily incorporate them in their own projects. “why did people keep writing new Python frameworks once Django was already public?” Because Django is not flagged as a RoR clone :). I’m not aware of the big issues involved in framework building, but I just noticed that Django, among Pylons and TG2, is the one that uses only its own components.

I don’t want to start a war, I’m just curious. If Adam finishes his work, we may have an example of what I want to say. Until then, I’ll stop the battle here ;)
