In yesterday’s article I spent a fair amount of time talking about the word “magic”, specifically in the context of Clarke’s Third Law, which states:
“Any sufficiently advanced technology is indistinguishable from magic.”
A big part of what I was getting at was that a lot of things which seem to be explicable only by appealing to “magic” are really just cases of technology — sometimes extremely simple technology — being used in a complex way. Or, to borrow an excellent turn of phrase from Terry Pratchett, “ninety percent of most magic merely consists of knowing one extra fact.” In the case of the “magic” which used to be (up until the 0.95 release, the first after the “magic removal” effort) in Django, the apparent “magic” was that you’d define a model class and it would mysteriously disappear, to re-surface as an entire module worth of code somewhere else.
To return to an example from yesterday, if you defined an
Entry model inside an application named
blog, that class would “magically” end up inside a module called
django.models.blog.entries, and that
entries module would “magically” sprout methods for working with the
Entry class and a set of module constants and exceptions to go along with it. Now, that was a bit much to expect people to put up with — a class which should be at
blog.models.Entry suddenly showing up inside a module that doesn’t seem to exist anywhere on the filesystem is simply a recipe for confusion — and so it was removed in favor of the more intuitive system we have now, where modules and classes mostly stay where you left them.
But the “magic” here simply consists of knowing one extra fact: that Python lets you dynamically construct modules at runtime and shove them pretty much anywhere you like. This isn’t a feature of Python which gets used all that often — and, in fact, shouldn’t be used all that often because of the potential for confusion — but understanding the extra fact which makes the “magic” work is useful knowledge, and provides a nice demonstration of the fact that Python is generally a much more dynamic language than people give it credit for.
So today we’re going to make some magic.
The technique for dynamically constructing modules and making
import work on them as expected is actually pretty simple; like so many other things which are occasionally explained by “magic”, the things Django used to do to set up model modules were merely complex applications of simple principles, and were complex largely because of the sheer number of database API methods, model-specific classes and other machinery which needed to be built up.
So let’s work on a simpler example which illustrates the underlying principle. Our goal is to be able to do the following in a Python interpreter:
>>> import hello
>>> hello.say_hello()
Hello
At first glance this is fairly straightforward: we import a module named
hello, call a function in it called
say_hello() and it prints the word “Hello”. Nothing special about that, right? All you’d need to do is create a file called
hello.py, stick the function in it and put the file somewhere on your Python import path.
Except we’re going to make this work without creating any files, and especially without ever creating a file named
hello.py. The hello module is going to be created dynamically, the
say_hello() function is going to be put into it dynamically, the resulting module is going to be made importable dynamically and the whole thing is only going to exist in memory during a single interpreter session. And when we’re done you’ll have a pretty good idea of how Django used to set up the “magic” model modules, even if that was a slightly more complex use of the same principles.
The first thing we need to understand to do this is how Python’s import mechanism works; in other words, the exact steps Python goes through when it encounters an
import statement. If you want all the gory details, the official Python documentation on importing isn’t too bad an explanation of this process, and Fredrik Lundh has an excellent write-up of all the nooks and crannies of Python’s importing mechanism. I highly recommend giving both of those a thorough read at some point, but for now let’s walk through the key points together.
When you have a statement like
import hello in a Python program, Python goes through two steps to actually import it:
- Locate and, if necessary, initialize the module.
- Bind the resulting module object to a name in your current scope.
So when Python sees
import hello, it wants to locate and possibly initialize a module named
hello, then assign the resulting
module object to the name
hello in your program’s current scope. If the
import statement occurs at the top of a file,
hello will become a module-global name, for example.
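The two steps can be pulled apart with importlib.import_module from the standard library, which performs only the first step and hands back the module object, leaving the binding to you. Here is a small sketch; the standard string module is just a convenient stand-in, and the variable names are mine:

```python
import importlib

# Step 1 only: locate (and, if needed, initialize) the module and get the
# module object back. No name is bound for us.
located = importlib.import_module("string")

# Steps 1 and 2: the import statement also binds the object to `string`.
import string

# Both routes lead to the very same module object.
print(located is string)

# "as" changes only the binding step: same object, different name.
import string as text_tools
print(text_tools is string)
```

Both checks print True, since there is only ever one module object involved.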
The first step, locating and initializing the module, can happen in one of two ways:
- If the module hasn’t already been initialized, that needs to happen. For most Python modules, that simply consists of executing the code in the module so that, for example, any classes or functions it contains get defined.
- If the module has already been initialized, there will be a module object already in memory for it, and Python can simply grab that object.
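Both cases are easy to watch in action. The sketch below writes a throwaway module to a temporary directory (init_once_demo is a made-up name, and the whole setup is an illustration rather than anything you would ship), then imports it twice:

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module whose body announces when it is executed.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "init_once_demo.py"), "w") as handle:
    handle.write("print('initializing init_once_demo')\nVALUE = 42\n")

# Make the temporary directory importable.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

first = importlib.import_module("init_once_demo")   # executes the module body
second = importlib.import_module("init_once_demo")  # reuses the existing object

print(first is second)
```

The “initializing” message shows up once, not twice, and the final line prints True: the second import never re-runs the module’s code.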
Python figures out whether the module has already been initialized by looking at a dictionary named
modules which lives inside the built-in module “sys” of the Python standard library; sys.modules has keys corresponding to the import paths of modules which have already been loaded and initialized, and the values are the resulting module objects.
So the actual mechanism is pretty simple: when we say
import hello in a Python program, Python goes and looks for the key “hello” in the
sys.modules dictionary. If that key exists, Python gets the already-initialized
module object out of
sys.modules['hello'], and if it doesn’t then Python goes out to your file system and starts looking through the directories on your Python import path for a file or module named
hello, which will (if found) be initialized and create an entry in
sys.modules. If it isn’t found, Python will raise an ImportError.
One important thing to note here is that if you have a module which can conceivably be imported in multiple different ways — say, because both your project directory and application are directly on your Python path, so that both
from myproject.blog.models import Entry and
from blog.models import Entry will work — you can end up with a single module getting initialized more than once, and having more than one entry in
sys.modules (one for each different way you’ve imported it). Significant sections of Django’s model-loading code exist to work around this and ensure that a given model class only gets initialized once.
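If you’d like to see the double-initialization problem for yourself, here is a sketch which builds a throwaway myproject/blog/models.py layout in a temporary directory and puts both levels on the import path. The layout and the empty Entry class are stand-ins I made up for the demonstration; this isn’t Django code:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway layout: <root>/myproject/blog/models.py
root = tempfile.mkdtemp()
blog_dir = os.path.join(root, "myproject", "blog")
os.makedirs(blog_dir)
for package_dir in (os.path.join(root, "myproject"), blog_dir):
    open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(blog_dir, "models.py"), "w") as handle:
    handle.write("class Entry(object):\n    pass\n")

# Put both the project's parent directory and the project directory itself
# on the import path, so the same file is reachable under two names.
sys.path[:0] = [root, os.path.join(root, "myproject")]
importlib.invalidate_caches()

from myproject.blog.models import Entry as ProjectEntry
from blog.models import Entry as AppEntry

# One file on disk, but two separate entries in sys.modules...
print(sorted(name for name in sys.modules if name.endswith("blog.models")))

# ...and therefore two distinct Entry classes.
print(ProjectEntry is AppEntry)
```

The same file ends up loaded under both “blog.models” and “myproject.blog.models”, and the two Entry classes are not the same object, which is exactly the sort of thing that makes isinstance() checks and model registries misbehave.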
Also, note that for module names which contain dots (e.g., import foo.bar), the mechanism is slightly different: Python works through the name from left to right, importing each parent before its child, and each level gets its own entry in sys.modules under its full dotted name. So in the statement import foo.bar.baz, Python makes sure “foo” and then “foo.bar” are in sys.modules, then (unless “foo.bar.baz” is already there too) looks for something named baz by way of the resulting module object.
By now you might be wondering whether, since
sys.modules is a dictionary, you can just go stick things into it. The answer is that you can: you’re free to do anything to
sys.modules that’s legal to do to a Python dictionary, though it’s almost always a bad idea to go messing around with it. But this points the way to how we’re going to make our eventual
import statement work: once we’ve constructed the
hello module, we can simply stick it into
sys.modules and Python will happily let us import it without ever bothering to check if an actual module of that name exists on the file system.
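Before putting anything in, it’s worth poking at sys.modules in a read-only way to confirm that it behaves as described. This is just a quick sketch; json is an arbitrary standard-library module, and the missing module name is made up and assumed not to exist on your system:

```python
import importlib
import json
import sys

# It really is an ordinary dictionary.
print(isinstance(sys.modules, dict))

# Keys are import paths, values are the already-initialized module objects.
print(sys.modules["json"] is json)

# A name that is neither in sys.modules nor on the import path fails.
missing_name = "no_such_module_for_this_demo"  # hypothetical name
try:
    importlib.import_module(missing_name)
except ImportError as error:
    print("ImportError:", error)

# The failed import leaves no entry behind.
print(missing_name in sys.modules)
```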
If you’ve ever tried to access, say, a nonexistent Django setting, you’ve probably seen an error like this:
AttributeError: 'module' object has no attribute 'some_random_name'
And so far I’ve been using the phrase “
module object” to refer to Python modules. Both of these give us a clue about how we can build a module on the fly: modules, like everything else in Python, are simply objects, and you can instantiate new
module objects just as you can instantiate objects from classes you’ve defined in your applications, assuming you know where to look.
The place to look is another module from Python’s standard library:
types, which contains the type objects for many of Python’s built-in types. If you know your way around the
types module, you can dynamically build nearly any sort of standard Python object on the fly, even some objects that you can’t normally construct otherwise. In this case the one we’re interested in is
types.ModuleType, which we can use to create a brand-new
module object at runtime; it works the same as instantiating any other object, and requires at least one argument: the name of the
module object to create. You can also optionally pass a second argument which will become the new module’s docstring, so that Python’s built-in
help() function will be able to show documentation for it (and other automated documentation parsers will be able to extract its documentation), but we’ll leave that off for this example.
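Here is a quick sketch of both arguments in use. The module name greetings and its docstring are made up for the illustration:

```python
import sys
import types

# Modules you import the usual way are instances of this same type.
print(type(sys) is types.ModuleType)

# First argument: the module's name. Optional second argument: its docstring.
greetings = types.ModuleType("greetings", "A module built at runtime.")
print(greetings.__name__)
print(greetings.__doc__)

# A missing attribute fails the same way it would on any other module.
try:
    greetings.some_random_name
except AttributeError as error:
    print("AttributeError:", error)
```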
So let’s go ahead and start building our
hello module. Pop open a Python interpreter and type in the following:
>>> import types
>>> hello_mod = types.ModuleType('hello')
We now have a
module object bound to the variable
hello_mod; you can check that it really is a module and has the correct name — “hello” — in the interpreter:
>>> hello_mod
<module 'hello' (built-in)>
At this point, the new module is simply a blank slate; we can stick anything into it that we like. So let’s define the
say_hello() function we’re going to use:
>>> def say_hello():
...     print("Hello")
...
And then add it to our hello module:
>>> hello_mod.say_hello = say_hello
You can call this function and verify that it works:
>>> hello_mod.say_hello()
Hello
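Assigning attributes one at a time isn’t the only way to fill in a blank module. Another option, closer in spirit to what happens when Python initializes a module from a file, is to execute a string of source code inside the module’s namespace. This is a sketch of that alternative, not something the old Django code is known to have done, and the names hello_from_source and GREETING_COUNT are made up:

```python
import types

source = '''
def say_hello():
    return "Hello"

GREETING_COUNT = 1
'''

scratch_mod = types.ModuleType("hello_from_source")

# Run the source with the module's own namespace as its globals, so every
# name the code defines becomes an attribute of the module.
exec(source, scratch_mod.__dict__)

print(scratch_mod.say_hello())
print(scratch_mod.GREETING_COUNT)
```

Executing arbitrary strings carries all the usual risks, so this only makes sense for source you control.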
Putting it all together
Of course, we still need to make our new module importable via the name “hello”, but armed with an understanding of
sys.modules this is easy:
>>> import sys
>>> sys.modules.setdefault('hello', hello_mod)
<module 'hello' (built-in)>
I’m using setdefault() here instead of assigning directly to
sys.modules['hello'], because if there’s already a loaded module named
hello we shouldn’t overwrite it; the
setdefault() method of a Python dictionary takes a key and a value, and then either inserts the value into the dictionary with the given key, if that key wasn’t already in use in the dictionary, or else does nothing to the dictionary. In either case it returns the value which ends up in the dictionary, which provides an easy way to figure out if you added a new value or not. In this case, the return value of
setdefault() was the module we just created, so we know it was added to sys.modules.
And now we can use the “magic”:
>>> import hello
>>> hello.say_hello()
Hello
This works because our dynamically-created
module object is now in
sys.modules under the name “hello”; since it’s in there, Python will simply return it any time we say
import hello and never bother checking the file system to see if a “real” module of that name exists on the Python path. We can even use the alternate
from syntax to import just the say_hello() function:
>>> from hello import say_hello
>>> say_hello()
Hello
Once again, Python is simply giving us back the contents of the
module object we created in memory; the fact that it exists in
sys.modules again bypasses any need to check the file system. And as soon as you exit the Python interpreter, the
hello module will simply disappear; since it only ever existed in memory inside this single Python process, it will go away as soon as that Python process exits.
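The same approach stretches to dotted names, which is closer in shape to the old django.models.blog.entries. What follows is only a sketch under my own assumptions, not Django’s actual code: demo_site and demo_site.entries are made-up names, and get_list is a stand-in function. On a current Python 3 interpreter, every level of the dotted name needs its own sys.modules entry, and the child also has to be attached to its parent as an attribute:

```python
import sys
import types

# Hypothetical names: a parent "demo_site" and a child "demo_site.entries".
parent = types.ModuleType("demo_site")
child = types.ModuleType("demo_site.entries")
child.get_list = lambda: ["first post", "second post"]

# The child has to be reachable as an attribute of its parent...
parent.entries = child

# ...and each level needs its own entry under its full dotted name.
sys.modules.setdefault("demo_site", parent)
sys.modules.setdefault("demo_site.entries", child)

import demo_site.entries
from demo_site.entries import get_list

print(demo_site.entries.get_list())
print(get_list())
```

Leave out either the parent’s sys.modules entry or the attribute assignment and one of the two import styles stops working, which gives a sense of how much bookkeeping the full-scale version involved.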
And now you know
At this point you can probably work out how Django — back in the 0.90 and 0.91 days — used to create the “magic” model modules which were importable from
django.models; there was a lot more work going on to build up the things which eventually lived inside that module, but ultimately it boiled down to the same two things we just did: creating a new
module object with
types.ModuleType, and making it importable by inserting it into sys.modules.
I can’t stress enough that this is something you probably shouldn’t ever do in real-world code, because — as the example of Django’s old-style model system shows — it’s confusing and counterintuitive to mysteriously create modules where people aren’t expecting them. And messing with
sys.modules, unless you really know what you’re doing, can also be dangerous; if you’re not careful you might accidentally delete or overwrite the entry for a module you were relying on, and then you’ll be in a real pickle.
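If you do end up experimenting, one way to limit the damage is to refuse to overwrite existing entries and to clean up after yourself. The helper below is a sketch of that idea; temporary_module is a name I invented for it, not something from the standard library or from Django:

```python
import contextlib
import sys
import types


@contextlib.contextmanager
def temporary_module(name, **attributes):
    """Register an in-memory module under `name`, then remove it again."""
    module = types.ModuleType(name)
    for attr_name, value in attributes.items():
        setattr(module, attr_name, value)

    # setdefault() returns whatever ends up in the dictionary; if that is not
    # our object, a module of this name was already loaded and we back off.
    if sys.modules.setdefault(name, module) is not module:
        raise ValueError("a module named %r is already loaded" % name)
    try:
        yield module
    finally:
        # Remove only the entry we added ourselves.
        if sys.modules.get(name) is module:
            del sys.modules[name]


with temporary_module("scratch_hello", say_hello=lambda: "Hello"):
    import scratch_hello
    print(scratch_hello.say_hello())

# The entry is gone again once the block exits.
print("scratch_hello" in sys.modules)
```

Asking for a name which is already taken, such as “sys”, raises ValueError and leaves the real module untouched.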
But knowing how the process works — even if you never actually use it — helps to turn this from “magic” into a fairly straightforward application of a Python feature, and provides a useful glimpse into some of Python’s fundamental workings: knowing how Python’s
import mechanism works, for example, is incredibly important and handy for a broad range of Python programming tasks, and shows off a major part of Python’s sometimes-downplayed dynamic nature.