Compile your Python
This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
You can compile Python?
Yes. And in a lot of ways!
But Python itself (at least, the default “CPython” implementation you’d get when downloading from python.org) is also a compiled language. Specifically, it compiles from Python source code to Python bytecode, and then the Python interpreter provides (among other things) a simple virtual machine which executes that bytecode. This is similar conceptually to how several other languages work. For example, Java compiles to Java bytecode which is run on a Java Virtual Machine, and C# compiles to bytecode for the .NET Common Language Runtime.
The most visible difference is that in those other languages, compilation is almost always an explicit step: you invoke a compiler, or press a button in your editor or IDE which will invoke a compiler, to explicitly build your project. But in Python, compilation almost always happens implicitly.
Whenever you run a Python program, that program (and anything it imports) will be compiled from Python source code to Python bytecode, and then the bytecode is what’s executed. This all happens automatically for you, and you might not even notice it happening. But the evidence is there, if you look for it, because Python leaves the compiled bytecode behind in files, which can be reused on future runs to avoid the overhead of recompiling them each time.
The way this works is that any time a module is imported, a directory named __pycache__/ will be created alongside that module (if it doesn’t already exist), and the bytecode files will be placed in that directory. Each one will be named for the original file it was compiled from, and the version of Python which compiled it (since bytecode can change from one Python version to the next).
For example, suppose you have a directory my-modules/ on your import path, and in it is a module my_lib.py, and you’re running Python 3.11. When Python encounters an import my_lib statement, it will load that file, bytecode-compile it, and store the result in the __pycache__/ directory alongside it, in a file named for my_lib and for Python 3.11.
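If you’d like to see this for yourself, here’s a minimal sketch. It assumes a throwaway my_lib.py created in a temporary directory (a stand-in for the my-modules/ example, not a real project), and asks the standard library’s importlib.util.cache_from_source where the running interpreter would put the bytecode. The exact path depends on your Python version and configuration.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module, created in a temporary directory.
module_dir = Path(tempfile.mkdtemp())
source_path = module_dir / "my_lib.py"
source_path.write_text("ANSWER = 42\n")

# Where this interpreter would cache the bytecode for that source file.
cache_path = Path(importlib.util.cache_from_source(str(source_path)))

# Put the directory on the import path and import the module.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
my_lib = importlib.import_module("my_lib")

print("source:  ", source_path)
print("bytecode:", cache_path)
# This is False if bytecode writing is disabled (-B or PYTHONDONTWRITEBYTECODE).
print("written: ", cache_path.exists())
```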
Bytecode files aren’t particularly interesting to look at, but you can use the dis module in the standard library to inspect bytecode in a more human-friendly format. The documentation for it also contains brief explanations of the bytecode instructions; the CPython interpreter is a simple stack-based virtual machine, so many instructions are about pushing things onto, or manipulating the top of, the stack.
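As a small illustration, here’s a sketch that disassembles a tiny function. The function add_one is made up for this example, and the exact instruction names you see will vary between Python versions, so I’m not showing the output here.

```python
import dis


def add_one(x):
    # Hypothetical example function; any function would do.
    return x + 1


# Print a human-readable listing of the function's bytecode.
dis.dis(add_one)

# The same instructions are also available as objects, if you want to
# inspect them programmatically.
opnames = [instruction.opname for instruction in dis.get_instructions(add_one)]
print(opnames)
```

Running it should show instructions that load x and the constant 1, combine them, and return the result, which is the stack-machine behavior in miniature.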
If you want to understand more about reading Python bytecode, I gave a talk at PyCon 2018 about it, and there are some references at the end (and here’s a copy of the slides with clickable links — it also looks a bit nicer than in the video, because the computer I had to present from at that PyCon didn’t have the right fonts or screen resolution).
But today I want to tell you about options for writing bytecode.
Go ahead, have a byte
You can run an explicit compilation step on a Python application, to compile the source code to bytecode. It’s just relatively rare to do so, though I’m going to make an argument that it should become less rare.
There are two main reasons why you would want to do this:
- Compiling your bytecode up-front can offer a small improvement to application startup, since Python will not need to compile modules as it imports them.
- Compiling your bytecode up-front lets you turn off the writing of bytecode files, and other writes to the directories holding your code, afterward.
The first of these is probably only a small benefit, and may not be worth it for many use cases. The second, though, can be extremely useful. For example, I write web applications for a living, and these days they’re basically all deployed as Docker containers. Making the directory that holds your code non-writeable at runtime (and ideally also running as a non-root user with as few permissions as possible) is a nice improvement to your container security. Of course, it’s very far from being the only thing you should do for container security, and won’t prevent every issue you might encounter, but it at least raises the difficulty level of some attacks if it’s not possible to write code into the location(s) your app will import from.
So all of my Dockerfiles include explicit compilation of the code during the container build, and then stop further writes to those directories.
Explicitly compiling Python source code can be done with the compileall module from the standard library; run python -m compileall and pass a list of files or directories to compile. I also recommend using the -q flag, since by default compileall will output each individual file name as it compiles, which can overwhelm some CI and build-tool log viewers with a large enough set of files.
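The same module can be called from Python code, which is handy if your build is driven by a script. This is a minimal sketch, assuming a temporary directory with one made-up module standing in for your application code; compile_dir with quiet=1 behaves roughly like passing -q on the command line.

```python
import compileall
import tempfile
from pathlib import Path

# Hypothetical stand-in for an application directory.
app_dir = Path(tempfile.mkdtemp())
(app_dir / "my_lib.py").write_text("ANSWER = 42\n")

# quiet=1 corresponds to -q: file names are not printed, errors still are.
all_compiled = compileall.compile_dir(str(app_dir), quiet=1)

# compile_dir returns a true value only if every file compiled successfully.
print("all files compiled:", bool(all_compiled))
```

In a build script you might want to fail the build when the result is false, so that a syntax error in any file is caught before deployment.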
You can also set the environment variable PYTHONDONTWRITEBYTECODE (to any non-empty value) or pass the -B flag to your Python interpreter to prevent Python from writing any bytecode files at runtime.
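Both settings end up in the same place: the interpreter’s sys.dont_write_bytecode flag. Here’s a quick sketch to confirm that, which launches the current interpreter in a subprocess once with -B and once with the environment variable set. The value "1" is just an example of a non-empty string.

```python
import os
import subprocess
import sys

check = "import sys; print(sys.dont_write_bytecode)"

# Run a child interpreter with the -B flag.
with_flag = subprocess.run(
    [sys.executable, "-B", "-c", check],
    capture_output=True, text=True, check=True,
)

# Run a child interpreter with the environment variable set instead.
env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
with_env = subprocess.run(
    [sys.executable, "-c", check],
    env=env, capture_output=True, text=True, check=True,
)

# Both child interpreters report True.
print(with_flag.stdout.strip(), with_env.stdout.strip())
```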
So if the application code is being placed in the directory /app, for example, you might put something like this in your Dockerfile:
RUN python -m compileall -q /app
You could then also run the container as a user which doesn’t have write permissions to /app, and get a nice little bit of security hardening.