Understanding virtual environments in Python

December 5, 2023 Django, Python

This is part of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.

Linking up

I want to talk today about Python virtual environments (or “venvs”), but first I need to cover a bit of background. Suppose you write a program, and it needs access to some other code, say in a library written by someone else, in order to run. How do you make that other code available?

One way would be to just insert a copy of the other code in your own program. This is generally known as static linking, and although the ins and outs of it can become complex, the basic concept is simple, and has been around for a long time. Several popular modern languages — including Go and Rust — are still exclusively or almost exclusively statically-linked.

But static linking can be wasteful, since if you have five programs which all need the same library, static linking generally ends up storing five copies of that library. It also requires recompiling all five any time there’s a need to upgrade the library they all depend on.

So an alternative is dynamic linking, where the necessary libraries are expected to be located somewhere — usually on the same machine, but not always — and can be loaded when the program runs. This lets you have a single copy of a library be shared by multiple programs (hence the common term “shared library”), and also simplifies upgrading: just upgrade the single shared copy, and every program will see the new version next time it runs.

But dynamic linking has its own downside: when a running program needs to load a library, it needs to know where to find that library. Historically, this was solved by picking a global system-wide directory to install shared libraries into, or perhaps a global directory system administrators could install into, and then one directory per user (often in their home directory) that ordinary non-administrators could install their own libraries into. Yet again, though, there are potential issues: what happens if you have different programs which want mutually-incompatible sets of shared libraries? This is a type of “dependency hell”, as it’s usually known (and has more specialized forms, like “DLL hell” for the way Windows traditionally did shared libraries).

Import-ant Python info

Python, being originally a Unix-y language of the 1990s, unsurprisingly has dynamic linking (the import statement) with a shared package/library directory. You can configure this through mechanisms like the PYTHONPATH environment variable, or by adjusting sys.path at runtime, to list more or fewer directories to search in, but the result is still that all Python programs run by a given user will see the same set of installed libraries and packages, which can of course lead to “dependency hell” situations.
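
To make the search-path idea concrete, here is a minimal sketch of how adding a directory to sys.path changes what import can find. The module name venv_demo_module and its GREETING attribute are made up for illustration, and the sketch assumes no module of that name is already installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "venv_demo_module"  # hypothetical module name, used only here

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny module into a directory Python does not search by default.
    Path(tmp, f"{MODULE_NAME}.py").write_text(
        "GREETING = 'hello from a custom directory'\n"
    )

    # Before the directory is on sys.path, the import is expected to fail.
    try:
        importlib.import_module(MODULE_NAME)
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # Adding the directory to sys.path makes the module importable.
    sys.path.insert(0, tmp)
    try:
        module = importlib.import_module(MODULE_NAME)
        greeting = module.GREETING
    finally:
        # Undo the changes so the rest of the process is unaffected.
        sys.path.remove(tmp)
        sys.modules.pop(MODULE_NAME, None)

print("importable before editing sys.path:", found_before)
print("value read after editing sys.path:", greeting)
```

Setting PYTHONPATH before starting the interpreter has a similar effect to the sys.path.insert call, except that it applies to every Python program started with that variable set.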

Other languages work around this in various ways; they can produce bundled archives of an entire application/project and all its dependencies (for example, Java), or use versioned installations of libraries (common with shared C libraries on various Unix-y systems, though the exact conventions vary from system to system), or provide other ways to have multiple copies of a library available and resolve the correct one at runtime (the node_modules directory common in JavaScript applications does this).

But Python, for a long time, did not have a standard solution for this. Admittedly, most of that was also a period when distributing and installing Python packages was itself still a bit of a mess — the earliest Python packaging tooling didn’t have things like dependency manifests or the ability to download and install packages from an archive like PyPI — so there may not have been so many people in need of it yet.

(yes, I’m sure some snarky person will claim it still is a mess, but honestly pip works really well and has for years, and packaging up Python code has been easy for just as long; most of the trouble people see these days remains in packaging up complex multi-language compiled extensions, which probably always will have a higher base difficulty level)

Introducing the virtual environment

The solution arrived in 2007, when the first public release of the virtualenv package was published to the Python Package Index. The idea behind virtualenv was to isolate different Python applications or projects, running on the same system, from each other, giving each one its own unique set of installed packages.

This proved to be a popular enough idea that, sixteen years later, the virtualenv project is still going strong, and a subset of its functionality has been adopted into Python’s standard library, as the venv module.

The implementation is actually surprisingly simple. A Python installation consists of a Python interpreter and a set of libraries to use with it; the default library location is platform-specific but can be overridden. A virtual environment, at its core, consists of:

A directory named bin/ (Scripts\ on Windows) containing a Python interpreter (usually a shortcut or symlink to the original “parent” interpreter used to create the environment) and any other executables or command-line entry points,
A directory named lib/ containing a subdirectory named for the Python version, and a subdirectory of that named site-packages which will contain the actual installed libraries (on Python 3.11, for example, this will be lib/python3.11/site-packages), and
A configuration file named pyvenv.cfg which, when detected in a specific location relative to the interpreter, will tell Python to override its base prefixes and use the virtual environment’s directories.
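
The pieces listed above can be inspected from Python itself. This is a rough sketch using the standard-library venv module to build a throwaway environment in a temporary directory. The name demo-env is arbitrary, and pip is deliberately left out (with_pip=False) to keep the run quick.

```python
import os
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"  # arbitrary name for this illustration

    # Mirror the command-line default: symlinks everywhere except Windows.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))

    # Top-level entries: the executables directory, the library directory,
    # and the configuration file.
    top_level = sorted(entry.name for entry in env_dir.iterdir())

    # The configuration file is plain "key = value" text.
    config_text = (env_dir / "pyvenv.cfg").read_text()

    # Locate the site-packages directory relative to the environment root.
    site_packages = [
        found.relative_to(env_dir).as_posix()
        for found in env_dir.rglob("site-packages")
    ]

print("top-level entries:", top_level)
print("site-packages location(s):", site_packages)
print("pyvenv.cfg contents:")
print(config_text)
```

The exact entries vary by platform and Python version, so treat the printed listing as something to explore rather than a fixed layout.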

The other thing you get standard in a virtual environment is a set of “activation” scripts for various command-line shells; running the appropriate one for your shell will “activate” the virtual environment, but this just consists of setting the environment variable VIRTUAL_ENV as a hint to tooling, modifying your PATH (or equivalent) environment variable to find the virtual environment’s bin/ directory ahead of any others, and adjusting your command prompt to include the name or path of the virtual environment. “Deactivating” a virtual environment (usually by typing deactivate) undoes those environment-variable changes.
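
To show how little activation involves, the sketch below imitates the two environment-variable changes in plain Python and checks which python a shell-style lookup would then find. The helper names scripts_dir and activated_environ are invented for this example. Real activation scripts also adjust the prompt and handle a few other details that are skipped here.

```python
import os
import shutil
import tempfile
import venv
from pathlib import Path


def scripts_dir(env_dir):
    """Return the directory holding the environment's executables."""
    return Path(env_dir) / ("Scripts" if os.name == "nt" else "bin")


def activated_environ(env_dir, base_environ):
    """Copy base_environ and apply the VIRTUAL_ENV and PATH changes."""
    environ = dict(base_environ)
    environ["VIRTUAL_ENV"] = str(env_dir)
    old_path = base_environ.get("PATH", "")
    # Put the environment's executables directory ahead of everything else.
    parts = [str(scripts_dir(env_dir))] + ([old_path] if old_path else [])
    environ["PATH"] = os.pathsep.join(parts)
    return environ


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"  # arbitrary name for this illustration
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))

    environ = activated_environ(env_dir, os.environ)

    # Ask the same question a shell would: which "python" comes first on PATH?
    found = shutil.which("python", path=environ["PATH"])
    found_in_env = found is not None and Path(found).parent == scripts_dir(env_dir)

print("VIRTUAL_ENV =", environ["VIRTUAL_ENV"])
print("python resolves inside the environment:", found_in_env)
```

Nothing in the current process is changed by this sketch. It only builds a modified copy of the environment, which is roughly what “deactivating” throws away.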

This is easy to try out: you can run python -m venv <path> to create a new virtual environment at <path>. Then you can go inspect it to see what it contains, try activating it and installing things in it, and then deactivate and delete it.

But I was told this was complicated!

Virtual environments themselves are not particularly complicated, but you can make them complicated, if you try.

For example, you can create lots of them and then manually mess with your path-related environment variables, which can cause not-at-all-fun confusion about what will run when you type python (or an executable Python script name like pip), or where you’ll be importing from if you run a Python program containing import statements.
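
When that kind of confusion strikes, a few standard-library values usually reveal what is going on. The following is one possible diagnostic sketch. The function name describe_python and the dictionary keys are invented for this example.

```python
import shutil
import sys
import sysconfig


def describe_python():
    """Collect the facts that answer 'which Python am I actually using?'"""
    return {
        # What typing "python" in a shell would run, if anything.
        "python_on_path": shutil.which("python"),
        # The interpreter running this code right now.
        "running_interpreter": sys.executable,
        # In a virtual environment these two prefixes differ.
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "in_virtual_environment": sys.prefix != sys.base_prefix,
        # Where third-party packages would be installed and imported from.
        "site_packages": sysconfig.get_path("purelib"),
    }


report = describe_python()
for name, value in report.items():
    print(f"{name}: {value}")
```

If python_on_path and running_interpreter point at different environments, that mismatch is usually the source of the surprise.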

Always explicitly activating and deactivating virtual environments, and paying attention to the command-prompt hint showing the active environment, go a long way toward preventing this sort of trouble. If you want, you can also get developer-workflow tools which will automatically create, manage, and activate/deactivate virtual environments for you as needed.

Another good habit to get into is invoking Python tools, when possible, via python -m. This option takes a module name as its argument and runs that module, passing along anything that follows the module name. Many standard-library and popular third-party tools support this (so python -m venv as above, python -m pip to run pip, etc.). If a virtual environment is active and you haven’t messed with your PATH after activating it, python will always find that virtual environment’s Python interpreter (and the same is true for versioned interpreter names; for example, in a Python 3.11 virtual environment, python, python3, and python3.11 will all find the virtual environment’s interpreter).
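
The same habit carries over when one Python program needs to launch a Python tool. A minimal sketch, assuming sys.executable is set (it normally is outside of embedded interpreters). The helper names module_command and run_module are invented for this example.

```python
import subprocess
import sys


def module_command(module, *args):
    """Build the argument list for running a module with this interpreter."""
    return [sys.executable, "-m", module, *args]


def run_module(module, *args):
    """Run the module in a child process and capture its output."""
    return subprocess.run(
        module_command(module, *args),
        capture_output=True,
        text=True,
        check=False,  # a missing tool is reported below, not raised
    )


result = run_module("pip", "--version")
if result.returncode == 0:
    # pip reports the location it was loaded from, which shows the environment.
    print(result.stdout.strip())
else:
    print("pip is not available for", sys.executable)
```

Because the command starts from sys.executable rather than a bare pip, it cannot accidentally pick up a pip belonging to some other interpreter on the PATH.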

Finally, when building a Docker container to run some Python code (such as a web application built with Django or another framework), you should almost always create a virtual environment, even if you don’t think you’ll need it. This ensures that even if another Python interpreter winds up inside the container (say, due to installing a system package with a dependency on the distro’s Python package/interpreter, which is easy to do without realizing), you won’t mess with it or its packages. And several Linux distributions now are in the process of adopting a mechanism to require you to use a virtual environment in order to install packages with pip (for example, Debian 12 and later will show an error telling you to use apt, not pip, to install packages for the default “system” Python interpreter, and to create a virtual environment if you want to use pip).
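
The distribution mechanism mentioned above is, as far as I know, the one specified in PEP 668: a marker file named EXTERNALLY-MANAGED placed in the base interpreter’s standard-library directory. The sketch below checks for that marker. The function names are invented for this example, and it is only a rough approximation of the check pip itself performs.

```python
import sys
import sysconfig
from pathlib import Path


def base_python_is_externally_managed():
    """Check for the marker file in the standard-library directory."""
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    return marker.is_file()


def in_virtual_environment():
    """A virtual environment has a prefix different from its base prefix."""
    return sys.prefix != sys.base_prefix


def pip_install_likely_refused():
    """Rough guess: refused for a marked interpreter outside an environment."""
    return base_python_is_externally_managed() and not in_virtual_environment()


print("marker file present:", base_python_is_externally_managed())
print("inside a virtual environment:", in_virtual_environment())
print("plain pip install likely refused:", pip_install_likely_refused())
```

Inside a virtual environment the last line should print False even on a distribution that ships the marker, which is the behavior the error message points you toward.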