How I’m testing in 2020

Published: February 3, 2020. Filed under: Django, Python.

Once upon a time I wrote a bit about testing, specifically how I was organizing and testing my open-source Django apps. It’s been a while since that post, though, and the calendar has even flipped over to a new penultimate digit in the year number, so it’s worth revisiting to go over what’s changed in how I do things and what’s stayed the same. And since I do maintain a couple things that aren’t Django-related, I’d like to talk about how I test them, too.

But before I dive in, a reminder: this is a place where I publish my opinions. They’re based on my personal taste and they work for me. If something else works for you, stick with it, and if you prefer something else, that’s OK! Beyond basic stuff like “you should probably have some tests”, there aren’t really a lot of objectively right answers here.

And now with that disclaimer out of the way, here’s how I’m testing in 2020.

Writing tests: still unittest style

The unittest module began as a Python port of the JUnit test framework from Java (its name was even “PyUnit”), which in turn is part of the broader family of xUnit-style test frameworks. PyUnit joined the Python standard library, under the name unittest, way back in Python 2.1.

(Python 2.1 also added the doctest module, which allows tests to be written in the docstrings of functions and methods. But doctest is pretty limited in what it can do, and is mostly useful as a way of making sure example code in docstrings is up-to-date and works correctly, rather than as a comprehensive solution for testing a codebase.)

The pytest framework (sometimes also known as “py.test”) is newer and, while it can discover and run unittest-style class-based tests, really wants you to write tests differently.

The aim of pytest — as I understand it — is to enforce, or at least strongly encourage, separation of three concerns:

  1. The logic of the tests
  2. The data the tests run against
  3. The resources the tests need

In the pytest approach, logic goes in the body of the test function; data comes from parametrization; and resources come from fixtures.

I’ve worked on codebases that made extensive use of pytest, and while I respect the goal of separating these, I find the particular way pytest does it increases the cognitive load too much for me.

Some of this is my personal dislike for pytest’s signature-based approach to dependency injection (if you have a fixture named foo, and a test function which accepts an argument named foo, pytest ensures the fixture is passed as an argument when running the test). But I find that fixture definitions can be difficult to track down and keep in mind, and their behavior is dependent on scoping/lifetime rules that may not be visible at the site of the test definition, or even in the same file as the test.
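To make the signature-based injection concrete, here's a hypothetical pytest-style test (not from any real codebase) that combines a fixture and parametrization; note that nothing at the call site says where `config` comes from:

```python
import pytest


# A resource: pytest injects this into any test function that
# declares an argument named "config".
@pytest.fixture
def config():
    return {"base_url": "https://example.com"}


# Data: pytest generates one test case per tuple in the list.
@pytest.mark.parametrize(
    "path,expected",
    [
        ("about", "https://example.com/about"),
        ("faq", "https://example.com/faq"),
    ],
)
def test_build_url(config, path, expected):
    # Logic: the body only expresses the behavior under test.
    assert "{}/{}".format(config["base_url"], path) == expected
```

In a real project the fixture could just as easily live in a `conftest.py` several directories up, which is exactly the tracking-down problem described above.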

For me, this increases the difficulty of understanding and reasoning about tests to a level that’s no longer worth the convenience the framework is providing. I also find the default verbosity of pytest’s output to be a bit overwhelming, especially when there are multiple test failures; I know it’s trying to be helpful, but there is such a thing as too much information (and ironically, the one place where I often do want verbose output — diffing expected versus actual data structures on either side of a failed assertion — is seemingly also the one place pytest chooses not to be verbose by default).

So even though unittest probably isn’t what you’d end up with if you set out to design a “Pythonic” test framework from scratch, and even though it is at times more verbose than the alternatives, I still prefer it over pytest. It’s just generally easier, at least for me, to read and understand what’s going on inside a TestCase class than to find and mentally keep track of all the places things might be coming from in tests written in fluent pytest style.

Note that I do still use pytest for discovering and running tests in my non-Django projects, since it’s quite good at that. But it’s always running tests written in the unittest style, and using the Python standard library’s unittest.mock for mocking, rather than pytest’s monkeypatch fixture.
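For comparison, here's a sketch of the same style of test written the unittest way, using the standard library's unittest.mock (the function and names are hypothetical, purely for illustration):

```python
import unittest
from unittest import mock


def fetch_greeting(client):
    """Hypothetical function under test: greets whoever the client names."""
    return "Hello, {}!".format(client.get_name())


class FetchGreetingTests(unittest.TestCase):
    def setUp(self):
        # The resource lives in setUp(), right next to the tests that use it.
        self.client = mock.Mock()
        self.client.get_name.return_value = "world"

    def test_greeting(self):
        self.assertEqual(fetch_greeting(self.client), "Hello, world!")
        self.client.get_name.assert_called_once_with()
```

Everything a reader needs — the resource, the data, and the logic — is visible inside one class, which is the readability trade-off being described here.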

Environment management: still tox

I generally adopt a policy of supporting each Django and Python version that also receives upstream support. At this exact moment I’m bending that policy a bit because of a misalignment of support cycles. Django 1.11, an LTS released nearly three years ago, was the last Django release to support Python 2, and Python 2 reached its end of upstream support from the Python core team on January 1 (the actual final packaged release will happen later, but 2020-01-01 was the freeze date for that release; anything that didn’t get in before then won’t be part of the final packaged 2.7 release). But Django 1.11 won’t reach the end of its support until April 2020.

So there’s an awkward period of a few months going on right now, where Django 1.11 still receives upstream support from the Django core team, but Python 2 does not receive support from the Python core team. And the most recent Django release — 3.0 — drops most of the legacy Python-2 support infrastructure that had been kept in Django throughout the 2.x release series to make it easier for apps to support Django 1.11 and Django 2.x at the same time.

I’ve decided to deal with this by not dealing with it: as I refresh my personal open-source apps for Django 3.0 support, I’m dropping Django 1.11 (and Python 2) support at the same time. The time remaining in Django 1.11’s support period is too short, to me, to justify expending any effort on maintaining compatibility in apps that also target Django 3.0.

For non-Django libraries I maintain, the situation is a lot simpler. I’m just supporting Python 3.5 and later, matching the versions receiving upstream support from the Python core team.

But this still produces a pretty complex matrix of supported versions: for a non-Django library it’s four Python versions. For a Django app, it’s two versions of Django on up to four versions of Python each.

To define and maintain my testing matrices I use tox, plus the tox-travis plugin which provides helpers for using tox with Travis CI.
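As a sketch of how such a matrix can be declared, tox’s generative environment names and factor-conditional dependencies expand the Python/Django combinations from a few lines (the version pins here are illustrative, reflecting early 2020):

```ini
[tox]
envlist =
  py{35,36,37,38}-django22
  py{36,37,38}-django30

[testenv]
deps =
  django22: Django>=2.2,<3.0
  django30: Django>=3.0,<3.1
commands =
  python runtests.py
```

Note that the py35 factor only pairs with Django 2.2, since Django 3.0 requires Python 3.6 or later.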

I don’t know of any major competitors trying to fill the same niche as tox, and the sheer number of Python projects nowadays that have a tox.ini file at the root of their repository suggests it’s become, or is becoming, the standard solution for this.

For local test runs I do need to have multiple different versions of Python installed; I manage this using pyenv and pyenv-virtualenv, which make it easy to install a variety of different Python versions and create and manage virtualenvs associated with them.

Django: still using standalone configuration/runner

When you deploy a site or service built with Django, you have a settings file specifying your list of apps and other configuration, and access to the manage.py script with its variety of helpful commands. But that infrastructure isn’t generally available when building a single app to be distributed on its own, which then requires some other method of configuring Django and running the tests.

I do this with a single-file approach; I call the file runtests.py and include it in the root of the repository for my Django apps. That file does a few things:

  1. Set up a dict containing the necessary settings, then import Django’s settings object from django.conf and call its configure() method, passing in those settings.
  2. Import and call django.setup() to initialize Django.
  3. Instantiate Django’s test runner, and call its run_tests() method.

If you want to do this yourself, the minimum settings to get Django up and running to test an app are INSTALLED_APPS, ROOT_URLCONF, and DATABASES. From there, add any further settings the app needs.
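Here’s a minimal sketch of what such a file can look like, assuming a hypothetical app named my_app (the Django imports are deferred into the function so the sketch can be read or imported even without Django installed):

```python
import sys

# Minimal settings needed to get Django running for an app's tests.
# "my_app" is a hypothetical app name used for illustration.
SETTINGS = {
    "INSTALLED_APPS": [
        "django.contrib.contenttypes",
        "django.contrib.auth",
        "my_app",
    ],
    "ROOT_URLCONF": "my_app.urls",
    "DATABASES": {
        "default": {"ENGINE": "django.db.backends.sqlite3", "NAME": ":memory:"}
    },
}


def run_tests():
    import django
    from django.conf import settings
    from django.test.utils import get_runner

    # Step 1: configure settings; step 2: initialize Django;
    # step 3: instantiate the test runner and run the tests.
    settings.configure(**SETTINGS)
    django.setup()
    runner_class = get_runner(settings)
    failures = runner_class().run_tests(["tests"])
    sys.exit(bool(failures))


# A real runtests.py would end with:
#     if __name__ == "__main__":
#         run_tests()
```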

Here’s an example of what this looks like, from django-registration. Note: that file adds "tests" to INSTALLED_APPS, but for a reason very specific to django-registration, which includes a custom user model for testing purposes inside its test suite. Since there’s a model in there, it has to go in INSTALLED_APPS for Django to notice it, but most Django app test suites don’t involve any test-only models (instead they use the models in the app itself), and so don’t need to add their tests to INSTALLED_APPS.

Test entry point: changed

In ages past, I supported setup.py test as the way to actually invoke tests. The history of Python’s packaging tools is a bit complex, and one part of that complexity was a trend toward implementing all sorts of things as arguments to setup.py. One of those was test, which was meant to provide a standard way to run a package’s test suite. At one time this made sense — other languages’ packaging tools have also supported running the package’s test suite — but it complicates the setup.py file, which then needs to include sections for test-only dependencies. And many projects end up wanting to run a whole matrix of tests against different versions of Python and their regular dependencies, at which point you wind up back at tox, or something similar, as the thing to invoke.

So setup.py test is now officially deprecated in setuptools, which was where it came from in the first place. And the test commands in my tox.ini file are now just invoking the runtests.py helper (for Django apps) or pytest directly (for non-Django libraries).

Repository layout: new and improved!

A couple years ago, I was just starting the process of reorganizing the repositories for my open-source projects. Previously, I always had the actual module top-level in the repository, usually with tests nested inside. Here’s an example of what that looked like.

And here’s what the same repository looks like today.

The biggest differences are that the actual codebase is now inside a directory named src/, and the tests have moved to a top-level directory named tests/.

The reason for switching to the src/ directory is something I’ve covered before. Here’s a good writeup from Hynek Schlawack about it, which was one of the articles that convinced me to make the switch myself. The short summary is: this layout ensures that your code has to package and install correctly, or else its tests will fail. Leaving the module top-level means it’s explicitly on the Python import path during test runs, so packaging failures may pass unnoticed.

And moving tests/ to top level is mostly a way of avoiding unnecessary things in the final install of the package. My Python packaging configuration will include the tests/ directory in the package, but not install the tests/ directory. I do this because, in my experience, most people don’t actually want the tests installed; at most, they want to be able to run them from a downloaded copy of the package, to verify it’s good before installing, and then not have them around after that.

Coverage: still doing it

I run my tests under coverage, and have the coverage level output as part of the run. I also currently set it to fail the run if coverage dips below 100%.
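One way to wire that up is in setup.cfg, using coverage.py’s configuration sections (a sketch — the source path is hypothetical, and `coverage report` exits non-zero when the threshold isn’t met):

```ini
[coverage:run]
source = my_app
branch = true

[coverage:report]
# "coverage report" fails the run if total coverage is below this.
fail_under = 100
show_missing = true
```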

As I mentioned last time I wrote about testing, this isn’t because I have 100% coverage as a goal. Achieving that is so easy in most projects that it’s meaningless as a way to measure quality. Instead, I use the coverage report as a canary. It’s a thing that shouldn’t change, and if it ever does change I want to know, because it will almost always mean something else has gone wrong, and the coverage report will give me some pointers for where to look as I start investigating.

And that isn’t just hopeful thinking: using coverage this way has actually caught and given me notice of at least a couple bugs I wouldn’t have found through other mechanisms (or at least not nearly as quickly). So whenever I start a new project and get its tests in good shape, I turn on the coverage report for every test run and keep an eye out for it suddenly changing.

Tests still involve more than just tests

Just running the contents of the tests/ directory is a good start, but there’s more that can and should be done. And if you have a look at the tox.ini file of one of my projects, you’ll see there’s more going on. I’ve always had at least a couple extra tools running alongside tests, but now I run them as separate environments in the tox config.

Here’s an example of my standard tox config, from django-registration. It ends up running 13 different test environments, and only 7 of them are accounted for by different Python/Django versions.

Two of the extra environments are formatting checks. One runs black over the codebase and fails if black would make any changes; another runs isort, and similarly fails if isort would make any changes. I haven’t made up my mind about black yet — there are still some things I dislike about how it formats — but the number of egregiously-wrong (in my opinion) things has been diminishing over time, and it at least has the virtue of settling arguments about formatting.
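These checks can be expressed as dedicated tox environments along these lines (a sketch; the flags shown match black and isort as of early 2020, when isort still took a --recursive flag):

```ini
[testenv:black]
deps = black
commands = black --check --diff src/ tests/

[testenv:isort]
deps = isort
commands = isort --check-only --diff --recursive src/ tests/
```

Both tools exit non-zero when they would make changes, which is what lets tox treat a formatting drift as a failed test environment.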

A third is the flake8 linter, which can catch many types of problems automatically; this environment fails if flake8 reports any problems.

A relatively new addition is mypy, which is a static checker for Python 3’s optional type hints. I’m not entirely sold on mypy, or Python 3’s type hinting in general (I’m not a fan of some of the design directions it took early on), but for now I’m giving it a try. It’s not the most useful thing in a Django app, since Django itself doesn’t have type hints — though there is funding for a project to add them, if you’d like to pick that up! — and so in those projects I’m running mypy in a pretty lax way while gradually adding type hints over time. In non-Django libraries I’m working on annotating function/method signatures; the akismet API wrapper already has them, for example.

Finally, I have two runs dedicated to documentation. I care a lot about good documentation, and I try to set good examples for it in my personal projects, most of which have far more documentation than lines of code. I’ll happily package up and publish new releases just to ship improvements to documentation, and in my personal projects the documentation is where I spend the majority of each new release’s work.

So one of the environments in my standard tox.ini file does a full build of the documentation with Sphinx, and lets me check for any errors or warnings that might indicate problems. Another uses the sphinxcontrib-spelling package to run a spell-checker over the documentation.

Also, most of these tools want at least some level of configuration, and are able to read it from a setup.cfg file, so that’s where I put it. If you look at one of my repositories, you’ll find sections in setup.cfg for any or all of coverage, flake8, isort, and mypy, depending on the needs of that project. Since pyproject.toml was introduced as a setuptools-independent way to specify build-system requirements (which was previously a chicken-and-egg problem, where you had to parse setuptools-specific files to find out that a package wanted to use something other than setuptools), many popular Python tools have also begun supporting configuration via a pyproject.toml file in the package. It’s possible I’ll switch to that in the future if it seems to be gaining traction as the standard declarative configuration file, but for now that title still mostly belongs to setup.cfg.
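For instance, a setup.cfg might collect tool configuration in sections like these (the option values are illustrative, not a recommendation, and my_app is a hypothetical package name):

```ini
[flake8]
max-line-length = 88
exclude = .tox,build

[isort]
line_length = 88
known_first_party = my_app

[mypy]
ignore_missing_imports = true
```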

Miscellany

If you browse around in one of my tox.ini files, you’ll find a couple other standard things I do. One is a block of commands labeled cleanup:

[cleanup]
commands =
  find {toxinidir}/tests -type f -name "*.pyc" -delete
  find {toxinidir}/tests -type d -name "__pycache__" -delete
  find {toxinidir}/src -type f -name "*.pyc" -delete
  find {toxinidir}/src -type d -name "__pycache__" -delete
  find {toxinidir}/src -type f -path "*.egg-info*" -delete
  find {toxinidir}/src -type d -path "*.egg-info" -delete

Then each environment includes {[cleanup]commands} at the end of its own commands block. Running tests will leave behind some package-build artifacts and other things like Python bytecode files. But I want every test run to start from a clean state and not leave packaging artifacts behind, so this block of commands finds and removes those at the end of every run.

There’s also this:

[pipupgrade]
commands =
  {envpython} -m pip install --upgrade pip

which runs at the start of every test environment’s commands block. All it does is upgrade pip. I do this because tox will only regenerate each test environment’s virtualenv if the dependencies for that environment change, and pip now has a fast enough release schedule that it can get out of date. So I just make sure each virtualenv always updates itself to the latest pip on each test run.

Finally, I include this important line in the base testenv configuration:

setenv =
    PYTHONWARNINGS=once::DeprecationWarning

The PYTHONWARNINGS environment variable controls how Python reports warnings; the specific value here ensures I can see DeprecationWarning when it’s raised, but prevents repetitive warnings from cluttering up the test output. At some point in the future I’ll probably update this for my Django apps to pick up some Django-specific subclasses of DeprecationWarning.
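The effect of the once action can be demonstrated with the warnings module directly — this snippet (equivalent in spirit to setting PYTHONWARNINGS=once::DeprecationWarning, with a hypothetical deprecated call) shows repeated identical warnings being collapsed to one:

```python
import warnings


def call_deprecated_api():
    # Hypothetical deprecated call site.
    warnings.warn("this API is deprecated", DeprecationWarning)


with warnings.catch_warnings(record=True) as caught:
    # "once": report the first occurrence of each matching warning,
    # then suppress the repeats.
    warnings.simplefilter("once", DeprecationWarning)
    for _ in range(5):
        call_deprecated_api()

print(len(caught))  # 1 — the four repeats were suppressed
```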

That’s it (for now)

I think that covers everything I’m currently doing for testing my personal projects, and the reasoning for why I do things the way I do. I’ve got some thoughts on other things I’d like to try, such as adding another tox env I could invoke to produce release packages, and possibly even upload them to PyPI and tag the release in the git repository (I don’t want automatic releases from CI, but I wouldn’t mind consolidating my release process down to just running tox -e release or whatever), but that’s getting a bit far afield from testing.

And I want to emphasize, again, that everything I’ve mentioned here is my personal preference: these are things I like that work for me. They might not work for you, or you might prefer something else, and that’s OK! There’s always going to be a certain amount of subjectivity and opinion in testing, so remember that I’m just some random opinionated guy on the internet, and no matter what I happen to like, the best approach for your project will be one that works for you and suits your personal taste.