Rewriting a 20-year-old Python library
Way back in 2005, lots of people (ordinary people, not just people who work in tech) used to have personal blogs where they wrote about things, rather than using third-party short-form social media sites. I was one of those people (though I wasn’t yet blogging on this specific site, which launched the following year). And back in 2005, and even earlier, people liked to have comment sections on their blogs where readers could leave their thoughts on posts. And that was an absolute magnet for spam.
There were a few attempts to do something about this. One of them was Akismet, which launched that year and provided a web service you could send a comment (or other user-generated-content) submission to, and get back a classification of spam or not-spam. It turned out to be moderately popular, and is still around today.
The folks behind Akismet also documented their API and set up an API key system so people could write their own clients/plugins for various programming languages and blog engines and content-management systems. And so pretty quickly after the debut of the Akismet service, Michael Foord, who the Python community, and the world, tragically lost at the beginning of 2025, wrote and published a Python library, which he appropriately called akismet, that acted as an API client for it.
He published a total of five releases of his Python Akismet library over the next few years, and people started using it. Including me, because I had several use cases for spam filtering as a service. And for a while, things were good. But then Python 3 was released, and people started getting serious about migrating to it, and Michael, who had been promoted into the Python core team, didn’t have a ton of time to work on it. So I met up with him at a conference in 2015, and offered to maintain the Akismet library, and he graciously accepted the offer, imported a copy of his working tree into a GitHub repository for me, and gave me access to publish new packages.
In the process of porting the code to support both Python 2 and 3 (as was the fashion at the time), I did some rewriting and refactoring, mostly focused on simplifying the configuration process and the internals. Some configuration mechanisms were deprecated in favor of either explicitly passing in the appropriate values, or else using the 12-factor approach of storing configuration in environment variables, and the internal HTTP request stack, based entirely on the somewhat-cumbersome (at that time) Python standard library, was replaced with a dependency on requests. The result was akismet 1.0, published in 2017.
Over the next six years, I periodically pushed out small releases of akismet, mostly focused on keeping up with upstream Python version support (and finally going Python-3-only, in 2020 when Python 2.7 reached its end of upstream support). But beginning in 2024, I embarked on a more ambitious project which spanned multiple releases and turned into a complete rewrite of akismet which finished a few months ago. So today I’d like to talk about why I chose to do that, how the process went, and what it produced.
Why?
Although I’m not generally a believer in the concept of software projects being “done” and thus no longer needing active work (in the same sense as “a person isn’t really dead as long as their name is still spoken”, I believe a piece of software isn’t really “done” as long as it has at least one user), a major rewrite is still something that needs a justification. In the case of akismet, there were two specific things I wanted to accomplish that led me to this point.
One was support for a specific feature of the Akismet API. The akismet Python client’s implementation of the most important API method—the one that tells you whether Akismet thinks content is spam, called comment-check—had, since the very first version, always returned a bool. Which at first sight makes sense, because the Akismet web service’s response body for that endpoint is plain text and is either the string true (Akismet thinks the content is spam) or the string false (Akismet thinks it isn’t spam). Except actually Akismet supports a third option: “blatant” spam, meaning Akismet is so confident in its determination that it thinks you can throw away the content without further review (while a normal “spam” determination might still need a human to look at it and double-check). It signals this by returning the true text response and also setting a custom HTTP response header (X-Akismet-Pro-Tip: discard). But the akismet Python client couldn’t usefully expose this, since the original API design of the client chose to have this method return a two-value bool instead of some other type that could handle a three-value situation. And any attempt to fix it would necessarily change the return type, which would be a breaking change.
The other big motivating factor for a rewrite was the rise of asynchronous Python via async and await, originally introduced in Python 3.5. The async Python ecosystem has grown tremendously, and I wanted to have a version of akismet that could support async/non-blocking HTTP requests to the Akismet web service.
Keep it classy?
The first thing I did was spend a bit of time exploring whether I could replace the entire class-based design of the library. Since the very first version back in 2005, the akismet library had always provided its client as a class (named Akismet) with one method for each supported Akismet HTTP API method. But it’s always worth asking if a class is actually the right abstraction. Very often it’s not! And while Python is an object-oriented language and allows you to write classes, it doesn’t require you to write them. So I spent a little while sketching out a purely function-based API.
One immediate issue with this was how to handle the API credentials. Akismet requires you to obtain an API key and to register one or more sites which will use that API key, and most Akismet web API operations require that both the API key and the current site be sent with the request. There’s also a verify-key API operation which lets you submit a key and site and tells you if they’re valid; if you don’t use this, and accidentally start trying to use the rest of the Akismet API with an invalid key and/or site, the other Akismet API operations send back responses with a body of invalid.
As noted above, the 1.0 release already nudged users of akismet in the direction of putting config in the environment, so reading the key and site from env variables was already well-supported. But some people probably can’t, or won’t want to, use environment variables for configuration. For example: they might have multiple sets of Akismet credentials in a multi-tenant application, and need to explicitly pass different sets of credentials depending on which site they’re performing checks for. So in any function-based interface, all the functions would not only need to be able to read configuration from the environment (which at least could be factored out into a helper function), they’d also need to explicitly accept credentials as optional arguments. That complicates the argument signatures (which are already somewhat gnarly because of all the optional information you can provide to Akismet to help with spam determinations), and makes the API start to look cumbersome.
This was a clue that the function-based approach was probably not the right one: if a bunch of functions all have to accept extra arguments for a common piece of data they all need, it’s a sign that they may really want to be a class which just has the necessary data available internally.
The other big sticking point was how to handle credential verification. It requires an HTTP request/response to Akismet, so ideally you’d do this once (per set of credentials per process). Say, if you’re using Akismet in a web application, you’d want to check your credentials at process startup, and then just treat them as known-good for the lifetime of the process after that. Which is what the existing class-based code did: it performed a verify-key on instantiation and then could re-use the verified credentials after that point (or raise an immediate exception if the credentials were missing or invalid). I really like the ergonomics of that, since it makes it much more difficult to create an Akismet client in an invalid/misconfigured state, but it basically requires some sort of shared state. Even if the API key and site URL are read from the environment or passed as arguments every time, there needs to be some sort of additional information kept by the client code to indicate they’ve been validated.
It still would be possible to do this in a function-based interface. It could implicitly verify each new key/site pair on first use, and either keep a full list of ones that had been verified or maybe some sort of LRU cache of them. Or there could be an explicit function for introducing a new key/site pair and verifying them. But the end result of that is a secretly-stateful module full of functions that rely on (and in some cases act on) the state; at that point the case for it being a class is pretty overwhelming.
As an aside, I find that spending a bit of time thinking about, or perhaps even writing sample documentation for, how to use a hypothetical API often uncovers issues like this one. Also, for a lot of people it’s seemingly a lot easier, psychologically, to throw away documentation than to throw away even barely-working code.
One class or two?
Another idea that I rejected pretty quickly was trying to stick to a single Akismet client class. There is a trend of libraries and frameworks providing both sync and async code paths in the same class, often using a naming scheme which prefixes the async versions of the methods with an a (like method_name() for the sync version and amethod_name() for async), but it wasn’t really compatible with what I wanted to do. As mentioned above, I liked the ergonomics of having the client automatically validate your API key and site URL, but doing that in a single class supporting both sync and async has a problem: which code path to use to perform the automatic credential validation? Users who want async wouldn’t be happy about a synchronous/blocking request being automatically issued. And trying to choose the async path by default would introduce issues of how to safely obtain a running event loop (and not just any event loop, but an instance of the particular event loop implementation the end user of the library actually wants).
So I made the decision to have two client classes, one sync and one async. As a nice bonus, this meant I could do all the work of rewriting in new classes with new names. That would let me mark the old Akismet class as deprecated but not have to immediately remove it or break its API, giving users of akismet plenty of notice of what was going on and a chance to migrate to the new clients. So I started working on the new client classes, calling them akismet.SyncClient and akismet.AsyncClient to be as boringly clear as possible about what they’re for.
How to handle async, part one
Unfortunately, the two-class solution didn’t fully solve the issue of how to handle the automatic credential validation. On the old Akismet client class it had been easy, and on the new SyncClient class it would still be easy, because the __init__() method could perform a verify-key operation before returning, and raise an exception if the credentials weren’t found or were invalid.
But in Python, __init__() cannot be (usefully) async, which posed the tricky question of how to perform automatic credential validation at instantiation time for AsyncClient.
As I dug into this I considered a few different options, and at one point even thought about going back to the one-class approach just to be able to issue a single HTTP request at instantiation without needing an event loop. But I wanted AsyncClient to be truly and thoroughly async, so I ended up settling for a compromise solution, implemented in two phases:
- Both SyncClient and AsyncClient were given an alternate constructor method named validated_client(). Alternate constructors can be usefully async, so the AsyncClient version could be implemented as an async method. I documented that if you’re directly constructing a client instance you intend to keep around for a while, this is the preferred constructor since it will perform automatic credential validation for you (direct instantiation via __init__() will not, on either class). And then…
- I implemented the context-manager protocol for SyncClient and the async context-manager protocol for AsyncClient. This allows constructing the sync client in a with statement, or an async with statement for AsyncClient. And since async with is an async execution context, it can issue an async HTTP request for credential validation.
So you can get automatic credential validation from either approach, depending on your needs:
    import akismet

    # Long-lived client object you'll keep around:
    sync_client = akismet.SyncClient.validated_client()
    # (The await must run inside a coroutine.)
    async_client = await akismet.AsyncClient.validated_client()

    # Or for the duration of a "with" block, cleaned up at exit:
    with akismet.SyncClient() as sync_client:
        ...  # Do things...

    # (Likewise, "async with" must run inside a coroutine.)
    async with akismet.AsyncClient() as async_client:
        ...  # Do things...
Most Python libraries can benefit from these sorts of conveniences, so I’d recommend investing time into learning how to implement them. If you’re looking for ideas, Lynn Root’s “The Design of Everyday APIs” covers a lot of ways to make your own code easier to use.
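For readers who have not implemented these conveniences before, here is a minimal sketch of an async alternate constructor plus the async context-manager protocol. It is not the akismet implementation: `ExampleAsyncClient`, its `_validate()` helper, and the fake `verify_key()` (which sleeps instead of making an HTTP request) are assumptions made up for the example.

```python
import asyncio


class ExampleAsyncClient:
    """Hypothetical client showing the two validation entry points."""

    def __init__(self, key: str) -> None:
        # __init__() cannot await anything, so it only stores configuration.
        self.key = key
        self.verified = False

    async def verify_key(self) -> bool:
        await asyncio.sleep(0)  # stand-in for an async HTTP request
        return self.key != "invalid"

    async def _validate(self) -> None:
        if not await self.verify_key():
            raise ValueError("invalid API key")
        self.verified = True

    @classmethod
    async def validated_client(cls, key: str) -> "ExampleAsyncClient":
        """Alternate constructor: build an instance, then validate it."""
        instance = cls(key)
        await instance._validate()
        return instance

    async def __aenter__(self) -> "ExampleAsyncClient":
        # "async with" awaits this method, so async validation is allowed here.
        await self._validate()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.verified = False  # stand-in for closing the HTTP client


async def main() -> tuple[bool, bool]:
    long_lived = await ExampleAsyncClient.validated_client("good-key")
    async with ExampleAsyncClient("good-key") as scoped:
        inside = scoped.verified
    return long_lived.verified, inside


result = asyncio.run(main())
```

The key point is that a classmethod is an ordinary method and may be declared `async`, while `__init__()` stays synchronous and cheap.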
How to handle async, part deux
The other thing about writing code that supports both sync and async operations is how to handle the things they have in common. There are a few different ways to do this: you can write one implementation and have the other one call it. Or you can write two full implementations and live with the duplication. Or you can try to separate the I/O and the pure logic as much as possible, and reuse the logic while duplicating only the I/O code (or, since the two implementations aren’t perfect duplicates, writing two I/O implementations which heavily rhyme).
For akismet, I went with a hybrid of the last two of these approaches. I started out with my two classes each fully implementing everything they needed, including a lot of duplicate code between them (in fact, the first draft was just one class which was then copy/pasted and async-ified to produce the other). Then I gradually extracted the non-I/O bits into a common module they could both import from and use, building up a library of helpers for things like validating arguments, preparing requests, processing the responses, and so on.
One final object-oriented design decision here (or, I guess, not object-oriented decision): that common code is a set of functions in a module. It’s not a class. It’s not stateful the way the clients themselves are: turning an Akismet web API response into the desired Python return value, or validating a set of arguments and turning them into the correct request parameters (to pick a couple examples) are literally pure functions, whose outputs are dependent solely on their inputs.
And the common code also isn’t some sort of abstract base class that the two concrete clients would inherit from. An akismet.SyncClient and an akismet.AsyncClient are not two different subtypes of a parent “Akismet client” class or interface! Because of the different calling conventions of sync and async Python, there is no public parent interface that they share or could be substitutable for.
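The shape of that split can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names, the parameter names, the `"valid"` response text, and the injected `send` callables (which replace real HTTP calls) are all hypothetical rather than taken from akismet's source.

```python
import asyncio


def build_params(key: str, site: str, **optional: str) -> dict[str, str]:
    """Pure helper: validate arguments and build request parameters."""
    allowed = {"comment_content", "comment_author", "user_ip"}
    unknown = set(optional) - allowed
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    return {"api_key": key, "blog": site, **optional}


def parse_verify_response(body: str) -> bool:
    """Pure helper: interpret the text of a key-verification response."""
    return body.strip() == "valid"


class SketchSyncClient:
    def __init__(self, send):
        self._send = send  # plain callable: dict -> str

    def verify_key(self, key: str, site: str) -> bool:
        params = build_params(key, site)
        return parse_verify_response(self._send(params))


class SketchAsyncClient:
    def __init__(self, send):
        self._send = send  # coroutine function: dict -> str

    async def verify_key(self, key: str, site: str) -> bool:
        params = build_params(key, site)
        # Only this line differs from the sync version: the I/O is awaited.
        return parse_verify_response(await self._send(params))


def fake_send(params: dict[str, str]) -> str:
    """Fake transport so the sketch runs without any network access."""
    return "valid" if params["api_key"] == "good" else "invalid"


async def fake_send_async(params: dict[str, str]) -> str:
    return fake_send(params)
```

The two client classes share no base class. They only share the pure helpers, and each one supplies its own I/O call around them.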
The current code of akismet still has some duplication, primarily around error handling since the try/except blocks need to wrap the correct version of their respective I/O operations, and I might be able to achieve some further refactoring to reduce that to the absolute minimum (for example, by splitting out a bunch of duplicated except clauses into a single common pattern-matching implementation now that Python 3.10 is the minimum supported version). But I’m not in a big hurry to do that; the current code is, I think, in a pretty reasonable state.
Enumerating the options
As I mentioned back at the start of this post, the akismet library historically used a Python bool to indicate the result of a spam-checking operation: either the content was spam (True) or it wasn’t (False). Which makes a lot of sense at first glance, and also matches the way the Akismet web service behaves: for content it thinks is spam, the HTTP response has a body consisting of the string true, and for content that it doesn’t think is spam the response body is the string false.
But for many years now, the Akismet web service has actually supported three possible values, with the third option being “blatant” spam, spam so obvious that it can simply be thrown away with no further human review. Akismet signals this by returning the true response body, and then adding a custom HTTP header to the response: X-Akismet-Pro-Tip, with a value of discard.
Python has had support for enums (via the enum module in the standard library) since Python 3.4, so that seemed the most natural way to represent the possible results. The enum module lets you use lots of different data types for enum values, but I went with an integer-valued enum (enum.IntEnum) for this, because it lets developers still work with the result as a pseudo-boolean type if they don’t care about the extra information from the third option (since in Python 0 is false and all other integers are true).
Python historical trivia
Originally, Python did not have a built-in boolean type, and the typical convention was similar to C, using the integers 0 and 1 to indicate false/true.
Python phased in a real boolean type early in the Python 2 days. First, the Python 2.2 release series (technically, Python 2.2.1) assigned the built-in names False and True to the integer values 0 and 1, and introduced a built-in bool() function which returned the integer truth value of its argument. Then in Python 2.3, the bool type was formally introduced, and was implemented as a subclass of int, constrained to have only two instances. Those instances are bound to the names False and True and have the integer values 0 and 1.
That’s how Python’s bool still works today: it’s still a subclass of int, and so you can use a bool anywhere an int is called for, and do arithmetic with booleans if you really want to, though this isn’t really useful except for writing deliberately-obfuscated code.
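A few lines are enough to confirm this behavior in any current Python interpreter:

```python
# bool is a subclass of int with exactly two instances.
is_subclass = issubclass(bool, int)

# The two instances compare equal to the integers 1 and 0.
true_is_one = True == 1
false_is_zero = False == 0

# Arithmetic on booleans is legal and produces a plain int.
total = True + True
total_type = type(total)
```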
For more details on the history and decision process behind Python’s bool type, check out PEP 285 and this blog post from Guido van Rossum.
The only tricky thing here was how to name the third enum member. The first two were HAM and SPAM to match the way Akismet describes them. The third value is described as “blatant spam” in some documentation, but is represented by the string “discard” in responses, so BLATANT_SPAM and DISCARD both seemed like reasonable options. I ended up choosing DISCARD; it probably doesn’t matter much, but I like having the name match the actual value of the response header.
The enum itself is named CheckResponse since it represents the response values of the spam-checking operation (Akismet actually calls it comment-check because that’s what its original name was, despite the fact Akismet now supports sending other types of content besides comments).
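As a rough standalone sketch of this design: the member names below come from the description above, but the integer values for SPAM and DISCARD and the `parse_check_response()` helper are assumptions made for the example, not the library's actual code.

```python
import enum


class CheckResponse(enum.IntEnum):
    """Three-valued result; HAM must be 0 so that it is falsy."""

    HAM = 0
    SPAM = 1      # assumed value
    DISCARD = 2   # assumed value


def parse_check_response(body: str, headers: dict[str, str]) -> CheckResponse:
    """Hypothetical helper: combine the body text and the pro-tip header.

    A plain dict is case-sensitive; a real HTTP client's header mapping
    would normally handle header-name casing for you.
    """
    text = body.strip()
    if text == "false":
        return CheckResponse.HAM
    if text == "true":
        if headers.get("X-Akismet-Pro-Tip") == "discard":
            return CheckResponse.DISCARD
        return CheckResponse.SPAM
    raise ValueError(f"unexpected response body: {body!r}")


blatant = parse_check_response("true", {"X-Akismet-Pro-Tip": "discard"})
# Callers who only want spam/not-spam can keep using a truth test.
needs_attention = bool(blatant)
```

Code that cares about the third option compares against `CheckResponse.DISCARD` explicitly, while older-style `if result:` checks keep working unchanged.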
Bring your own HTTP client
Back when I put together the 1.0 release, akismet adopted the requests library as a dependency, which greatly simplified the process of issuing HTTP requests to the Akismet web API. As part of the more recent rewrite, I switched instead to the Python HTTPX library, which has an API broadly compatible with requests but also, importantly, provides both sync and async implementations.
Async httpx requires the use of a client object (the equivalent of a requests.Session), so the Akismet client classes each internally construct the appropriate type of httpx object: httpx.Client for akismet.SyncClient, and httpx.AsyncClient for akismet.AsyncClient.
And since the internal usage was switching from directly calling the function-based API of requests to using HTTP client objects, it seemed like a good idea to also allow passing in your own HTTP client object in the constructors of the Akismet client classes. These are annotated as httpx.Client/httpx.AsyncClient, but as a practical matter anything with a compatible API will work.
One immediate benefit of this is it’s easier to accommodate situations like HTTP proxies, and server environments where all outbound HTTP requests must go through a particular proxy. You can just create the appropriate type of HTTP client object with the correct proxy settings, and pass it to the constructor of the Akismet client class:
    import akismet
    import httpx

    from your_app.config import settings

    akismet_client = akismet.SyncClient.validated_client(
        http_client=httpx.Client(
            proxy=settings.PROXY_URL,
            headers={"User-Agent": akismet.USER_AGENT},
        )
    )
But an even bigger benefit came a little bit later on, when I started working on improvements to akismet’s testing story.
Testing should be easy
Right here, right now, I’m not going to get into a deep debate about how to define “unit” versus “integration” tests or which types you should be writing. I’ll just say that historically, libraries which make HTTP requests have been some of my least favorite code to test, whether as the author of the library or as a user of it verifying my usage. Far too often this ends up with fragile piles of patched-in mock objects to try to avoid the slowdowns (and other potential side effects and even dangers) of making real requests to a live, remote service during a test run.
I do think some fully end-to-end tests making real requests are necessary and valuable, but they probably should not be used as part of the main test suite that you run every time you’re making changes in local development.
Fortunately, httpx offers a feature that I wrote about a few years ago, which greatly simplifies both akismet’s own test suite, and your ability to test your usage of it: swappable HTTP transports which you can drop in to affect HTTP client behavior, including a MockTransport that doesn’t make real requests but lets you programmatically supply responses.
So akismet ships with two testing variants of its API clients: akismet.TestSyncClient and akismet.TestAsyncClient. They’re subclasses of the real ones, but they use the ability to swap out HTTP clients (covered above) to plug in custom HTTP clients with MockTransport and hard-coded stock responses. This lets you write code like:
    import akismet

    class AlwaysSpam(akismet.TestSyncClient):
        comment_check_response = akismet.CheckResponse.SPAM
and then use it in tests. That test client above will never issue a real HTTP request, and will always label any content you check with it as spam. You can also set the attribute verify_key_response to False on a test client to have it always fail API key verification, if you want to test your handling of that situation.
This means you can test your use of akismet without having to build piles of custom mocks and patch them in to the right places. You can just drop in instances of appropriately-configured test clients, and rely on their behavior.
If I ever became King of Programming, with the ability to issue enforceable decrees, requiring every network-interacting library to provide this kind of testing-friendly version of its core constructs would be among them. But since I don’t have that power, I do what I can by providing it in my own libraries.
(py)Testing should be easy
In the Python ecosystem there are two major testing frameworks:
- The unittest module in the Python standard library, which is a direct port to Python of the xUnit style of test frameworks seen in many other languages (including xUnit style naming conventions, which don’t match typical Python naming conventions).
- The third-party pytest framework, which aims to be a more natively “Pythonic” testing framework and encourages function- rather than class-based tests and heavy use of dependency injection (which it calls fixtures).
For a long time I stuck to unittest, or unittest-derived testing tools like the ones that ship with Django. Although I understand and appreciate the particular separation of concerns pytest is going for, I found its fixture system a bit too magical for my taste; I personally prefer dependency injection to use explicit registration so I can know what’s available, versus the implicit way pytest discovers fixtures based on their presence or absence in particularly-named locations.
But pytest pretty consistently shows up as more popular and more broadly used in surveys of the Python community, and every place I’ve worked for the last decade or so has used it. So I decided to port akismet’s tests to pytest, and in the process decided to write a pytest plugin to help users of akismet with their own tests.
The plugin automatically provides a set of dependency-injection fixtures. There are four fixtures: two sync and two async, with each flavor getting a fixture to provide a client class object (which lets you test instantiation-time behavior like API key verification failures), and a fixture to provide an already-constructed client object. Configuration is through a custom pytest mark called akismet_client, which accepts arguments specifying the desired behavior. For example:
    import akismet
    import pytest

    @pytest.mark.akismet_client(comment_check_response=akismet.CheckResponse.DISCARD)
    def test_akismet_discard_response(akismet_sync_client: akismet.SyncClient):
        # Inside this test, akismet_sync_client's comment_check() will always
        # return DISCARD.
        ...

    @pytest.mark.akismet_client(verify_key_response=False)
    def test_akismet_fails_key_verification(akismet_sync_class: type[akismet.SyncClient]):
        # API key verification will always fail on this class.
        with pytest.raises(akismet.APIKeyError):
            akismet_sync_class.validated_client()
Odds and ends
Python has had the ability to add annotations to function and method signatures since 3.0, and more recently gained the ability to annotate attributes as well; originally, no specific use case was mandated for this feature, but everybody used it for type hints, so now that’s the official use case for annotations. I’ve had a lot of concerns about the way type hinting and type checking have been implemented for Python, largely around the fact that idiomatic Python really wants to be a structurally-typed language, or as some people have called it “interfacely-typed”, rather than nominally-typed. Which is to say: in Python you almost never care about the actual exact type name of something, you care about the interfaces (nowadays, called “protocols” in Python typing-speak) it implements. So you don’t care whether something is precisely an instance of list, you care about it being iterable or indexable or whatever.
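A small sketch of what that interface-based style looks like with `typing.Protocol`. All of the names here (`SupportsPost`, `FakeHTTPClient`, `check()`, and the placeholder URL) are invented for the example and do not appear in akismet.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsPost(Protocol):
    """Hypothetical protocol: anything with a matching post() qualifies."""

    def post(self, url: str, data: dict[str, str]) -> str: ...


class FakeHTTPClient:
    # No inheritance from SupportsPost; only the method's shape matters.
    def post(self, url: str, data: dict[str, str]) -> str:
        return "false"


def check(http_client: SupportsPost, content: str) -> bool:
    """Accepts any object that implements the protocol."""
    body = http_client.post("https://example.invalid/check", {"content": content})
    return body == "true"


is_spam = check(FakeHTTPClient(), "hello")
# runtime_checkable isinstance() only verifies that the method exists.
matches = isinstance(FakeHTTPClient(), SupportsPost)
```

A type checker accepts `FakeHTTPClient` wherever `SupportsPost` is expected, which is the structural behavior described above, in contrast to annotating with one concrete class name.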
On top of which, some design choices made in the development of type-hinted Python have made it (as I understand it) impossible to distribute a single-file module with type hints and have type checkers actually pick them up. Which was a problem for akismet, because traditionally it was a single-file module, installing a file named akismet.py containing all its code.
But as part of the rewrite I was reorganizing akismet into multiple files, so that objection no longer held, and eventually I went ahead and began running mypy as a type checker as part of the CI suite for akismet. The type annotations had been added earlier, because I find them useful as inline documentation even if I’m not running a type checker (and the Sphinx documentation tool, which all my projects use, will automatically extract them to document argument signatures for you). I did have to make some changes to work around mypy, though. It didn’t find any bugs, but it did uncover a few things that were written in ways it couldn’t handle, and maybe I’ll write about those in more detail another time.
As part of splitting akismet up into multiple files, I also went with an approach I’ve used on a few other projects, of prefixing most file names with an underscore (i.e., the async client is defined in a file named _async_client.py, not async_client.py). By convention, this marks the files in question as “private”, and though Python doesn’t enforce that, many common Python linters will flag code that imports from them directly. The things that are meant to be supported public API are exported via the __all__ declaration of the akismet package.
I also switched the version numbering scheme to Calendar Versioning. I don’t generally trust version schemes that try to encode information about API stability or breaking changes into the version number, but a date-based version number at least tells you how old something is and gives you a general idea of whether it’s still being actively maintained.
There are also a few dev-only changes:
* Local dev environment management and packaging are handled by PDM and its package-build backend. Of the current crop of clean-sheet modern Python packaging tools, PDM is my personal favorite, so it’s what my personal projects are using.
* I added a Makefile which can execute a lot of common developer tasks, including setting up the local dev environment with proper dependencies, and running the full CI suite or subsets of its checks.
* As mentioned above, the test suite moved from unittest to pytest, using AnyIO’s plugin for supporting async tests in pytest. There’s a lot of use of pytest parametrization to generate test cases, so the number of test cases grew a lot, but it’s still pretty fast—around half a second for each Python version being tested, on my laptop. The full CI suite, testing every supported Python version and running a bunch of linters and packaging checks, takes around 30 seconds on my laptop, and about a minute and a half on GitHub CI.
That’s it (for now)
In October of last year I released akismet 25.10.0 (and then 25.10.1 to fix a documentation error, because there’s always something wrong with a big release), which completed the rewrite process by finally removing the old Akismet client class. At this point I think akismet is feature-complete unless the Akismet web service itself changes, so although there were more frequent releases over a period of about a year and a half as I did the rewrite, it’s likely the cadence will settle down now to one a year (to handle supporting new Python versions as they come out) unless someone finds a bug.
Overall, I think the rewrite was an interesting process, because it was pretty drastic (I believe it touched literally every pre-existing line of code, and added a lot of new code), but also… not that drastic? If you were previously using akismet with your configuration in environment variables (as recommended), I think the only change you’d need to make is rewriting imports from akismet.Akismet to akismet.SyncClient. The mechanism for manually passing in configuration changed, but I believe that and the new client class names were the only actual breaking changes in the entire rewrite; everything else was adding features/functionality or reworking the internals in ways that didn’t affect public API.
I had hoped to write this up sooner, but I’ve struggled with this post for a while now, because I still have trouble with the fact that Michael’s gone, and every time I sat down to write I was reminded of that. It’s heartbreaking to know I’ll never run into him at a conference again. I’ll miss chatting with him. I’ll miss his energy. I’m thankful for all he gave to the Python community over many years, and I wish I could tell him that one more time. And though it’s a small thing, I hope I’ve managed to honor his work and to repay some of his kindness and his trust in me by being a good steward of his package. I have no idea whether Akismet the service will still be around in another 20 years, or whether I’ll still be around or writing code or maintaining this Python package in that case, but I’d like to think I’ve done my part to make sure it’s on sound footing to last that long, or longer.