Variations on the Death of Python 2

May 5, 2020 Django, Python

On April 20th, 2020, a release manager named Benjamin Peterson smashed the “publish” button on Python 2.7.18.

The Python 2 release series reached the end of its upstream support from the Python core team at the start of the year. I don’t know for certain, but I assume the final release was timed to coincide with PyCon (which, until a global pandemic struck, was scheduled for mid-April), possibly so there could be some sort of nice ceremony to mark the occasion.

At any rate, Python 2 is done, at least from the Python core team’s perspective. While operating-system vendors (who work on different cycles) will be supporting Python 2 packages for a while yet, and some other community projects claim they’ll continue to support Python 2 interpreters for an indefinite period, the mainstream of Python development has now, finally, moved on. Popular libraries and frameworks mostly either have dropped, or are in the process of dropping, their Python 2 support (Django’s last release to support Python 2 — the 1.11 LTS initially released in 2017 — reached its end of upstream support in April, for example).

And with that, of course, come the retrospectives. I don’t have any grand unified thoughts on the end of Python 2, or the 2-to-3 transition, so this will be less a coherent post that makes a clear point, and more a series of vignettes that explore different thoughts.

Unmaintained software

I’ve mentioned a few times in discussions of the 2-to-3 transition that there are a lot of software-producing entities whose approach to maintenance work is, effectively, “don’t do it”. Many of these are for-profit companies, but not all; in such organizations, the focus is always on the current quarter’s results, or even just the current feature sprint, and it’s difficult or impossible to prioritize work aimed at long-term sustainability, which is what platform maintenance and upgrades are all about. Some entities manage to pull this off, but my experience is that they’re pretty rare. I’m fortunate to work for an organization that’s trying (and has done a better job than many), and even more so that my job there explicitly involves thinking about the long term and trying to ensure the future health of our platform.

Still, a lot of Python 2/3 discussions come down to various forms of this argument: “We already have Python 2, and it already works, so why should we spend time and effort upgrading to Python 3? We’d rather use that time to develop new features in our Python 2 software!” People with this viewpoint seem to resent the fact that the Python core team, and popular open-source packages, no longer support Python 2; they often express this as a feeling that they’re being “forced” into an upgrade they don’t want, or “dictated to” by other groups. Often, they loudly proclaim that they’re never going to upgrade; if they do rewrite anything, it’ll be in another language. Rust and Go are popular candidates for this, but of course a moment’s thought and a bit of investigation will show that other languages evolve over time, too, and that the practices and popular libraries around a language, especially, evolve significantly even if the core language itself remains relatively static. This has already produced a few storms of criticism in the Rust community, for example, where despite the care put into identifying stable “editions” of the core language, popular third-party packages still tend to adopt new features quickly and leave users of older versions of the language unsupported.

There are also people — not as many, but still enough to be noticed — who seem to go even further and want the ability to declare that a piece of software they wrote years ago is “done”. This isn’t quite abandonware, since often they’re willing to fix bugs if found, but is still quite close in concept, and many of them are unhappy at the prospect of needing to port their “finished” projects to Python 3 in order to keep their access to things like bug and security fixes in the language or in packages they’re using.

As a quick aside, there are also people who really do write software to be abandoned, but they don’t show up in arguments about Python, because they genuinely don’t worry about whether their code continues to work into the future. This isn’t inherently a bad thing, though sometimes it can lead to unfortunate results if other people find and start using and relying on a piece of explicitly-unmaintained code.

I don’t have any sort of knock-down counterargument for any of this. I’ve sometimes tried to convince people that long-term maintenance is important, or that, just as some people have asserted a person isn’t dead as long as their name is still spoken, a piece of software isn’t “finished” as long as it’s still in use. But the arguments never go anywhere.

Though there is one thing I think gets overlooked a lot: usually, the anti-Python-3 argument is presented as the desire of a particular company, or project, or person, to stand still and buck the world’s trend toward constant change.

But really they’re asking for the inverse of that. Rather than being a fixed point in a constantly-changing world, what they really seem to want is to be the only ones still moving in a world that has become static around them. If only the Python team would stop fiddling with the language! If only the maintainers of popular frameworks would stop evolving their APIs! Then we could finally stop worrying about our dependencies and get on with our real work! Of course, it’s logically impossible for each one of those entities to be the sole mover in a static world, but pointing that out doesn’t always go well.

Strings

Switching the string type to be an actual string in Python 3 was the right choice. I’ve seen some people complain that it’s harder to work with byte sequences now, but my experience has been the opposite: working with byte sequences in Python is so much nicer now, because the bytes type is free to actually be bytes, instead of also having to try to be the string type.

I’ve also said this before, but Python 3 didn’t really make it more difficult to work with strings. At most, it exposed people to difficulty that already existed. The snarky way to put this is that in the Python 2 world there were basically two kinds of projects: those which had already built their own equivalent of the Python 3 approach by encoding to/decoding from bytestrings at the boundaries and using unicode instances everywhere internally, and those which hadn’t yet been bitten badly enough to motivate them to do that.
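The “decode at the boundaries, use Unicode internally” pattern can be sketched in a few lines of Python 3. This is only an illustration: the function names are hypothetical, and UTF-8 is an assumed encoding that a real program would have to confirm for each of its inputs and outputs.

```python
def decode_input(raw, encoding="utf-8"):
    """Input boundary: turn bytes from the outside world into str."""
    return raw.decode(encoding)


def shout(text):
    """Internal logic: only ever sees and returns str."""
    return text.upper() + "!"


def encode_output(text, encoding="utf-8"):
    """Output boundary: turn str back into bytes for the outside world."""
    return text.encode(encoding)


# Pretend these bytes arrived from a socket or a file opened in binary mode.
incoming = "caf\u00e9".encode("utf-8")
outgoing = encode_output(shout(decode_input(incoming)))
```

Everything between the two boundary functions works with str only, so the question of which encoding is in use never leaks into the program’s internal logic.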

Even the one seemingly-legitimate exception (code that works with files and filesystems) really isn’t an exception at all. The unfortunate truth is that, despite claims to the contrary, on many common types of systems file paths and names are arbitrary byte sequences with no inherent meaning or guarantee of being decodable to strings using any known encoding, but humans still want and expect to treat those byte sequences as meaningful text. Different systems have different approaches for attempting to support that expectation, including global, per-user, or even per-user-session encoding settings which suggest how to interpret those bytes. And, of course, different systems each do these things differently. Or are misconfigured. Or just outright lie about how they’re configured. And this can’t easily be solved by declaring file paths and names to be opaque bytes values: there are cases where that assumption doesn’t hold, and even when it does hold, it would still raise questions of how to display those values to a user, or how to allow user input of them or operations on them. Users want to see my_file.txt, not \x6d\x79\x5f\x66\x69\x6c\x65\x2e\x74\x78\x74.

(File contents are slightly easier, because it’s considered more acceptable for programming languages to require programmers to say what they want to do when reading or writing, and many common file formats are either non-text to begin with or carry their own metadata about how to interpret them; file paths and names, though, are generally expected to “just work” without extra effort.)

Python 3, after a variety of different attempts to grapple with this problem, now uses a combination of tactics. If you want the details, Victor Stinner has written a series of blog posts telling the full story. If you just want a summary: Python does its best to find out from your system what encoding it should use (on Windows this is a lot easier than on Unix-y systems!); in some cases (on Unix-y systems configured certain ways) Python ignores what your system tells it and chooses to use UTF-8 instead; and Python provides error handlers, sometimes turned on by default, that allow losslessly “smuggling” undecodable bytes into and out of str instances using code points from the surrogate range.
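The “smuggling” trick is the surrogateescape error handler, and it can be tried directly on a bytes value. A minimal sketch, assuming a hypothetical file name encoded as Latin-1 on a system that expects UTF-8:

```python
# Hypothetical file name: "café.txt" encoded as Latin-1. The byte 0xE9 is
# not valid UTF-8 in this position, so a strict decode would fail.
raw_name = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate
# code point U+DCNN instead of raising UnicodeDecodeError.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")

# For display, the smuggled byte has to be replaced with something printable.
display_name = text_name.encode("utf-8", errors="replace").decode("utf-8")
```

In real code, os.fsdecode() and os.fsencode() apply the platform’s filesystem encoding and error handler for you, so there is rarely a need to name the handler explicitly.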

Meanwhile, in Python 2 you could sometimes forget about this complexity for a little while, because the way Python 2’s “strings” worked meant you might not run into visible problems immediately. But that’s not the same thing as there not being problems, and hiding a problem from me until later (potentially much later, like 3AM on a weekend) when it’s harder to track down and debug isn’t my idea of “easy”. And notably, Python isn’t the only language that’s had to get creative to solve this: Rust, for example, has a completely separate “platform string” type that can act as a possibly-undecodable bag of bytes when you need to work with things like filesystem paths.

How Python is used

Somewhat related to the lengthy diversion above about filesystems, I think a hugely overlooked aspect of Python 3 was how it went along with shifts in what people were doing with Python.

Back when Python 2.0 was released — October 16, 2000, according to the “what’s new in Python 2.0” document — Python was still largely seen, and widely used, as a language for writing Unix system tools, utilities, and scripts. There were of course people using Python for other things, but the “Unix scripting language” perception was real, and on the whole a good match for where Python was at that point in its history.

Things have… changed a bit since then. And a lot of changes in Python itself have either happily complemented, or been explicitly driven by, changes in how it’s used. The matrix multiplication operator @ and its associated magic method __matmul__(), for example, likely wouldn’t have been added if not for the huge rise in popularity of Python’s numeric/scientific software stack.
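For anyone who hasn’t run into it, here is roughly how the operator works. The @ operator arrived in Python 3.5, and the Matrix class below is a hypothetical toy written for this illustration; real numeric code would use a library such as NumPy instead.

```python
class Matrix:
    """Toy matrix type (hypothetical) that supports the @ operator."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __matmul__(self, other):
        # zip(*other.rows) yields the columns of the right-hand operand.
        columns = list(zip(*other.rows))
        return Matrix(
            [
                [sum(a * b for a, b in zip(row, column)) for column in columns]
                for row in self.rows
            ]
        )


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
product = a @ b  # Python calls a.__matmul__(b)
```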

I’m honestly not sure whether the string change, which really was the biggest breaking change from Python 2 to 3, would have happened if not for the fact that Python had expanded so significantly into other areas by the time Python 3 was being developed. In the Unix scripting niche Python was originally known for, text handling has traditionally been a bit of a mess, and most systems and most scripts were effectively ASCII-only, or maybe (if you were really lucky and/or unlucky) used ASCII-compatible encodings like Latin-1. Which is, coincidentally, what Python 2’s bytestrings, and most modules/packages you’d encounter, really wanted you to stick to (many of them, and Python 2’s str if you weren’t careful, would explode in more or less impressive ways whenever they saw byte values outside the ASCII range). But once Python started being used more widely for more types of programming, and especially once those programmers’ voices started having influence on Python’s development, the traditionally-Unix-y approach to text just wasn’t good enough anymore.

I’ve said a few times that I understand the frustration that’s been expressed by people who still use Python for scripting-language tasks, and who feel that Python 3 added a ton of complexity to their work for no reason. But I also think a lot of them have missed that, by being built around their use case in some pretty fundamental ways, Python 2 was imposing a lot of complexity on everyone else. As the proportion of people using Python for other purposes grew, I don’t think it made sense anymore to be doing that.

Python 3’s approach to strings vastly simplifies a lot of other domains of programming. Having the string type be a real Unicode string, and forbidding the former easy mixing of Unicode strings and byte sequences, eliminates whole classes of problems and bugs, many of which liked to lie in wait until they could set off a pager at an inopportune time. But as I mentioned above, it also forces the scripting folks to do what everyone else already had to do: think about where the boundaries of their programs are, encode/decode at those boundaries, and use Unicode everywhere internally. This is unfamiliar territory for a lot of people who are mostly used to the conventions of the Unix scripting world, but any approach that spares them from it will inevitably go back to causing significant pain for people doing other things with Python. And as rough as things like the filesystem mess have been at times, I still think on the whole it’s been better for Python’s adoption and broader use to no longer have the language just stick to whatever makes Unix and Unix-y scripting feel easier.
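The ban on mixing the two types is easy to demonstrate. A short sketch of Python 3 behavior, with variable names made up for the example:

```python
text = "na\u00efve"          # str: a sequence of 5 Unicode code points
data = text.encode("utf-8")  # bytes: 6 bytes, since U+00EF needs two

# Mixing the two types fails immediately instead of "working" until
# non-ASCII input shows up in production.
try:
    combined = text + data
except TypeError:
    combined = None

# The conversion has to be spelled out, encoding included.
explicit = text + data.decode("utf-8")

# Even equality never silently crosses the str/bytes line.
same = (text == data)
```

The failure happens on the very first run, with any input at all, which is exactly the opposite of the lie-in-wait behavior described above.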

Python 2 wasn’t static

A couple of times in debates about Python, I’ve ended up sitting down and reading through all the “what’s new” documents for each Python 2 release to identify all the things that changed from 2.0 through 2.7. I wish now that I’d saved some of those comments, because there are two really important takeaways.

One is that the Python 2.x series was not the bastion of ideal backwards compatibility people like to pretend it was in internet arguments. Some of the incompatible changes involved things like reserving new keywords (example: yield in Python 2.3). Python 2.4 began the process of integer unification (completed in Python 3.0), and in the process changed what result you’d get from some integer literals and expressions (previously, you could take advantage of overflow behavior with things like large bit shifts or hex and octal literals to get negative values). A number of standard-library functions had their behavior changed in backwards-incompatible ways, and several modules were outright removed despite having been present in 2.0. String exceptions stopped working in 2.6. Nearly every 2.x release made at least some backwards-incompatible changes, which is why there are porting guides in many of the “what’s new” documents.

So it’s not the case that if you had, say, a Python 2.0 codebase way back in the day you’d be able to just swap in a later 2.x interpreter and have everything continue to work. You’d need to go carefully read all the porting guides for the various Python 2.x releases and make changes to your code to accommodate things that were changed/removed over the course of the 2.x series.
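The integer change is a good example of how quietly such a difference could bite. The sketch below runs on Python 3 and simulates, with a hypothetical helper, the 32-bit wraparound that code written for an early 2.x interpreter on a 32-bit build could have relied on. The old results are my understanding of the pre-unification behavior, not something the helper proves.

```python
def as_signed_32(value):
    """Hypothetical helper: reinterpret an int as 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


# With unified integers, literals and shifts simply keep growing.
unified_literal = 0xFFFFFFFF  # 4294967295
unified_shift = 1 << 31       # 2147483648

# Simulated pre-unification results on a 32-bit build (an assumption).
wrapped_literal = as_signed_32(0xFFFFFFFF)
wrapped_shift = as_signed_32(1 << 31)
```

Code that depended on the wrapped values, for checksums or bit masks for instance, would have needed an explicit step like this helper after upgrading.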

The other thing to note is what a different language Python had become by the end of the 2.x series, compared to the beginning. Imagine a Python without:

Decorators
The with statement and context managers
Operator overloading
Lazy iterables and the itertools module
Generator expressions
The bool type
The set type
The unittest and doctest modules
The functools module
True ternary conditional expressions
Absolute and package-relative imports
The collections module
Abstract classes and methods
try/except/finally

That’s a partial sample of things added to Python 2 after 2.0. If you took idiomatic late-2.x Python code back in time to October 2000, and showed it to a programmer fluent in Python 2.0, how much of it would they be able to recognize and read without help? Or, even if someone from that era was both very careful and very lucky and avoided using anything that was changed/removed during 2.x, would their Python 2.0 codebase even still be useful in a Python 2.6 or 2.7 world where so many of the idioms of the language were different?
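To make that thought experiment concrete, here is a short, deliberately ordinary snippet. It is written for Python 3 so it can be run today, but nearly every construct in it comes from the list above, and the function names are invented for the illustration.

```python
import functools
from collections import defaultdict
from contextlib import contextmanager


def counted(func):
    """Decorator that counts calls to the wrapped function."""
    @functools.wraps(func)                      # functools module
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@contextmanager                                 # context manager
def recording(log):
    log.append("enter")
    try:
        yield log
    finally:                                    # try/finally around a yield
        log.append("exit")


@counted                                        # decorator syntax
def summarize(words):
    unique = set(words)                         # set type
    by_length = defaultdict(list)               # collections module
    for word in sorted(unique):
        by_length[len(word)].append(word)
    total = sum(len(word) for word in unique)   # generator expression
    label = "many" if len(unique) > 2 else "few"  # conditional expression
    return label, total, dict(by_length)


log = []
with recording(log):                            # with statement
    result = summarize(["spam", "eggs", "spam", "ham"])
```

Strip out everything marked with a comment and there is not much left of what most people would now consider normal Python.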

No matter what people want to say, Python 2.x was not a fixed, unchanging platform.

Entitlement

Another thing that’s been present in so many arguments is the apparent feeling of many people that they are owed something. Most often, what they feel they are owed is the time and effort of other people to maintain Python 2, and packages compatible with it, effectively forever.

This is something that’s always generally present in and around Free and open source software, of course, but it isn’t always as front-and-center as it’s been in the fights about Python. And nothing about either copyleft or permissive licenses imposes much in the way of obligations on authors of software: the obligations, when they exist, are nearly all on the recipients, and there’s a certain amount of background churn of license text as people try to close up what they see as loopholes that would let recipients evade those obligations.

But when you give something away for free, people get used to receiving it for free, and often forget that producing it wasn’t free. There are some people who “give back”, and get involved in maintaining and supporting software they use — I don’t know that I always succeed, but I’ve tried my best to be one of them, since open source software has given me so much over the course of my career — and there are some companies which do the same, by providing various types of resources (money, infrastructure, developer time), but in both cases they’re a tiny minority.

The Python core team maintained Python 2 (2.6, and later 2.7) for more than eleven years past the release of Python 3.0. The people and teams behind many popular third-party packages also maintained their Python 2 support for that entire period, or close to it, willingly holding back from adopting new features that could improve their code and make maintenance easier.

Finally, the Python core team, and the maintainers of many third-party packages, announced they were moving on and would only support Python 3 going forward. And none of those people ever owed any of us anything. They chose to build and maintain Python 2 and its ecosystem, and after many years of that, they chose to do something else. That’s their right: producing Python, and popular packages in the Python ecosystem, has a cost in time, in effort, in other less-tangible resources. Nobody has a right to demand that other people bear that cost for them indefinitely.

Python isn’t just an interpreter

Of course, there are some groups who’ve said they are willing to bear some of the ongoing maintenance costs. The PyPy project, for example, is currently dependent on a Python 2 interpreter to bootstrap and so will be maintaining their own either for as long as PyPy exists, or for as long as it takes to migrate to bootstrapping on Python 3 (which they seem to think is either not feasible, or not something they want to do). And I’ve seen some apparently PyPy-affiliated people get a bit heated in comment threads as they insist that the Python core team’s interpreter (which is properly called “CPython”) was just one of many Python 2 interpreters and that it was irresponsible to describe the end of support for that single interpreter as the end of Python 2, because they see that as overlooking, dismissing, and hurting PyPy.

Unfortunately for that argument, Python 2 was much more than just the interpreter. It was also a large ecosystem of packages people used with the interpreter, and a community of people who maintained and contributed to those packages. I don’t doubt the PyPy team are willing to maintain a Python 2 interpreter, and that people who don’t want to port to Python 3 could switch to the PyPy project’s interpreter in order to have a supported Python 2 interpreter. But a lot of those people would continue to use other packages, too, and as far as I’m aware the PyPy team hasn’t also volunteered to maintain Python 2 versions of all those packages.

So there’s a sense in which I want to push back against that messaging from PyPy folks and other groups who say they’ll maintain “Python 2” for years to come, but really just mean they’ll maintain an interpreter. If they keep loudly announcing “don’t listen to the Python core team, Python 2 is still supported”, they’ll be creating additional burdens for a lot of other people: end users are going to go file bug reports and other support requests to third-party projects that no longer support Python 2, because they heard “Python 2 is still supported”, and thus will feel entitled to have their favorite packages still work.

Even if all those requests get immediately closed with “this project doesn’t support Python 2 anymore”, it’s still going to take up the time of maintainers, and it’s going to make the people who file the requests angry because now they’ll feel someone must be lying to them — either Python 2 is dead or it isn’t! — and they’ll probably take that anger out on whatever target happens to be handy. Which is not going to be good.

And note that while operating-system vendors are going to be maintaining feature-frozen Python 2 packages for at least a few more years, that mostly doesn’t cause the same sort of problem. Those vendors also take on the burden of maintaining their own copies of popular Python packages, and as a result they do provide the full ecosystem, or at least a reasonable facsimile thereof. And if someone files a misdirected support request it’s easy enough to send them back to their operating-system vendor. Other projects that insist “Python 2 is still supported”, but only in the sense of providing an interpreter, are the ones that will be causing significant confusion and frustration.

Past and future

I first learned Python around 17 years ago, from an online copy of Mark Pilgrim’s Dive Into Python. I liked it immediately, even if it took me a little while to figure out why. I still like it today.

I also think — as I have since 3.0 first came out — that Python 3 is a significant improvement over Python 2, and that the backwards-incompatible changes made in the 2-to-3 transition were due to real issues that existed in Python 2 and for which no less-drastic solution would have worked.

I dropped Python 2 (and, where applicable, Django 1.11) in my personal open-source projects at the start of the year, and I’m looking forward to finally fully adopting a lot of Python features that have been available for over a decade, but couldn’t be used in packages that supported both 2 and 3 (which I did up until the end of upstream Python 2 support). There’s actually a lot of cool stuff in Python 3, and the amount of cool stuff has only grown with each release, but adoption of those cool things has been held back by how long many projects felt the need to continue supporting both Python 2 and 3.
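As a small taste of what that means in practice, the sketch below uses three features that have been in the language since Python 3.0 and are syntax errors on Python 2. The selection is mine, picked for illustration, and the names are hypothetical.

```python
def make_counter(*, start=0):
    """Keyword-only argument: callers must write make_counter(start=...)."""
    count = start

    def bump():
        nonlocal count  # rebind a variable in the enclosing function's scope
        count += 1
        return count

    return bump


# Extended iterable unpacking: the starred name collects the remainder.
first, *rest = [1, 2, 3, 4]

bump = make_counter(start=10)
values = [bump(), bump()]

# Passing the argument positionally is rejected.
try:
    make_counter(10)
    positional_allowed = True
except TypeError:
    positional_allowed = False
```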

I’m incredibly grateful to the Python team, and all the many people who’ve contributed over the years, for both Python 2 and Python 3, and for everything yet to come.

I hope Python 2 gets to go look at the nasturtiums.