Yes, I have opinions on your open source contributions

July 11, 2022 Django, Meta, Python

Recently the Python Package Index announced that they will be implementing new account-security policies, and hoo boy are some people ever worked up about it. This has already escalated to the author of at least one high-download-count package — one which was a dependency of pytest, thus likely to break a lot of people’s testing and CI right as the weekend started, always nice — deleting their package from PyPI out of anger and announcing they intend to stop maintaining it, though it has now, at the author’s request, been restored by the PyPI maintainers.

And Armin has already written up his thoughts on it, and I disagree with a lot of what he says there. So I guess that means I should write something, too.

What’s happening

First of all, let’s be clear on exactly what is happening. In “coming months” — I haven’t seen a more precise timeline than this — the Python Package Index will:

Roll out a process where the most-downloaded packages are marked as “critical”, and
Identify all PyPI user accounts which have access to publish new releases of “critical” packages, and
Tell those account holders that they must either enable two-factor authentication (“2FA”) on their PyPI accounts, or no longer be able to publish new releases of the “critical” packages.

You can read more in their announcement and FAQ, but those are the big points. The 2FA requirement allows either TOTP authenticator apps or hardware keys, and PyPI has received coupon codes for a few thousand free hardware keys from Google which they are distributing to package maintainers who hadn’t already enabled 2FA.

Emails about this have been going out to accounts which maintain “critical” packages, on an apparently rolling basis (guessing this from the fact that I got emails for two separate packages, a day apart), for a little while now.

Things that should not need to be said

To get started: PyPI (of which I am not a maintainer; I merely follow the general drift of Python packaging and packaging-adjacent areas) is not maintained by “big tech” companies, or by any for-profit corporation. It’s mostly volunteer and usually understaffed. Jacob points out that people have been paid from time to time to work on PyPI, so it’s not entirely volunteer, but also see followups to Jacob’s comment about the nuances involved in that. But anyway, members of the PyPI team have been participating in threads on various sites and explaining patiently that yes, this is a policy PyPI decided to implement on its own initiative, and that it’s not something some shadowy “corporate” demanded of them.

This has not stopped many people from proceeding to argue based on the exact opposite of the truth, but such is the internet.

Also, because a lot of people who are arguing about this are going on and on about “freeloaders” and other such language: I don’t really like playing the “do you know who I am” card, but in this case I sort of have to. So if you don’t know who I am, I have dedicated a large chunk of my professional life to open source, both large projects such as Django — for which I held just about every available leadership position over the course of 13 years of continuous active involvement from which I’ve recently stepped back a bit — and small ones such as my own set of personal libraries and Django apps. I also worked at Mozilla for a few years where all the code I wrote (primarily on the Django backend that powered MDN for a while) in the course of my employment was open source.

Which means it’s difficult to accuse me of being some sort of “freeloader” who only ever consumes others’ code without giving anything back. I imagine someone probably will still try it, but now you hopefully know better.

Finally: I’ve seen quite a few people saying that the only beneficiaries of this are “big tech” or other corporate consumers of open-source software. Which would be a funny claim if it weren’t sad. I know from personal experience just how many community — formally non-profit, or otherwise — projects rely on open source software, how badly they and their communities/users would be hurt by a big breach, and how under-resourced they already are. Big companies can afford to have their own policies in place around vetting dependencies. Even if they do get some amount of “free rider” benefit from PyPI taking steps to secure user accounts, I’m willing to accept that as a side effect of protecting everybody else who doesn’t have those resources.

Points of agreement

In Armin’s post, I believe this is correct:

From the package index’ point of view increasing the protection for critical packages makes a lot of sense. Running a package index is expensive and the users of the package index really do want to reduce the chance that a package that they depend on is compromised.

And these two sentences are correct:

I think as an Open Source developer who is using the index for free, I can’t demand much from it. I’m in many ways beholden to the rules and requirements that the index upholds.

Though maybe not in the way he hoped, given how he and many other people are focusing on “demand” in the other direction.

If you believe nobody has the right to “demand” an open-source maintainer do something or abide by some policy or restriction, then that ends the argument in more ways than people are appreciating. If you just want to say “nobody can demand I do this”, then OK, but you also can’t demand that PyPI (which is an open-source project, too) do any particular thing or abide by any particular policy you’d like, which more or less removes any grounds you might have had to criticize their account security approach. They don’t owe you anything and don’t have to do what you want them to do, the end.

But a lot of people in comment threads are trying really hard to figure out a way to impose requirements and standards on PyPI’s maintainers but not on anyone else, which then contradicts the “it’s open-source, you can’t demand anyone do anything” basis of the whole argument. Somewhat amusing are people who try to argue that PyPI has a lot more power to affect people and so should be restricted more, which is an equally good argument for imposing requirements on people who maintain major packages.

I’ve also seen people who seem to be trying to argue that PyPI can’t change what it asks of a user after that user starts using it. But again this ends up being self-contradictory: the fact that someone chose to release some software under an open-source license once doesn’t, in these folks’ view, obligate that person to continue maintaining or releasing new versions of it under the same terms forever. If they want to stop, or change their license, or make any other sort of change, then that’s their right. And so too PyPI’s maintainers have the right to make decisions about how they’ll do things in the future which do not come with a requirement to continue doing exactly what they’ve done in the past.

Responsibility

Meanwhile, a lot of this boils down to people who are incredibly angry at the thought that they have any responsibility as a result of having chosen to release some code under an open-source license.

Armin talks a bit about this:

However when I create an Open Source project, I do not chose to create a “critical” package. It becomes that by adoption over time. Right now the consequence of being a critical package is quite mild: you only need to enable 2FA. But a line has been drawn now and I’m not sure why it wouldn’t be in the index best interest to put further restrictions in place.

Instead of putting the burden to the user of packages, we’re now piling stuff onto the developer who already puts their own labor and time into it.

First of all, that last sentence is just a mess: many of us are both users and developers. I know I am, certainly, and likely every other “developer” is a “user” too, making it hard to draw as clear a distinction as a statement like this needs.

And Armin’s suggestion for what “burden” should be pushed to “users” is basically that publishing to PyPI should be as unrestricted as possible (as he puts it: “that the index has no policies beyond immutability of assets”), and “users” should be crowdsourcing the vetting of packaged code post-release. Which… well, let me just say that my actual first reaction to this was unprintable. This is not necessarily an argument against cargo-vet, which he mentions as an example, because it seems to be intended as part of a larger and more comprehensive set of measures for packaging and packaging-related security. My string-of-profanities reaction was to the idea that somehow a post-release vetting system pushing all the burden onto “users” would be sufficient on its own, or that it would be good because it would allow us never to ask anything of package maintainers.

So, look, I get that there are some people who want to live in a world built on caveat emptor and the idea that it’s always and only your fault if something bad happens to you. I get that there are some people who think this is the only kind of world open source can be. Maybe Armin is one of those people, or maybe he just argues like one without realizing or intending it.

But no. Just… no. That would be a terrible world, and a terrible model for open source software.

Some of this may just be due to incommensurable world-views. My view of the world is based on the idea that my actions may have consequences not only for me, but also for other people, and that I have some responsibility to at least consider those other people when deciding what actions I will take. There are very few things I can do that exist purely in a vacuum and affect only me, and there are exactly zero social things I can do which meet that description.

And open source is a social activity. It literally does not make sense in an asocial context.

(By the way: nearly every activity actually is a social activity, including many that you probably initially think aren’t.)

Don’t be a you-know-what

So let’s talk about responsibility. Many people, including Armin, have been arguing that authors of open source code either don’t have, or shouldn’t have, any responsibility for things like the security of the eventual users of the code.

In a legal sense, of course, nearly every open-source license includes a warranty disclaimer, though the extent to which that actually protects the developer depends on the jurisdiction and on what the developer actually did.

In an ethical and social sense, though? Yeah, if you publish open-source code you do have some responsibilities, whether you want them or not.

For example: you have a responsibility to set expectations. If you’re throwing something on the internet to be some sort of “worked for me, good luck” abandonware, you probably should let people know that so they don’t waste their time and yours by sending you bug reports or patches or other contributions. Or if you do intend to maintain it and want to accept help from others, you probably should make that clear too. There are evolving standards for doing this, but you can just stick some info in a README if all else fails. If you don’t do this, people will criticize you when expectation mismatches occur, and their criticism will be justified.

You also have a responsibility not to be malicious. If, say, you publish a package that says it does a useful thing, and it actually contains obfuscated code that deliberately erases the user’s hard drive, that’s malicious behavior and in an ethical sense you don’t get to hide behind a license’s warranty disclaimer; other people will call you out for it and be absolutely justified in doing so.

And if your package gets established and has people regularly using it, you have some responsibility not to mess with those users. If you want to make backwards-incompatible changes, for example, communicate them and bump the major version number. If you want to stop maintaining the package, post an announcement about it and be willing to hand over to someone else to continue work. There are even groups out there like Jazzband that will help with this.

All of these examples are hopefully pretty clear responsibilities, because what they really boil down to is the basic societal expectation of “don’t be an asshole”. People really shouldn’t need to have them enumerated or have to be reminded that “don’t be an asshole” is a basic societal expectation, but here we are.

It goes both ways

Now let’s look at it from another angle: as a result of my involvement in open-source projects (primarily Django), a lot of useful doors have opened for me. My participation in open source has led to job opportunities, book-writing opportunities, speaking opportunities, all sorts of things that I probably wouldn’t have had or would have had much less of if I hadn’t been as involved in open source.

The same is true for Armin — he probably would not be where he is today without his open-source work. But I notice his blog post didn’t really touch on this. It focused solely on what he was being asked to do, and on the persona of an overburdened open-source maintainer. There wasn’t really any mention of his alternate persona: influential project founder and leader, Director of Engineering at a respected and successful company, etc.

If he doesn’t owe us anything, then we also don’t owe him anything. Perhaps he’d like to take back his code and the rest of the world can take back all the opportunities and other things he’s been given in exchange.

Of course, Armin and I are two fairly extreme outliers; most people who release something under an open-source license never have the kind of success that he’s had, or I’ve had. I know I’ve had a lot of luck at crucial points in my career, when things could absolutely have gone another way, and I’m sure the same is true of Armin. But the basic point remains true even on smaller scales: as I said above, open source really is a social activity, and as a social activity it comes with responsibilities and benefits for multiple parties in multiple directions. That’s why, for example, building a history of open source participation/contribution (a “GitHub résumé”, as it’s sometimes called) is often strongly encouraged for people who are trying to get hired in tech.

What can we ask?

A lot of people, including Armin, seem to be taking this as the starting point of a slippery slope. PyPI asks for 2FA today, what might they ask for tomorrow? I’ve seen some incredibly hyperbolic suggestions floating around, and they frankly boggle my mind. But there’s a reason why the slippery slope is a fallacy: you can build a slippery slope for basically anything. What’s harder, and what nobody has actually done here, is show how or why it would be likely for PyPI to do all the things people are slippery-speculating they might.

Meanwhile, some other people have just been spreading outright lies about what PyPI is doing, and claiming that they’re trying to harvest all sorts of personal data (PyPI explicitly does not support SMS as a second factor, so they don’t want your phone number; and the free security keys are optional and are ordered through Google, so PyPI also doesn’t want your mailing address).

Which really makes no sense. If you, person reading this, have so little trust in the people who run PyPI, why are you using it at all? Why interact with the Python community at all? Please, take your code and go home, and don’t let the door hit you on your way out. Maybe the rest of us could also terminate your license to use any of our code, just to make sure you don’t accidentally end up in some sort of open-source relationship with us where there might once again be things asked of you. We know you wouldn’t want that, so we can take steps to protect you from it.

More seriously: two-factor auth is such a reasonable bare-minimum and easy-to-do (from the account-holder end) thing for account security these days that the objections being raised make no sense to me. People who take reasonable and easy-to-work-with steps to improve security are not the sort of people I suspect of starting down a slippery slope! Yet at least one person already has loudly begun to take their code and go home over it.

And when you consider the potential impact of a major package being compromised, it becomes even more obvious that this is a good move. Would it be great to go all-in and require 2FA for everyone, and maybe some other security measures, too? Sure, but I get the impression that this is a compromise between:

PyPI’s chronic lack of (again: mostly volunteer) staffing, which is a barrier to universal 2FA since that increases the support burden for PyPI’s maintainers, and
The risk posed to PyPI, its users, and the Python community as a whole by potential attacks against key packages in the Python ecosystem

So rolling it out, for now, to just a subset of users who maintain the most-downloaded packages on the index, seems pretty reasonable to me. I’m sure there’s an expectation that some percentage of the affected users will, for lack of a better word, flame out over this. The same would be true if 2FA just became a universal requirement for all PyPI user accounts.

And honestly, I think 2FA is considerably less than what PyPI, or we the nebulously-defined “users” (whatever that actually means), would be justified in asking of package maintainers. Armin really ought to know better than to suggest the index should have no policies beyond “immutability of assets”. Spam packages, malware, typosquatters and other impostors, people who are following a tutorial in good faith but accidentally publish to the real PyPI instead of a sandbox/testing instance… all of these and more are things the index absolutely should have opinions and policies about. A package index like PyPI is and always will be a tradeoff between trying not to put up too many barriers to publishing, and trying to put up enough to keep the thing useful.

And there are some further steps PyPI could take that require very little (but still non-zero) effort on the part of package maintainers, and would offer pretty big payoffs in improved security. I’d have no problem with them doing so, and still wouldn’t view it as a slippery slope to packaging dystopia. Maybe some other people would, and would loudly protest and leave over it, but that seems no great loss — people who don’t care enough to do a few basic security things are not people whose code I’d want to be using anyway, so this would actually be a useful filter.

Let’s be done with this

There are other things people are talking about — like the perma-thread arguing that PGP solves every security problem and PyPI should just PGP all the things (it doesn’t solve every security problem and wouldn’t for PyPI, and PyPI actually already lets you PGP-sign packages, which should be a hint about how little signing actually solves on an index like PyPI) — but the big focus has been on this idea that open source equals no responsibility, and that 2FA is too much to ask of package maintainers. Hopefully I’ve convinced you, one way or another, that this doesn’t hold up to even pretty light scrutiny; that we most certainly can ask people to accept a certain amount of responsibility — such as securing their package-index accounts — as the price of entry to a large community like PyPI/Python; and that participation in such communities has both benefits and responsibilities, neither of which can be had without the other.

If you still can’t stand the idea that PyPI can ask you to do things as a condition of getting to publish your packages there, I don’t know what to tell you. If that’s enough to make you angrily flame out of Python or of open source altogether, then probably that’s the best outcome for both you and everyone else, since it’s unlikely that things were going to go well in the long term.

And finally: if you’re determined to take something I’ve said above, find the least charitable interpretation of it you can come up with, and argue with that, please know that I don’t read Hacker News and haven’t for years, so I won’t see or respond to you.