Let’s talk about DVCS

July 28, 2008 Misc, Programming

So, a few years ago all the cool kids were switching from CVS to Subversion. These days, all the cool kids are switching from Subversion to some form of distributed version control; git and Mercurial seem to be the ones with the largest market shares. This switch is being accompanied by a simply deafening amount of hype about DVCS and how it’s a revolutionary new paradigm and will completely change the way people work and… well, the usual stuff.

Over the past few months I’ve tried out both of the popular DVCS tools: first I spent some time puzzling my way through git and realizing that, though I’m fond of wiring together five dozen things with baling twine and duct tape to create a Rube Goldberg monstrosity, I’m not yet ready to trust such a contraption with something like, oh, an important codebase. It works for Linus because he wrote it, and that’s great for him. But it isn’t for me. Then I tried out — and am still poking around with — Mercurial, which is on the whole a much friendlier and more humane tool.

But all of this poking and prodding has left me with a couple of nagging questions about which I am intensely curious, so I’m going to write them up here and hope that someone smart can answer them for me.

OK, what’s different?

So, yeah, there’s a lot of hype about DVCS. And while I’m prepared to dismiss a lot of the hyperbole as being simply the sort of thing you get in these situations, there’s a lingering undercurrent of people suggesting that this is fundamentally different from how we’ve been doing things up until now. Which is kinda funny to me, because I look at why people get excited about DVCS and basically all I see are decent merging algorithms. If I had to guess, I’d say that 90% or more of the claimed benefits of DVCS really do boil down to “wow, it’s easy to merge stuff”.

While it’s nice to have better merging than Subversion — and, really, what doesn’t have better merging than Subversion? — that doesn’t exactly make for a brave new world.

The other big win people often claim for DVCS is offline commits. At first I was kind of puzzled as to why offline commits would be such a great feature, and then I saw that mostly, the hype around them was about having a nice little local revision log you could step through before sending your changes off somewhere else.

So DVCS — as far as I can tell — is about better merging and offline commits. Which is a problem if you’re a DVCS advocate, because neither one of these seems to be intrinsic to the nature of distributed version control.

Let’s take offline commit as an example. Suppose that a new feature gets added to Subversion, such that you can sit at your computer and edit files in your working copy, and when you’ve reached a logical stopping point you go and type something like:

$ svn waypoint -m "Finished refactoring the combobulator"

And suppose that this would make SVN record the current local changes and make a little local log entry. Then, when you were ready to start committing back to the repo, you could step through your waypoints and decide which ones you want and which ones you don’t. Or, if you realized that your changes actually broke the combobulator, you could throw that waypoint away before it ever reached the repo and just roll back to where you’d been previously.

I’m willing to bet that this would cover practically every use case for offline commits in a DVCS, because ultimately it’s the same thing: make some changes, note what they were. The cycle of edit-commit-edit-commit in a DVCS would simply become edit-waypoint-edit-waypoint in this hypothetical SVN. So this isn’t something that requires DVCS to implement; you can do it just fine in an old-fashioned centralized VCS.
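To make the hypothetical a little more concrete, here’s a rough sketch of what a local waypoint log might look like. To be clear, none of this exists in Subversion: the WorkingCopy and Waypoint classes and their method names are invented for illustration, and the “working copy” is just an in-memory dict of path to contents rather than real files on disk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Waypoint:
    """One local log entry: a message plus a snapshot of the files (hypothetical)."""
    message: str
    snapshot: Dict[str, str]  # path -> file contents when the waypoint was made


@dataclass
class WorkingCopy:
    """Hypothetical working copy with a purely local waypoint log.

    Nothing here talks to a server; the log lives next to the files.
    """
    files: Dict[str, str] = field(default_factory=dict)
    waypoints: List[Waypoint] = field(default_factory=list)

    def waypoint(self, message: str) -> int:
        """Record the current local state and return the new waypoint's index."""
        self.waypoints.append(Waypoint(message, dict(self.files)))
        return len(self.waypoints) - 1

    def log(self) -> List[str]:
        """Return the waypoint messages, oldest first."""
        return [w.message for w in self.waypoints]

    def rollback(self, index: int) -> None:
        """Restore the files to waypoint `index` and drop every later waypoint."""
        self.files = dict(self.waypoints[index].snapshot)
        del self.waypoints[index + 1:]


# Edit, waypoint, edit, waypoint, then change your mind.
wc = WorkingCopy(files={"combobulator.py": "original"})
start = wc.waypoint("Before touching the combobulator")
wc.files["combobulator.py"] = "refactored"
wc.waypoint("Finished refactoring the combobulator")
wc.rollback(start)  # the refactoring broke things, so go back
```

The only point of the sketch is that the log and the rollback are entirely local bookkeeping; nothing in it depends on whether the repo you eventually commit to is centralized or distributed.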

And the same is true of merging: there’s no reason why the merging algorithm in a centralized VCS has to suck, and there’s nothing intrinsic to merging algorithms which requires a DVCS to get them right.
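As a rough illustration, here’s a minimal three-way merge done at whole-file granularity. It’s a sketch, not how any particular tool does it: the three_way_merge function and the Tree type are names I’ve made up, real tools merge line by line within files, and the sketch simply assumes the tool can hand you the common ancestor (base) of the two sides.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # path -> file contents (illustrative stand-in for a checkout)


def three_way_merge(base: Tree, ours: Tree, theirs: Tree) -> Tuple[Tree, List[str]]:
    """Merge two trees against their common ancestor, one whole file at a time.

    Returns the merged tree and the list of conflicting paths. A file missing
    from a tree is treated as deleted (or not yet created) on that side.
    """
    merged: Tree = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o   # both sides agree, including both deleting the file
        elif o == b:
            result = t   # only their side changed it
        elif t == b:
            result = o   # only our side changed it
        else:
            conflicts.append(path)
            result = o   # both changed it differently; keep ours as a placeholder
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"a.py": "1", "b.py": "1", "c.py": "1"}
ours = {"a.py": "2", "b.py": "1", "c.py": "x"}
theirs = {"a.py": "1", "b.py": "3", "c.py": "y", "d.py": "new"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

The inputs are just three snapshots; the function neither knows nor cares where they came from.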

Which leads to a question: what, then, is it that’s so different about DVCS?

Or, more pointedly: if Subversion pushed out a new release tomorrow which had a “waypoint”-style feature and a better merging algorithm, what features would be left to make the case for DVCS?

I’ve spent some time thinking this over and can’t come up with an answer. Which is a problem for the DVCS folks, because if they don’t have an answer they’d better just give up now: sooner or later a popular centralized VCS will catch on and sprout these features, and DVCS will get sunk.

No, really, what’s different?

The other thing I don’t yet understand about DVCS is how it actually changes the way software development will work. Linux and Rails are two notable projects which have recently switched from centralized version control to DVCS (git in both cases), and as far as I can tell the change went something like this:

Before the switch, people sent patches in for review by the core team, which then decided whether to commit those patches to the master tree.
After the switch, people send in patches for review by the core team, which then decides whether to commit those patches to the master tree.

In other words: it looks like the workflow hasn’t actually changed at all. For something that’s purported to be a whole new paradigm of software development, that’s not good.

What’s worse, there’s a nasty tendency to muddle the meanings of important words. We’re told that Rails, for example, switched away from a “centralized” system, but it’s hard to see how the new setup is any less “centralized” than the old: there’s still a single master tree that forms the basis of public distribution, and there’s still a core team of privileged committers who act as gatekeepers to that tree. Same goes for Linux (and, even more confusing, when Linux used “centralized” version control people still routinely ran from branches maintained by core members of the project).

Which leads to another question, or perhaps a pair of questions: in the switch from “centralized” to “distributed” version control, in what sense exactly are things no longer “centralized” and what, precisely, has been “distributed”?

Once again, I can’t come up with an answer. And once again, that’s not good for DVCS folks, because if I can’t puzzle it out there’s no hope of winning over your pointy-haired boss.

I’ve got questions. You’ve (hopefully) got answers.

I’m not posting this to knock DVCS in general or any individual tool in particular. I’ve been putting in some time to learn how this stuff works, and it seems neat, but if I’m going to really commit to DVCS I’ve got to know what it’s going to get me. If the answer is “no real change to how your project is developed, and no features that couldn’t be implemented in the next release of Subversion”, well, I’m probably not going to make the jump.

So: what am I missing here? There have to be answers to these questions, but I’ve been exploring DVCS tools for months now (ever since a certain co-worker started putting important things into a Mercurial repo; you know who you are) and I haven’t been able to find them.