Let’s talk about DVCS

An entry published by James Bennett on July 28, 2008. Part of the categories Misc and Programming. 55 comments posted.

So, a few years ago all the cool kids were switching from CVS to Subversion. These days, all the cool kids are switching from Subversion to some form of distributed version control; git and Mercurial seem to be the ones with the largest market shares. This switch is being accompanied by a simply deafening amount of hype about DVCS and how it’s a revolutionary new paradigm and will completely change the way people work and… well, the usual stuff.

Over the past few months I’ve tried out both of the popular DVCS tools: first I spent some time puzzling my way through git and realizing that, though I’m fond of wiring together five dozen things with baling twine and duct tape to create a Rube Goldberg monstrosity, I’m not yet ready to trust such a contraption with something like, oh, an important codebase. It works for Linus because he wrote it, and that’s great for him. But it isn’t for me. Then I tried out — and am still poking around with — Mercurial, which is on the whole a much friendlier and more humane tool.

But all of this poking and prodding has left me with a couple of nagging questions about which I am intensely curious, so I’m going to write them up here and hope that someone smart can answer them for me.

OK, what’s different?

So, yeah, there’s a lot of hype about DVCS. And while I’m prepared to dismiss a lot of the hyperbole as being simply the sort of thing you get in these situations, there’s a lingering undercurrent of people suggesting that this is fundamentally different from how we’ve been doing things up until now. Which is kinda funny to me, because I look at why people get excited about DVCS and basically all I see are decent merging algorithms. If I had to guess, I’d say that 90% or more of the claimed benefits of DVCS really do boil down to “wow, it’s easy to merge stuff”.

While it’s nice to have better merging than Subversion — and, really, what doesn’t have better merging than Subversion? — that doesn’t exactly make for a brave new world.

The other big win people often claim for DVCS is offline commits. At first I was kind of puzzled as to why offline commits would be such a great feature, and then I saw that mostly, the hype around them was about having a nice little local revision log you could step through before sending your changes off somewhere else.

So DVCS — as far as I can tell — is about better merging and offline commits. Which is a problem if you’re a DVCS advocate, because neither one of these seems to be intrinsic to the nature of distributed version control.

Let’s take offline commit as an example. Suppose that a new feature gets added to Subversion, such that you can sit at your computer and edit files in your working copy, and when you’ve reached a logical stopping point you go and type something like:

$ svn waypoint -m "Finished refactoring the combobulator"

And suppose that this would make SVN record the current local changes and make a little local log entry. Then, when you were ready to start committing back to the repo, you could step through your waypoints and decide which ones you want and which ones you don’t. Or, if you realized that your changes actually broke the combobulator, you could take out that waypoint even earlier and just roll back to where you’d been previously.

I’m willing to bet that this would cover practically every use case for offline commits in a DVCS, because ultimately it’s the same thing: make some changes, note what they were. The cycle of edit-commit-edit-commit in a DVCS would simply become edit-waypoint-edit-waypoint in this hypothetical SVN. So this isn’t something that requires DVCS to implement; you can do it just fine in an old-fashioned centralized VCS.
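
To make the hypothetical concrete, here is a minimal Python sketch of what such a local waypoint log might look like. Everything in it is an assumption made for illustration: Subversion has no `waypoint` command, and the `WaypointLog` class, its method names, and the model of a working copy as a dict of file contents are all invented here.

```python
from dataclasses import dataclass, field


@dataclass
class WaypointLog:
    """Hypothetical local log of working-copy snapshots (not a real SVN feature)."""

    # Each entry is (message, snapshot), where snapshot maps path -> contents.
    entries: list = field(default_factory=list)

    def waypoint(self, message, working_copy):
        """Record a copy of the current working copy under a log message."""
        self.entries.append((message, dict(working_copy)))

    def messages(self):
        """Return the log messages, oldest first."""
        return [message for message, _ in self.entries]

    def roll_back(self, steps=1):
        """Drop the last `steps` waypoints and return the snapshot now on top."""
        if steps < 1 or steps >= len(self.entries):
            raise ValueError("steps must leave at least one waypoint in the log")
        del self.entries[-steps:]
        return dict(self.entries[-1][1])


# Usage: edit, waypoint, edit, waypoint, then discover the second edit was bad.
log = WaypointLog()
files = {"combobulator.py": "v1"}
log.waypoint("Initial state", files)

files["combobulator.py"] = "v2 (refactored)"
log.waypoint("Finished refactoring the combobulator", files)

files = log.roll_back()  # the refactoring broke things, so go back one step
```

The sketch stores whole snapshots rather than diffs purely to keep it short; a real implementation would presumably store deltas.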

And the same is true of merging: there’s no reason why the merging algorithm in a centralized VCS has to suck, and there’s nothing intrinsic to merging algorithms which requires a DVCS to get them right.
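
As an illustration of that point, many merge tools are built around a three-way merge, which needs only the two edited versions and their common ancestor; nothing in it depends on where the repository lives. The following is a minimal sketch, assuming a file is modelled as a dict of named entries rather than lines of text. The function name and the model are illustrative and not taken from any particular VCS.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited versions of `base`; return (merged, conflicts).

    All three arguments map a key (say, a function name) to its text.
    A missing key means the entry does not exist in that version.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:        # both sides agree (or both left it alone)
            result = o
        elif o == b:      # only "theirs" changed it
            result = t
        elif t == b:      # only "ours" changed it
            result = o
        else:             # both changed it differently
            conflicts.append(key)
            result = o    # keep ours as a placeholder; the caller must resolve
        if result is not None:
            merged[key] = result
    return merged, conflicts


base = {"parse": "v1", "render": "v1", "legacy": "v1"}
ours = {"parse": "v2", "render": "v1", "legacy": "v1"}   # we edited parse
theirs = {"parse": "v1", "render": "v1", "save": "v1"}   # they dropped legacy, added save

merged, conflicts = three_way_merge(base, ours, theirs)
```

Real tools work on lines or hunks and have to locate the common ancestor first, which is where merge tracking comes in; the decision table itself stays the same.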

Which leads to a question: what, then, is it that’s so different about DVCS?

Or, more pointedly: if Subversion pushed out a new release tomorrow which had a “waypoint”-style feature and a better merging algorithm, what features would be left to make the case for DVCS?

I’ve spent some time thinking this over and can’t come up with an answer. Which is a problem for the DVCS folks, because if they don’t have an answer they’d better just give up now: sooner or later a popular centralized VCS will catch on and sprout these features, and DVCS will get sunk.

No, really, what’s different?

The other thing I don’t yet understand about DVCS is how it actually changes the way software development will work. Linux and Rails are two notable projects which have recently switched from centralized version control to DVCS (git in both cases), and as far as I can tell the change went something like this:

Before the switch, people sent in patches for review by the core team, which then decided whether to commit those patches to the master tree.

After the switch, people send in patches for review by the core team, which then decides whether to commit those patches to the master tree.

In other words: it looks like the workflow hasn’t actually changed at all. For something that’s purported to be a whole new paradigm of software development, that’s not good.

What’s worse, there’s a nasty tendency to muddle the meanings of important words. We’re told that Rails, for example, switched away from a “centralized” system, but it’s hard to see how the new setup is any less “centralized” than the old: there’s still a single master tree that forms the basis of public distribution, and there’s still a core team of privileged committers who act as gatekeepers to that tree. Same goes for Linux (and, even more confusing, when Linux used “centralized” version control people still routinely ran from branches maintained by core members of the project).

Which leads to another question, or perhaps a pair of questions: in the switch from “centralized” to “distributed” version control, in what sense exactly are things no longer “centralized” and what, precisely, has been “distributed”?

Once again, I can’t come up with an answer. And once again, that’s not good for DVCS folks, because if I can’t puzzle it out there’s no hope of winning over your pointy-haired boss.

I’ve got questions. You’ve (hopefully) got answers.

I’m not posting this to knock on DVCS in general or any individual tool in particular. I’ve been putting in some time to learn how this stuff works, and it seems neat, but if I’m going to really commit to DVCS I’ve got to know what it’s going to get me. If the answer is “no real change to how your project is developed, and no features that couldn’t be implemented in the next release of Subversion”, well, I’m probably not going to make the jump.

So: what am I missing here? There have to be answers to these questions, but I’ve been exploring DVCS tools for months now (ever since a certain co-worker started putting important things into a Mercurial repo; you know who you are) and I haven’t been able to find them.

On July 28, 2008, Stan said:

As I understand it, the Linux kernel switched from one distributed version control system to another. First there were patches and mail because Linus hated CVS. Then there was BitKeeper, which was Linus’ introduction to the concept of distributed version control. Then Andrew Tridgell pissed off Larry McVoy by reverse engineering the BitKeeper protocol, and finally Linus went off and spent a month writing git after having used distributed version control for 3 years. So the Linux kernel has been using some kind of distributed version control system since 2002.

On July 28, 2008, Jonathan Barrett said:

For me (current Mercurial user, was SVN/SVK user, looking at git), the following hold:

I do take your point that the hype surrounding DVCS is a little loud, and lots of people who don’t understand the wins are just following fashion and jumping on board, but when you look at the types of development a DVCS with great merging enables, you have something that really is game changing. If I decide to fork Rails for inhouse development, I can do that in either Subversion or git, but the new fork is MUCH EASIER to hand back to the Rails core team now that it’s in git.

Hope that helps!

On July 28, 2008, zgoda said:

I hope to see some answers here because I have the exact same questions.

On July 28, 2008, Marty Alchin said:

The closest I’ve come to using git is http://github.com/, and the fact that it uses git is completely irrelevant to me. If someone points to an equivalent service using SVN, I’d probably use that in a heartbeat instead; at least I’d already know the tools.

On July 28, 2008, Aaron Davis said:

I haven’t actually used any DVCS so take this with a grain of salt:

The merging feature clearly isn’t intrinsic to DVCS, but I think the offline commits are. From what I gather, you essentially turn your machine into a server. It would be like having a local subversion server on your client machine, and committing to that. Then when you are ready, commit to the main server.

I think, by adding a waypoint feature to SVN, you will have turned SVN into a DVCS.

Also, you’ve probably already seen it, but the Wikipedia Article has some more information.

On July 28, 2008, Aleander said:

$ svn waypoint -m "Finished refactoring the combobulator"

Congratulations, you have just turned SVN into a DVCS. With fewer “D” features than Mercurial, but a DVCS nonetheless.

On July 28, 2008, Yuri Baburov said:

What about pulling from one developer’s repository and pushing into another developer’s repository (say, code that isn’t working yet) without committing to some “main” repository? What about uncommitting changes in any repository? What about merge duplicates detection using file hashes?

You seem to work on projects alone. For you, “distributed” really means nothing at all. That’s sad.

On July 28, 2008, Paul D. Waite said:

I think Git, specifically, has a different conception of what version control is than Subversion. I think it treats the entire tree as one thing it’s managing (as opposed to Subversion treating it as a load of files, each tracked via the .svn directories). As such, I think Git is smarter about code moved between files.

Not sure if that has a lot of effect on common workflows though. Or, indeed, if it’s right.

You’re welcome!

On July 28, 2008, Rick Harding said:

The thing that DVCS does is make branching/merging a first class operation. It’s easy, encouraged, and that is what changes workflows.

Look at the debates between Ubuntu and KDE on the difference in using svn vs bzr. In a DVCS you can create a branch for each feature. You then bring the branches into one at intervals. If a feature is lagging behind release schedule you just don’t merge in that branch. You can keep working on it, but it won’t hold up anyone else.

In working with bzr I commit a lot more often. After all, it only affects me. I can try out many paths, work back and forth, and it’s a ton easier than doing it with an SVN repo served off somewhere.

In the development of Gnome Do we loved using bzr. If someone wanted to come in and try to implement a feature we just said “go for it, create a branch”. We could then follow it, test it, make suggestions. We didn’t have to set up extra accounts or create a branch for this person to use ourselves. Because merging is so much easier to do in bzr, once the guy got the branch in shape, it was no problem to merge it into one of our branches and get it into release.

If the guy never went anywhere, who cares. The branch doesn’t bring down our work at all or concern us in the least.

I’ve used Hg and bzr in a few projects and I’ve turned to using bzr-svn for my work stuff since I can get some of the advantages while developing on the projects work keeps in svn.

On July 28, 2008, Bill Mill said:

Rob Wilkerson had similar concerns, and he and I had a discussion in the comment thread that you may find interesting.

On July 28, 2008, Dirkjan said:

Okay, so I have other wins for you:

I think these are important benefits. One other side-benefit is that every committer can get full credit by way of his username in the changelog. As a contributor, I found that way nicer than when it showed the name of the core contributor with a thanks message for me. It’s also an administrative benefit, I guess. Another nice extra (at least for Mercurial) is that it’s largely written in Python, meaning that it’s much easier to inspect the source code, improve it and extend it using the extension model in various ways. In Subversion, you have to dive into KLOCs of C code. Which would you rather hack on?

On July 28, 2008, Joao Marcus said:

Even if SVN had “waypoints”, it would need local branches, which are IMO the whole point of a DVCS. You can have your own branches, work inside them, discard them if you don’t need them anymore. It’s not only about offline commits, it’s about having your own offline repository, choosing what you want to upload to the central server, etc.

It’s hard to understand the benefits of a DVCS until you try it. I tried Bazaar, but even though it’s easier, I always go back to Git, mainly because of named branches. No need for bazillions of directories, one for each branch. My branches are inside of my repository.

On July 28, 2008, Kyle said:

DVCS is really good whenever you don’t have a reliable place for a central server.

An extreme example was pretty common for me back in college. I had three computers: a desktop running 5 different OSes and a laptop. The third ‘computer’ only half counts. It was a USB drive loaded with my documents and a collection of useful Windows/Mac programs, and it was used on dozens of random computers.

A DVCS is excellent for not only providing version control to my home directory, but for keeping all the changes synced up between whatever pair of computers I was using.

On July 28, 2008, Steven Osborn said:

As some others have mentioned, branching is much cheaper and easier to manage.

On July 28, 2008, Marc Fargas said:

After the switch, people send in patches for review by the core team, which then decides whether to commit those patches to the master tree.

More likely: People send in patches to those responsible for part A of project X (Linux kernel). Those responsible take in those patches in their repositories, clean up stuff, test things, … and publish what they could call the master tree of part A (e.g. DVB drivers in the kernel). The “core developers” (aka Linus) would then only need to merge changes from those “trusted sources” into his master tree, keeping the whole history of changes and not needing to review every patch taken into the tree (that’s supposed to be done by those responsible for each component).

At least that’s how I understand it ;)

On July 28, 2008, Gregory Collins said:

To me the distinguishing features are:

On July 28, 2008, Jonathan Barrett said:

Real world example:

http://www.rubyenterpriseedition.com/faq.html#fork

This kind of under-control upscale development would be a complete pain with Subversion. The win isn’t the features - it’s the priority given to the features. There’s no longer a single list of “versions” in the canonical trunk - there’s a pool of updates, and you can easily choose from whom and to whom you take and give those updates.

Bear in mind, too, that many people had similar “subversion is just CVS with some more features; it’ll catch up” worries back in the day. I think we’re seeing a similar shift.

On July 28, 2008, wm tanksley said:

“In other words: it looks like the workflow hasn’t actually changed at all. For something that’s purported to be a whole new paradigm of software development, that’s not good.”

No, that’s the point: a DVCS has much more capability to adapt to the user’s workflow. You can use it in almost exactly the same way, and that lets you get off to a quick start. Later on, when you decide to change workflow, you can just do it. Consider how Linux is working now: they’ve changed workflow many times just recently; the latest that I know of is the ‘linux-next’ tree.

On July 28, 2008, Michael Thompson said:

Several DVCS benefits have been mentioned above, but one that I find quite valuable is the ability to version control something locally without necessarily sharing the repository elsewhere.

I know this goes against the ideologies of using a DVCS (and is fairly git-specific), but being able to start a project locally, run “git init” and have it version-controlled without setting up a public/private repo on another server or machine is wonderful for the early stages of a project or for simple, personal version control.

On July 28, 2008, wm tanksley said:

Thompson, I’m pretty sure that using DVCSes to manage local-only projects is neither git-specific nor against the spirit of a DVCS. I’m kinda puzzled why you think it would be.

It’s the majority of my use of Mercurial. It both protects me against stupid mistakes, AND provides for a quick and easy backup to a thumb drive.

-Wm

On July 28, 2008, Florian Jung said:

Imho subversion with the proposed waypoint-feature is essentially the same as bzr-svn, i.e. it would turn into a DVCS.

My main reason for using bazaar is that I don’t need a centralized server. This way I can easily synchronize repositories between different computers when I need to work at home, at the university, or wherever I am. Moreover setting up a new repository in bazaar usually takes less than 10 seconds and since then I use version control in a lot more cases than before (sometimes even for a single configuration file).

On July 28, 2008, Rob said:

I’ve been using darcs for a year or so now, and one of the things I like the most about it is the ability to pick which changes to record (I think this is called cherry-picking). Basically, if I work on several things at once, I can record them as separate commits, even if all of the changes occur in the same file.

Darcs also eschews revision numbers — the state of the repository is just the commits (patches) that have been applied. So if I record two patches, any other branch can pull either patch independently of the other (as long as they don’t touch the same line in a file).

It may not sound ground-breaking, but I’ve found it extremely useful when writing code and documents — I would not like to move to another VCS unless it supports these features (darcs also has an extremely easy-to-use interface, btw).
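
The independent-patch behaviour described in this comment can be illustrated with a toy model. This is only a sketch under heavy assumptions: a patch here is a single-line replacement written as a hypothetical `(line_index, old_text, new_text)` tuple, and the helper names are made up. darcs itself handles insertions, deletions, and dependencies between patches, none of which is modelled.

```python
def apply_patch(lines, patch):
    """Apply a (line_index, old_text, new_text) replacement to a list of lines."""
    index, old, new = patch
    if lines[index] != old:
        raise ValueError(f"patch does not apply at line {index}")
    return lines[:index] + [new] + lines[index + 1:]


def apply_all(lines, patches):
    """Apply several patches in the given order."""
    for patch in patches:
        lines = apply_patch(lines, patch)
    return lines


original = ["title", "intro", "body"]
fix_title = (0, "title", "Title")        # touches line 0 only
expand_body = (2, "body", "Body text")   # touches line 2 only

# The two patches touch different lines, so the order does not matter...
one_order = apply_all(original, [fix_title, expand_body])
other_order = apply_all(original, [expand_body, fix_title])

# ...and another branch can take just one of them.
only_body = apply_all(original, [expand_body])
```

In this model the repository state is fully determined by which patches have been applied, not by a revision number, which is the property the comment highlights.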

On July 28, 2008, Chris Flynn said:

I agree there is a lot of hype about these newfangled DVCS. I recently tried Git and I like it above Subversion for a few reasons. Yes, working offline is better and faster. If I have no internet connection, I don’t have to wait to commit. This encourages more commits so I can more easily roll back if I need to. Plus not all these commits have to get submitted to the master branch, I can push them all as one change. To me, this is a biggie, I can take more risks with my code.

Branching happens so fast. I know subversion talks about cheap branches but branches in Git are so much easier. Plus switching to different branches is also super easy and super fast. I now basically create a new branch for each feature I’m working on. I can switch back and forth between them easily and quickly. If I had to communicate over the network for all these things it would take much longer.

Really I think it just boils down to the fact that my local copy is my repository and I can make branches willy-nilly and not care. I can, say, download the django repository (using git-svn) and maintain a local copy with all the changes and all the branches. Switch between trunk, newforms-admin, gis, whatever branch? No problem… takes milliseconds. Heck, I can even create my own branch and start working on a killer new feature, committing at will, and I didn’t have to create another repository or worry about merging my changes as updates happen.

In fact, I have a local copy of django and a branch I called local. Here I’ve made a few changes, primarily to the admin setting stuff to be more local. Did I mention it was fast? That’s probably the biggest thing.

Most projects will always have a central or master repository so that users can just go there and download the latest version. However as a developer, I have my own little sandbox to do whatever I want with and commit whenever I want. It’s fast and I don’t need an internet connection to work. I resisted for a while thinking subversion was good enough, took a day and learned git and now I don’t think I can go back to subversion as my main VCS. (Though I do frequently use git-svn)

On July 28, 2008, Kjell said:

git add --patch and git rebase -i are some pretty cool git features that would make it hard for me to want to go back to svn.

The first lets me look through what’s changed in my ‘working tree’ (the files on my computer, not yet added to git’s index or committed) and selectively pick hunks of those files to add to the index and later commit.

rebase -i lets me interactively change the history of my local repository before I send it off to another party, so if I want to rearrange things in my history or zap two commits into one or split one commit into two I can.

Also with subversion I consistently want to kill myself for messing up a commit in some stupid little way and not being able to go back and fix it (I have to fix it and make a second commit, ensconcing my mistake forever in the logs). With git commit --amend I can easily and completely redo the last commit.

What I see in git, although yes, its interface is a bit of a hack (but consistently improving with the addition of new tools), is many layers of flexibility: working tree, index, local heads, pushed/pulled copies. Even though the people I work with use it mostly as we would an svn repo, regularly pushing and pulling to a centralized and protected location, we get much, much more flexibility. Also it’s really fast. It’s not a small conceptual load to get your head around, but as I’ve figured out more of the details of how it works I’ve used it less and less like subversion with good merging and offline commits and more in a new and exciting way. (as far as code management can be new and exciting…)

On July 28, 2008, Martin Pilkington said:

The key difference is that in centralised you have to commit to the central server in order for someone else to get your changes. With distributed I can pull directly from someone’s local repository without ever needing to touch the branch they got the code from, but then I can merge into the original branch if I wish, bypassing the person’s local repository where I got my copy of the code from.
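
The flow described in this comment can be sketched with a toy model. Assumptions are stated up front: the `Repo` class and its methods are invented for illustration, commit ids are plain strings chosen by hand, and history is just a map from commit id to parent ids. Real tools identify commits by content hashes and also have to merge diverging lines of work, which this leaves out.

```python
class Repo:
    """Toy repository: a name plus a map of commit id -> parent commit ids."""

    def __init__(self, name, commits=None):
        self.name = name
        self.commits = dict(commits or {})

    def clone(self, name):
        """Copy the full history into a new, independent repository."""
        return Repo(name, self.commits)

    def commit(self, commit_id, parents):
        """Record a new commit locally; no other repository is contacted."""
        self.commits[commit_id] = tuple(parents)

    def pull(self, other):
        """Copy over every commit `other` has that we lack; return their ids."""
        missing = sorted(set(other.commits) - set(self.commits))
        for commit_id in missing:
            self.commits[commit_id] = other.commits[commit_id]
        return missing


origin = Repo("origin")
origin.commit("c1", parents=[])

alice = origin.clone("alice")
bob = origin.clone("bob")

bob.commit("c2", parents=["c1"])      # Bob works locally
fetched = alice.pull(bob)             # Alice pulls straight from Bob
alice.commit("c3", parents=["c2"])    # and builds on his work

published = origin.pull(alice)        # later, origin takes both from Alice
```

Bob's commit reaches `origin` even though Bob never talked to it, because every clone carries the history it has seen.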

On July 28, 2008, Oliver Andrich said:

I can’t give an answer, cause I am asking the same questions myself. I am currently using git and mercurial in parallel. So far, I think I like git more. But why have I considered to look into these systems?

I still believe in and want a central repository for storing the one and only authoritative code base for all our projects. What has been nagging me are offline commits and (offline) log analysis. The latter is a pain for me, cause it tends to be slow (or impossible when I am on the road) for svn. I also like the ability to create local branches for experiments without losing an easy way to merge back the results of these experiments into the trunk. So I use DVCS’s more like an advanced client to a centralized repository.

And git-svn is the best svn client available. :)

On July 28, 2008, Idan Gazit said:

Odd that nobody has pointed out Linus’s Google tech talk on the subject. I remember that he clearly outlines why he thinks DVCS is important, particularly for open-source projects. It’s very… “Linus” in flavor — but he made more of a social argument for DVCS than a technical one, IIRC. Youtube link: http://www.youtube.com/watch?v=4XpnKHJAok8

In the talk, he asserts that “commit access” has become something of a coveted resource, a status symbol that few have. In centralized VCS you have the cool kids and the outcasts.

In DVCS, “commit access” as a concept is diminished in favor of “pulling” changes from another branch you find interesting. Instead of Tom, Dick, and Jane having permission to commit to my tree, I subscribe to the changes in Tom, Dick, and Jane’s trees, because I find them interesting. Subtle but different.

Like object-oriented vs procedural, I don’t think he’s arguing that DVCS brings something technically novel — it simply changes the mental model into one that allows more people to accomplish more things, more easily. My clone of the “master” repo is the master repo. The only thing making it “authoritative” or “interesting” is the level of interest and respect I manage to attract from other parties.

In effect, Linus is saying that “official kernel releases” are nothing more than “Linux from Linus’s tree.” And if you don’t like that — great — clone the tree and go make yours “famous-er”. Linus’s tree has nothing special or magical about it, compared to yours. You don’t need commit access to Linus’s tree because his tree reflects his particular tastes; you have your tree to mold into whatever shape you please.

Interestingly enough, when you look at the development model of a project like the linux kernel, I think that his view simply reflects the reality of his workflow: a tight-knit group of individuals who trust each other, who need to efficiently cross-pollinate their trees, not squabble over which one is the “authoritative” one.

Bonus observation: I find a lot of parallels with the whole cryptographic/social “web of trust” concept. DVCS allows you to say “I trust Bob’s judgment, if he thinks this clone is noteworthy, then I might want to follow it.”

On July 28, 2008, manuelg said:

This is one very nice win with DVCS:

Monday: begin working on an experimental branch

Tuesday: the main branch has some improvements, grab them all (or cherry-pick) and apply to my experimental branch

Wednesday: submit my experimental branch to the main

With a non-distributed VCS, this is not directly supported. You get a “Tsk tsk tsk - for shame…” if you confess this is your natural workflow.

Does this open up the possibility of merge headaches? Of course. If you want to avoid merge headaches, you have no choice but say to the whole world: “One at a Time”. You be the judge of how realistic this proclamation is to enforce, for your project.

Big loss from using a DVCS: people go off and work in secret. This can lead to staggering amounts of crap code.

On July 28, 2008, Travis Parker said:

Ryan Tomayko gave a pretty good rundown of the liberal workflow policy that seems to come with DVCSs. The thing is that you can use git or hg or bzr the same way that you use svn and not see much benefit, because the DVCSs don’t really improve much upon the SVN workflow - instead they provide other options.

Cheap branches are the major workflow changer, and svn can’t provide that without going distributed itself. Taking your svn waypoint hypothetical, if SVN has both local commits and a decent merging algorithm, then it really should also support local branches. After all, I can already commit locally and merge quickly/easily, so why exactly should I have to ask the project maintainer for a branch of my own? And if everyone has many of their own branches, they are all a great source of code changes, so they might as well provide servers too. If there are multiple servers out there for the same project, then svn switch should really be given the ability to switch between repos, not just branches on the same repo. And suddenly svn is looking a lot like the popular DVCSs.

On July 28, 2008, jlouis said:

I’ll keep to talking about git.

I’ll postulate that the usual way we work with code bases is: a gatekeeper/integrator/princess/commit-team decides what is going in for the next release. People send them patches for review and integration. Even if people have the commit-bit they often still send patches.

In this form of development git is a facilitating toolset. It makes it easy to contribute code with this style and it enhances patches so they are easy to integrate. You win speed here as integration is simpler for the integrator.

One must not underestimate the cognitive power that you can merge 10 branches in 10 seconds with 10 commands easily. If they take minutes to merge, you are going to look for your coffee mug.

Intrinsically, it would be possible to have this with SVN. But here is the cool thing about DVCS systems: You don’t need to administrate the repository. People clone it, make some changes and publish their new branch. With a central system, you need some way to make them track the branch in the central repository. And that means you would have to administrate the user, give him a login, etc. Note that if you add much more on the model, SVN would become distributed :)

On July 28, 2008, Kevin Teague said:

Decentralization means giving more permissions to more people. In bioinformatics and public science where I work there is a tendency to centralize all of an organization’s code into one repository. That organization can then determine a single policy for access control - nice and easy for that organization. But often this policy doesn’t make sense for all projects in the centralized repo - access tends to drift towards the “most limited access” policies. This is frustrating when collaborating, since collaborators often don’t have access to each other’s repositories, and so tarball exporting becomes the method of sharing code. It’s possible to have a better model of access control for a centralized version control - but you are still giving a central authority the determination to make all access control decisions. With a decentralized VCS the developer/scientist can make the decision to share the full source code history without needing to ask a central authority for permission to do so. If a scientist works on a code base at one organization for a while, then moves to a new organization, with a centralized system they would need to ask the former organization for a version control dump, then ask the new organization for a version control import - which can be a PITA if both organizations employ overworked sysadmins w/ grumpy demeanors who couldn’t care less about some dude’s small personal science project. With decentralized they don’t need to ask anyone for permission, they’ve got the full history and can easily make a copy of it themselves.

On July 28, 2008, Bruce Stephens said:

I think you’re basically right for many projects.

I guess I’m just not sure that it’s so easy to add “waypoints” in a clean way. There are obvious technical advantages in the DVCS turtles-all-the-way-down approach. It’s (I’d guess) simpler in implementation, and presumably simpler for users (there’s just one “commit”, though you also need a “fetch” and “push”, I guess).

And if you had “waypoints” or something (I suppose you’re imagining something close to what “quilt” provides?) then I’d want more. I often have a few local branches (I have a bit of work that’s not going to be ready for a week or two, and a bug-fix I need to do sooner, and the backport of that to a release branch).

And to do some of that it’s rather convenient (and surprisingly cheap) to have a local copy of all branches and tags.

That’s easy and obvious in a DVCS–I just create branches and merge, cherry-pick, etc., as I want.

On July 28, 2008, Hugh Bien said:

First, I want you to know I usually don’t subscribe to RSS feeds which don’t publish full articles but have a link to the full articles instead. Of course, your posts are pretty interesting so I end up subscribing =P.

The biggest advantage I see for DVCS is for super large, open source projects. So for Rails, under the old Subversion workflow a developer would make a patch of his changes and submit it. It would take a very long time, and before it’s even accepted he could make some more changes that are totally unrelated to his first patch. He’d have to make another patch. And so on…

Eventually, it gets pretty hard for him to keep track of all his patches he submitted since he doesn’t have a repository to commit to. If his friend makes patches that he wants and vice-versa, it becomes a real pain.

With DVCS, he clones the Rails code base, makes changes, and commits it to his own repository. Later he submits a patch with all of his changes as individual commits. It also becomes a lot easier to share his code changes with other developers who don’t have access to the master repository.

On July 28, 2008, Evan said:

You’ve claimed that the Linux kernel workflow didn’t change when they switched to BitKeeper; I’m surprised nobody (that I noticed, I skimmed the comments but may have missed someone) has pointed out what you’ve apparently missed.

Yes, both before and after, “people sent patches in for review by the core team, which then decided whether to commit those patches to the master tree.”

However, before BitKeeper, they did that manually, without any support from their tools. The critical advantage of a DVCS for Linux kernel developers is that it automates exactly the workflow they want to use, and arguably have to use (due to the volume).

For historical context, read some of the thread http://lkml.org/lkml/2002/1/28/83 and search for “Linus doesn’t scale”. Basically, in 2002 Linus reached a point where he couldn’t keep up with the number of patches submitted per day. Two years later, he was doing 48 patches per day, double what he’d managed before switching to BitKeeper. Now, he routinely does 500 changesets a day during the merge window, using git.

The workflow hasn’t changed, no. But now their tools don’t fight them.

On July 28, 2008, Alex said:

As a recent switcher from svn -> hg, I can say that I don’t really feel like anything new has been added to my workflow, just that things have improved. For me it’s mostly about the offline commits: The fact that the commit happens in a second or so, with comments, means that I actually integrate frequent commits into my workflow, instead of slacking on them and pushing a mega-commit at the end of the day, with notes on the multiple changes that commit represents. I know I shouldn’t have ever been doing it that way, but honestly, that time lapse means that subversion commits make me lose momentum. Hg commits do not.

On July 28, 2008, No99 said:

I see a couple of things. One is that when you’re really offline, you can still “checkin.” I’m not sure how often this comes up anymore, but connectivity is still an issue. None of this solves the issue of just not having access to code you need to checkout, though, which I’ve found to be a bigger concern when I’m on the road.

The other thing I’ve noticed is that many of the advocates of DVCS systems like to checkin a lot: they had some thought or breakthrough they want to record and have a way to get back to, but they don’t want to “commit.” Others get to see your “commits.” With mercurial you can kind of checkin a thousand times on some little feature and then when you’ve got it all tweaked just so, you can commit it to another repository and they get the one grand checkin without your sloppy history. There is something to this, though typically you can give developers who like to work this way a branch. In this post-book, google-it-up kind of world, a lot of people just sort of hack away that way: they try stuff and when it works they keep a copy and when it doesn’t they just try other things.

There is also a 3rd thing, it’s more of my belief about the current culture of development. “distributed” is just a better buzzword. If static is good then dynamic must be great. If centralized is okay, then distributed is the tits. If a webapp is okay, then a web 2.0 app must be pretty sweet. With Linux, you really need a dozen trees or tree masters and if that’s best done with GIT then fine, they do have a very reliable, very fast central server though and I know for a fact that the main tree owners are merge mutha-f-in-mastas. Something like Rails seems like it’s also a much more centralized project than 37signals wants to admit, git might just be lower management for them or something, there isn’t a good reason for them not to use subversion other than their ability to keep a machine running properly.

On July 28, 2008, John said:

I work on a huge Java web application at work that is managed with Clearcase. If I write something and want a coworker to try it out before checking in to main (the same as trunk in svn), I need to either email them the files (and source control goes out the window on their machine) or I check into a branch. For them to then pull in the branch means updating the clearcase view, which will take at least 30 minutes, and then rebuilding everything. But that loss of 1-2 hours is due to the size of our codebase.

If they could instead use DVCS and pull directly from my machine, it would be faster and better.

On July 28, 2008, Brett said:

It isn’t that DVCSs are a game-changer because they do things that SVN can’t do; it’s just that SVN doesn’t do them. Here are the reasons I am looking at moving Python eventually over to a DVCS.

Offline commits are handy when you lack Net, but more importantly allow non-committers to be able to do commits locally so they have the same benefit of version control. As you pointed out, if SVN added some waypoint feature, that would solve it. But since SVN doesn’t have that, your only option is a DVCS.

Building off of this offline commit ability, being able to make cheap branches is really handy. I always have a pristine copy of Python checked out to see what the standard behavior is compared to what I just added. But making a branch in SVN requires committing to the server, which is a waste when I am doing a quickie patch review. If I can branch from my pristine copy and have the DVCS keep track of the changes then the branch can be lightweight and not require talking to the server. Once again, SVN could support this, but it doesn’t.

And then comes in the better merge support. That allows better handling of disparate work that happened on the fringes that suddenly is brought in. People can work off in a corner until just before release, merge in the code, and then perform the usual regression work to make sure everything pulls in. Once again, SVN falls short as it stands, but nothing it couldn’t add.

So it isn’t that the individual features are phenomenal and not doable in SVN if they made it such that communicating with the server was an option, but at that point you have made SVN a DVCS. I don’t consider any of it a major shift, but it does help allow people who don’t have commit privileges to do better work with local version control. And for committers, cheap branches let you have multiple things going at once without massive overhead. If you view it as more of an evolution than a revolution, it makes more sense.

On July 28, 2008, Justin Lilly said:

I’ll preface this by saying I’m not one of the power users of dvcs. I just use the bare minimum to get by. That being said..

Before I tried it, I didn’t think DVCS offered anything beyond geek-cred. Turns out, there’s a bit more.

For me, the helpful feature was the ability to commit locally, then push to the central repository. Before dvcs, my process was to toy with a website/feature until it was done, then I would sync the 1 changeset to my centralized SVN server. Now, I make much smaller changesets, more often, then sync with the central server.

What this has really offered is the ability to complete 90% of a feature and still have it under version control. I don’t have to worry about “Am I committing broken code?”, I can just do some work, commit my changes and if I notice any bugs along the way, it doesn’t affect everyone else using my repo. I fix it and commit. Then when I sync, people get my work over the last hour or two. No broken features for them, easy commits and diffs for me.

This goes twofold if you’re using some sort of automated deploy script a la capistrano to sync your server with a central repo.

While svn could do this via your svn waypoint scenario, it simply hasn’t yet. I use git (preferred), hg and bzr for the same reason I use Python over PHP and OSX over Vista. It makes my job easier.. today. It helps me get real work done without meddling with the process.

Oh, and a few of the dvcs’s interface with svn… so it’s win-win, in my book at least.

On July 29, 2008, ryan said:

I think everyone has covered it. All I’ll do is point out that your post could be summarised as “SVN could be made to be more like git, so what’s the point?”

The point is that it isn’t, and hacking it into one would involve significant work. You could add DVCS features on top, but why? Why not just use something written to work that way from the start?

To a lone developer the distributed bit is moot, but all the features that being distributed gives you aren’t. Those features could be added to other things, sure. But the features exist now. In tools written from the ground up to work that way.

I mostly work alone with git, but the speed increase alone is more than enough to switch. I can’t stand using SVN now, and I barely stand Perforce. If I have to use SVN I just use git-svn.

One thing people seem to miss though is that no matter how distributed your work flow is, it’s centralised in the end. You still need to consolidate and ship a version. So at some point you do have a branch that is canonical. At least for a commercial style product.

DVCS just makes getting there less painful.

On July 29, 2008, Jonathan Barrett said:

Me again.

A few people have chimed in saying (paraphrasing) that svn is fine for centralised projects, and a DVCS is better for non-centralised projects.

I’d argue that the wins a DVCS brings to “centralised” projects are even greater. Take Rails. Under svn, 37s could quite easily remain “in control” of the project. Official Rails builds could only come from them, as they were in control of the master repository.

Let’s look at what git has changed here. Well, nothing. 37s still control what is perceived to be the “master” repository, and release as and when they want. The difference is that, if I want to make some very focussed modifications to Rails, I can just clone the master repo and do it myself, taking on all the management and support responsibilities. If suddenly my build becomes more popular, then people can start pulling releases from me, not from 37s.

Does this undermine 37s’s “canonical” status? No, of course not - it just means that they don’t have to worry about the pain of forking. If they decide that my mods are worth merging, it’s VERY easy for them to do that. If not, well, my cloned repo contains the whole history of their repo before my mods, so someone ELSE might decide to merge it. Everyone has their own repo, and they’re all built to work well with each other.

With subversion, I would never have got commit access to that trunk in the first place, because with subversion, commit access is a means to destroy a project. With a DVCS, commit access is a means to help a project - even, no, especially a centralised project - thrive.

On July 29, 2008, Ryan said:

Rather than writing my lengthy thoughts here I wrote a response blog post which you can find at ryanfunduk.com/talking-about-dvcs.

On July 30, 2008, David Eads said:

Our company has been relatively slow in deploying a common version control system — the senior developers working on big projects have used svn and the junior folks have, for all intents and purposes, used nothing.

It may sound dumb, but Mercurial has made it easier for our VCS-fearing staff to embrace version control. I am still pondering the why, but the practical effect is that it is easier to teach DVCS to people who don’t have much prior knowledge about version control systems.

My guess is that it is a combination of the social ramifications of making commits, the ease with which one can clone and isolate their work, and the relative lack of gotchas as compared to SVN or (shudder) CVS. The conceptual understanding just seems to come more naturally. Which is funny, since it took me quite some time, as a heavy SVN user, to really see the advantages and understand the approach.

I believe that while all developers should know and understand VCS very well, easing the human and social factors involved in encouraging VCS usage can rarely hurt and often help a development team.

On July 30, 2008, Tom Willis said:

The reason there’s so much HYPE I’m assuming is because those ruby/rails guys have discovered it and decided they like it. Don’t knock the tools just because the users are going apeshit. I stopped paying attention to anything in ruby land a long time ago. And because of that I was completely unaware that there was some kind of war over Version Control systems that threatened the existence of subversion or whatever until I read this. Just use what works for you.

I have my preferences, but yet my day job requires I throw everything in TFS. Which is understandable, since it’s their IP and they want to pretend they can control it. The point is, I don’t care, I am still productive. Had I been a rubyist I’d probably be annoying everyone in the office with my rants about how everything we’re doing is wrong and DVCS is the new hotness and in general being a pain in the ass for the rest of the team.

Again, just use what works. There is no war of the VCS.

On July 30, 2008, Ryan Funduk said:

@Tom Willis:

Please don’t lump all ‘rubyists’ together as being crazy fanboys :(

Besides the insulting nature (not to mention it being just plain wrong) of doing that there’s also the fact that lots of other projects and areas of development are using DVCS tools.

MySQL, Linux (of course), Mozilla, OpenSolaris, GHC, Drupal, VLC, OLPC, Fedora, tons of libraries, the list goes on… and I’ve purposely left out Python/Ruby and other ‘niche’ or ‘hype’ language stuff (except Haskell, GHC is an amazing compiler regardless of your position on the language).

Anyway, what I’m saying is that even if you think all Ruby developers are in their own world going ‘apeshit’ there’s still plenty of reasons to look at DVCS.

On July 30, 2008, Tom Willis said:

@Ryan Funduk

I agree with everything you’ve said about dvcs. I was poking fun at the sensationalism in the original post. This whole notion that AnyTechnology is AnotherTechnology killer is totally absurd. As for the “deafening hype” referred to in the post: of the 2 notable projects mentioned (linux and rails), I would bet that it’s coming from rails more so than linux. That’s just a gut feeling that in no way attempts to paint the ruby community with the same brush, but rather acknowledges that there are some loud voices that get attention and a somewhat blind following because rails is the latest hotness.

On July 30, 2008, Ben Collins-Sussman said:

Hey, I’ve posted quite a bit of criticism about DVCS, but I wanted to point out an important thing that it does: it makes the ‘branding’ of a product into nothing more than a social thing, as it should be.

That is, in centralized systems, the ‘core’ group of committers not only gets to decide what goes into the product or not (at least, the product with the particular brand name)… but they also have special tools. They’re allowed direct commit access. Everyone not part of the branding team has to deal with awkward patches mailed around.

In DVCS, everyone gets the same tools, both insiders and outsiders. The same core group of people decides what goes into the product, but their tools are no different than anyone else’s. Everyone can just easily participate in development, publish their changes to the world, or pull changes from anyone they want.

I think this is inherently a good thing. It makes the core team into a purely social phenomenon, not a group with elite development tools.

On July 31, 2008, David House said:

I think DVCS are just more flexible. They can emulate centralised systems pretty well, but the ability to do offline commits is more evolution than revolution in this area; it’s (very) useful but not exactly deserving of all the hype surrounding DVCS. However they also support pretty much every other system of VC you could think up, e.g. personal repositories in several different locations (home, work, USB stick) without a clear notion of a central repository, without having to change your VC metaphor or even your workflow too much.

On July 31, 2008, Danie Roux said:

The key thing for me closely mirrors what Ben said:

DVCS allows code to flow in more of a bazaar way. Centralized is a closer fit for the cathedral. I wrote this blog entry about it a few years ago:

http://blog.danieroux.com/2005/09/15/distributed-version-control-the-perfect-fit-for-the-bazaar/

On August 1, 2008, Max Ischenko said:

Good questions. I think Linus answers them nicely in his Git talk, see http://www.youtube.com/watch?v=4XpnKHJAok8.

On August 4, 2008, Daniel Nyström said:

I’m a Bazaar fan and one great thing with Bazaar is that it’s very easy to get going with, even if you’d never used version control before. More people will be able to participate. ;)

On August 5, 2008, 8itchin said:

Sorry dude, but you obviously haven’t understood DVCS yet. Sure, a couple of DVCS “features” considered in isolation don’t look that impressive. But in context, those features do have fantastic workflow implications. First, anyone who wants to develop on a project gets the full power of VC (see Ben Collins-Sussman, above) and the ability to effortlessly merge in changes from any other developer. Second, a new branch is the safest way to develop a new feature, period, and once you understand how easy it is to throw branches around, develop, and merge, as a lone developer, or as part of a team, you will never go back. DVCS is a rare thing: something that is both simpler and more powerful than what came before.

Finally, to correct a common misconception: DVCS IS ALSO BETTER (easier and more secure) FOR CENTRALIZED PROJECTS! Because you can, if you like, have an official master repository with only one write-privileged administrator, without restricting development in the slightest: the admin chooses to pull in changes from other trusted developers once he is satisfied that they are good.

On August 8, 2008, Ryan said:

I have to say the biggest issue I encounter when talking about git/dvcs is the mind set evidenced by these types of comments:

“I won’t use a shell based interface”

“Sounds great, hope they make a gui”

“I can’t live without a visual diff tool”

“I don’t want to go backwards. We abandoned shells decades ago.”

Basically people who absolutely will NOT even try git because it is a CLI app. That single point overshadows ALL other issues for them.

On August 11, 2008, Ryan Funduk said:

@Ryan:

That’s interesting, you’ve actually had people say that their problem with dvcs is the shell? Personally, I wouldn’t want those sorts of people working on my code anyway.

I use the shell for a lot more than just dvcs… In fact, I think the question isn’t what I do with the shell, it’s what I don’t do with the shell.

On August 18, 2008, Ryan said:

@Ryan Funduk

I sort of agree with those comments but in the professional world that isn’t always a choice… I’ll give some perspective:

I am a game developer. Personally, I use a Mac at home (and the shell a lot) and I use Windows at work. Due to the type of game development I’m currently involved in I also use the shell a lot at work.

However, I’m not sure this is the norm. It certainly wasn’t the norm in my game development career up until I moved to my current employer/project.

Game development in my experience is usually synonymous with Windows and Visual Studio, even when developing for the major consoles.

That is the environment from which these types of comments emanated. My response is disbelief. But I really don’t know how to counter them. No amount of CLI whiz bangery seems to have any impact.

Comments for this entry are closed. If you'd like to share your thoughts on this entry with me, please contact me directly.
