CVS and the Cathedral

Dick Grune's Gift to Amsterdam

Last episode, we touched on CVS briefly, the tool that wrapped Walter Tichy's RCS to track whole projects instead of individual files. But we skipped the person. And the person is where the story gets interesting.

Dick Grune was a computer science lecturer at the Vrije Universiteit in Amsterdam. Not a startup founder. Not a Silicon Valley type. A professor who taught compiler design and wrote textbooks about parsing techniques that are still used in university courses today. His other claim to fame is the Amsterdam Compiler Kit, a portable toolkit for building compilers across different processor architectures. If you studied computer science in Europe in the nineteen eighties, there is a decent chance you used something Grune built.

In the summer of nineteen eighty-four, Grune had a problem that had nothing to do with version control. He was supervising two students, Erik Baalbergen and Maarten Waage, on a project to build a new C compiler for the Amsterdam Compiler Kit. The project would run from July nineteen eighty-four to August nineteen eighty-five. Simple enough. Three people, one codebase. But the schedules were impossible.

Grune later described it this way: "The three of us had vastly different schedules. One student was a steady nine to five worker. The other was irregular. And I could work on the project only in the evenings."

Think about that for a moment. Nineteen eighty-four. No internet to speak of. No shared drives in the cloud. No Slack, no email worth mentioning. Three people working on the same code at different times of day, and the only existing version control, Tichy's RCS from episode one, tracked individual files one at a time. If you wanted to know what the whole project looked like yesterday, you had to manually check out the right version of every single file and hope you got them all right.

Grune's solution was a handful of shell scripts that wrapped RCS and added a layer on top. He originally called the tool cmt, because its core purpose was letting people commit versions independently.

The crucial insight was the copy-modify-merge model. Instead of locking a file so nobody else could touch it while you worked, everyone got their own copy. You edited your copy. Your colleague edited their copy. When you were both done, the system merged the changes. If the changes were in different parts of the file, the merge happened automatically. If they overlapped, the system flagged a conflict and a human sorted it out.
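To make that concrete, here is a minimal sketch of a line-based three-way merge in Python. It is not Grune's code or the algorithm CVS actually shipped, which leaned on RCS and diff3. The function names and the "mine" and "theirs" marker labels are invented for illustration, and the alignment is done with Python's standard difflib module. The idea is only this: compare both edited copies against the common ancestor, take whichever side changed a region, and flag the region when both sides changed it differently.

```python
from difflib import SequenceMatcher


def _match_map(base, other):
    """Map each base line index to the index of its unchanged copy in `other`."""
    mapping = {}
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset] = block.b + offset
    return mapping


def three_way_merge(base, mine, theirs):
    """Merge two edited copies of `base` (lists of lines).

    Returns (merged_lines, conflict_count). Illustrative sketch only.
    """
    to_mine = _match_map(base, mine)
    to_theirs = _match_map(base, theirs)
    # Base lines that survive unchanged in BOTH copies act as sync points.
    sync_points = [i for i in range(len(base)) if i in to_mine and i in to_theirs]

    merged, conflicts = [], 0
    b = m = t = 0  # start of the current region in base, mine, theirs
    for point in sync_points + [None]:  # None marks the tail after the last sync point
        if point is None:
            b_end, m_end, t_end = len(base), len(mine), len(theirs)
        else:
            b_end, m_end, t_end = point, to_mine[point], to_theirs[point]
        base_part, my_part, their_part = base[b:b_end], mine[m:m_end], theirs[t:t_end]

        if my_part == their_part or their_part == base_part:
            merged += my_part          # same edit on both sides, or only I changed it
        elif my_part == base_part:
            merged += their_part       # only the other person changed it
        else:
            conflicts += 1             # both changed it differently: a human decides
            merged += ["<<<<<<< mine", *my_part, "=======", *their_part, ">>>>>>> theirs"]

        if point is not None:
            merged.append(base[point])
            b, m, t = b_end + 1, m_end + 1, t_end + 1
    return merged, conflicts


if __name__ == "__main__":
    ancestor = ["a", "b", "c", "d"]
    print(three_way_merge(ancestor, ["A", "b", "c", "d"], ["a", "b", "c", "D"]))
    print(three_way_merge(ancestor, ["a", "X", "c", "d"], ["a", "Y", "c", "d"]))
```

The first call merges cleanly because the two edits touch different lines. The second reports one conflict because both copies rewrote the same line.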

This sounds obvious now. In nineteen eighty-four, it was radical. The prevailing wisdom was that you had to lock files to prevent chaos. Grune's insight was that the chaos of merging was cheaper than the gridlock of locking.

After the compiler project finished, Grune kept polishing the scripts. He renamed the tool CVS, for Concurrent Versions System. And one particular evening sticks out. He was standing at a bus stop in what he describes as miserable weather, waiting for a bus that was not coming, and he started working through the logic in his head. All the possible states a file could be in. All the transitions between those states. He mapped the whole thing out in a table, dated December fourteenth, nineteen eighty-five. The design for what would become the most widely used version control system in the world was sketched in a professor's head at a cold Dutch bus stop.

On June twenty-third, nineteen eighty-six, Grune posted the shell scripts to comp dot sources dot unix, a Usenet newsgroup for sharing code. And then he went back to teaching compilers. He did not start a company. He did not seek funding. He gave his work away and returned to what interested him. This pattern, a professor solving a personal problem and accidentally inventing infrastructure, will come up again and again in this series.

The C Rewrite and the Rise

Grune's scripts worked. But they were shell scripts, which meant they were slow, brittle on edge cases, and difficult to extend. Three years later, a developer named Brian Berliner at a company called Prisma took Grune's design and rewrote the whole thing in C.

Berliner was not working on a university project. Prisma was a third-party developer working on the SunOS kernel for Sun Microsystems. They needed to manage large-scale Unix distributions, merging hundreds of files from upstream SunOS releases with their own modifications. Grune's shell scripts pointed the way, but Prisma needed something industrial-strength.

The rewrite cemented the model that defined CVS for the next fifteen years. A central repository. Developers checking out their own working copies. Changes committed back to the shared repository. Access over the network, the client-server mode most people remember, was added by other contributors a few years later. By late nineteen eighty-nine, Prisma had deployed the rewritten version across fourteen developers managing over seventeen thousand files. It worked.

On November nineteenth, nineteen ninety, Berliner submitted CVS version one point zero to the Free Software Foundation for development and distribution under the GPL. This is the moment CVS went from a clever internal tool to the version control system for the open source movement.

The timing was perfect. The early nineteen nineties were the dawn of collaborative open source development. Projects needed a way to let strangers around the world work on the same codebase without descending into chaos. CVS gave them one server to rally around. You set up a repository on a public machine, posted the connection details, and anyone could check out the code, make changes, and commit them back. SourceForge, the first great hosting platform for open source projects, ran on CVS. The GNU Project used CVS. Apache used CVS. If you wrote open source code between nineteen ninety-five and two thousand five, CVS was almost certainly your tool.

Lock, Edit, Unlock, Wait

So what did it actually feel like to use CVS day to day? Let me walk you through a morning in two thousand one.

You arrive at work. You open a terminal. You run cvs update to pull the latest changes from the server. This requires a network connection to the central repository. If the server is down, or if you are on a plane, or if the network is slow because someone is downloading something large, you wait. There is no local history. Your machine has the files, and that is it. All the version information lives on the server.

You make your changes. You test them. You are ready to commit. You run cvs commit. But someone else has changed the same file since your last update. CVS refuses the commit and tells you to update first. You run cvs update, and CVS tries to merge the other person's changes with yours. If the changes are in different parts of the file, the merge works automatically. If they overlap, you get a conflict marker in your file, a mess of angle brackets and equals signs that you have to sort out by hand before you can try committing again.
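The refusal in that story is worth sketching, because it is what made copy-modify-merge safe. What follows is a toy model in Python, not CVS's real implementation. The class and method names are hypothetical, and the "server" is just a pair of dictionaries. The rule it illustrates: a commit is accepted only if it was made against the newest revision of the file.

```python
class UpToDateCheckFailed(Exception):
    """Raised when a commit is based on a revision that is no longer the newest."""


class ToyRepository:
    """Hypothetical stand-in for the central server: one revision counter per file."""

    def __init__(self):
        self.revision = {}  # file name -> newest revision number
        self.text = {}      # file name -> newest contents

    def add(self, name, text):
        """Put a new file under version control at revision 1."""
        self.revision[name] = 1
        self.text[name] = text

    def checkout(self, name):
        """Return (revision, contents); the client remembers the revision."""
        return self.revision[name], self.text[name]

    def commit(self, name, based_on, new_text):
        """Accept the change only if it was made against the newest revision."""
        newest = self.revision[name]
        if based_on != newest:
            raise UpToDateCheckFailed(
                f"{name}: your copy is revision {based_on}, the server has {newest}"
            )
        self.revision[name] = newest + 1
        self.text[name] = new_text
        return self.revision[name]


if __name__ == "__main__":
    server = ToyRepository()
    server.add("parser.c", "original")
    your_revision, _ = server.checkout("parser.c")
    colleague_revision, _ = server.checkout("parser.c")
    server.commit("parser.c", colleague_revision, "colleague's edit")
    try:
        server.commit("parser.c", your_revision, "your edit")
    except UpToDateCheckFailed as error:
        print("refused:", error)
```

In this sketch the second commit is refused. The person who lost the race has to fetch the newer revision, merge, and try again, which is the morning described above.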

This was manageable for small teams. For large projects, it was constant friction. The Linux kernel had hundreds of developers around the world. Mozilla had contributors across every time zone. Every commit was a negotiation with the server, and every merge conflict was a manual intervention.

But the real pain of CVS went deeper than merge conflicts. The system could not handle file renames. If you renamed a file, CVS saw it as deleting the old file and creating a new one. The history was severed. All those months of changes, the story of how that file evolved, broken because you wanted a better name. Directories were even worse. CVS had no concept of versioning directories at all. You could not track an empty directory, and moving files between directories was a nightmare of manual repository surgery.

And then there was the commit problem. CVS did not support atomic commits across multiple files. If you were committing changes to ten files and the network dropped halfway through, five files would have the new version and five would have the old. The repository was in an inconsistent state, and there was no automatic way to roll back. Every partial commit was a small disaster waiting for someone to discover it.
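Here is a small Python sketch of the difference, with a plain dictionary standing in for the repository and a simulated network drop. Everything in it is hypothetical: the function names, the exception, and the drop_after parameter exist only for this illustration. The first function writes each file the moment it arrives, which is the failure mode just described. The second stages the whole change and applies it only if every file made it.

```python
class NetworkDropped(Exception):
    """Simulated connection failure partway through a commit."""


def commit_file_by_file(repository, changes, drop_after=None):
    """Non-atomic: each file is stored as soon as it arrives."""
    for sent, (name, text) in enumerate(changes.items()):
        if sent == drop_after:
            raise NetworkDropped(f"connection lost after {sent} files")
        repository[name] = text  # already visible to everyone else


def commit_all_or_nothing(repository, changes, drop_after=None):
    """Atomic: stage the whole change, then apply it in one step."""
    staged = dict(repository)  # work on a private copy
    for sent, (name, text) in enumerate(changes.items()):
        if sent == drop_after:
            raise NetworkDropped(f"connection lost after {sent} files")
        staged[name] = text
    # Only reached if every file arrived; the repository changes all at once.
    repository.clear()
    repository.update(staged)


if __name__ == "__main__":
    names = [f"file{i}.c" for i in range(10)]
    changes = {name: "new" for name in names}

    for commit in (commit_file_by_file, commit_all_or_nothing):
        repository = {name: "old" for name in names}
        try:
            commit(repository, changes, drop_after=5)
        except NetworkDropped:
            pass
        updated = sum(1 for text in repository.values() if text == "new")
        print(commit.__name__, "left", updated, "of", len(names), "files updated")
```

With the drop simulated after five files, the file-by-file version leaves the repository half old and half new, while the all-or-nothing version leaves it untouched.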

Branching, which episode one promised would let teams experiment freely, was so painful in CVS that most teams avoided it entirely. Creating a branch meant the server had to process every file in the repository. Merging a branch back was worse. CVS had no built-in merge tracking, so it could not tell you which changes from a branch had already been merged to the main line. You had to track that yourself, manually, with notes and timestamps. Get it wrong, and you would re-apply changes that were already there, creating duplicate history and mysterious conflicts.
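The bookkeeping CVS left to humans is simple to sketch. In this Python illustration, a branch is just a list of change identifiers and the mainline is a list of applied changes. Both are assumptions made for the example, not how any real system stores history. The first function merges blindly, the way a tool without merge tracking has to. The second remembers what it has already merged.

```python
def merge_without_tracking(mainline, branch_changes):
    """Blindly apply every change on the branch, every time."""
    mainline.extend(branch_changes)


def merge_with_tracking(mainline, branch_changes, already_merged):
    """Apply only the changes that have not been merged before."""
    fresh = [change for change in branch_changes if change not in already_merged]
    mainline.extend(fresh)
    already_merged.update(fresh)  # the record a human otherwise keeps in notes
    return fresh


if __name__ == "__main__":
    branch = ["fix-parser", "add-tests"]

    untracked_mainline = []
    merge_without_tracking(untracked_mainline, branch)
    merge_without_tracking(untracked_mainline, branch + ["tune-cache"])
    print("without tracking:", untracked_mainline)

    tracked_mainline, merged_so_far = [], set()
    merge_with_tracking(tracked_mainline, branch, merged_so_far)
    merge_with_tracking(tracked_mainline, branch + ["tune-cache"], merged_so_far)
    print("with tracking:   ", tracked_mainline)
```

Merging the same branch twice without tracking applies the first two changes again. With the record in place, the second merge brings over only the new change.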

So teams did what humans always do when a tool makes something painful. They stopped doing it. They worked on one shared mainline. No branches. Everyone committing directly to the same code. Stepping on each other's toes, hoping for the best, praying nobody committed something broken on a Friday afternoon.

The Cathedral and the Bazaar

While CVS was conquering the development world through sheer ubiquity, a programmer and writer named Eric Raymond was thinking about something bigger. Not about the tools themselves, but about the models of development those tools enabled.

In May nineteen ninety-seven, at a conference called the Linux Kongress in Würzburg, Germany, Raymond presented an essay that would reshape how people thought about software. He called it The Cathedral and the Bazaar.

The central metaphor was simple and devastating. Some software, Raymond argued, is built like a cathedral. A small group of architects designs everything in advance. They work behind closed doors. They release when they decide the work is ready. The code is polished, deliberate, centrally planned. This was how most commercial software was built, and it was how many open source projects operated too. One lead developer, or a small inner circle, controlling the vision and the code.

The alternative was the bazaar. No central plan. Contributors come and go. Code is released early and often, bugs and all. The assumption is that enough eyeballs will find problems faster than any small team of experts could. Raymond credited this insight to Linus Torvalds and called it Linus's Law: given enough eyeballs, all bugs are shallow.

Raymond built his argument around his own experience maintaining a mail utility called fetchmail. He had adopted a bazaar-style development process, releasing constantly, incorporating patches from anyone who sent them, treating users as co-developers. And it worked. The software improved faster than he could have improved it alone. The essay was expanded into a book, published by O'Reilly in nineteen ninety-nine, and its impact went beyond philosophy. Netscape executives reportedly cited the essay as an influence on their decision to release the browser's source code, the move that launched the Mozilla project and eventually led to Firefox.

But here is the part of Raymond's essay that most people miss. The cathedral and the bazaar are not just models of development. They are models of tooling. A centralized version control system like CVS is a cathedral tool. One server. One truth. One point of control. The administrator decides who has access. Changes flow through a single gate. The architecture enforces hierarchy. You cannot participate without the server's blessing.

A distributed version control system, the kind that did not exist yet in nineteen ninety-seven but was coming, is a bazaar tool. Every copy is complete. Every developer is autonomous. There is no single gate, no bottleneck, no hierarchy baked into the architecture. The social structure of the project can be anything because the tool does not impose one.

Raymond was describing a cultural shift. But the tools had not caught up yet. In nineteen ninety-seven, the bazaar was being built with a cathedral tool. Linux was the most bazaar-like project in the world, and its developers were emailing patches to each other because CVS could not handle their workflow. That tension, between how people wanted to work and what their tools allowed, would simmer for eight more years before it exploded.

A Better Cathedral

By the late nineteen nineties, the frustrations with CVS had crystallized into a wish list that every developer seemed to share. Atomic commits. Proper directory versioning. File renames that preserved history. Better branching and merging. Everyone knew what was broken. The question was who would fix it.

In nineteen ninety-five, two developers named Karl Fogel and Jim Blandy started a company called Cyclic Software. Their business was unusual for the time: they offered commercial support contracts for CVS. If your company depended on CVS and something went wrong, you called Cyclic. This meant Fogel and Blandy spent their days elbow-deep in CVS's worst problems, filing and fixing bugs, hearing every horror story, understanding every limitation at a level that most users never had to.

Blandy had been thinking for years about what a replacement would look like. He already had a name picked out: Subversion. Not a revolution, a subversion. Take everything CVS did right, the copy-modify-merge model, the central repository, the familiar workflow, and fix what it did wrong. A better CVS.

Then, in February two thousand, a company called CollabNet came calling. CollabNet made collaborative software tools and was looking for someone to build a proper version control system for their platform. They were using CVS internally and it was, according to the Subversion project history, obviously inadequate from the beginning. They contacted Fogel, who had just written a book called Open Source Development with CVS, published by Coriolis in nineteen ninety-nine.

The timing was serendipitous. At the exact moment CollabNet called, Fogel was already in conversations with Blandy about designing that replacement.

As the Subversion project history puts it: "It turned out that many people had encountered the same frustrating experiences with CVS and welcomed the chance to finally do something about it."

Fogel said yes immediately. Blandy's employer, Red Hat Software, donated his time to the project for an indefinite period. CollabNet hired Fogel and another developer named Ben Collins-Sussman. Detailed design work began in May two thousand. The open source community responded with enthusiasm. Everyone, it seemed, had been waiting for someone to finally fix CVS.

Fogel and Blandy knew exactly what was wrong. They had been living with CVS for five years, running a support company for it. The goal was not to reinvent version control. It was to build the tool CVS should have been all along.

Subversion delivered on the wish list. No more partial commits: if the network died mid-upload, the whole change rolled back, not just half. Renaming a file? The history stayed intact. Empty directories? Now they could be tracked. Branching became a cheap copy instead of a pass over every file, although merging stayed painful until merge tracking arrived in version one point five. For teams that had spent years cursing CVS, Subversion felt like a miracle. But it was still a cathedral. And cathedrals, no matter how grand, have one door.

The Ceiling

Subversion launched in two thousand four and quickly began displacing CVS. Projects migrated. SourceForge added Subversion support. Apache moved. The Python project moved. By two thousand eight, the Apache Software Foundation reported that over seventy percent of its projects had migrated from CVS to Subversion. It was clearly, measurably better than what it replaced.

But Subversion kept the centralized architecture. One server. One repository. One point of failure. If the server was down, you could not commit. You could not browse history. You could not create a branch. You had the files on your laptop, and that was it. All the intelligence lived somewhere else.

For most projects, this was fine. A team of twenty developers in one office with a reliable server rarely noticed the limitations. But for the projects that were pushing the boundaries of open source collaboration, the ceiling was getting closer.

The Linux kernel had thousands of contributors scattered across the globe. Mozilla had hundreds. These projects needed developers to work offline, to experiment freely on branches without asking permission from a central server, to merge work from dozens of parallel streams without the server becoming a bottleneck. Subversion fixed CVS's bugs, but it did not fix CVS's fundamental assumption: that version control needs a center.

Remember Raymond's essay? The bazaar model was winning. Open source was exploding. Projects were growing faster, spreading wider, accepting contributions from more people in more places than anyone had imagined. And the tools were still cathedrals. Still centralized. Still built around the assumption that one server could be the single source of truth for a project with thousands of contributors on six continents.

The Day the Server Died

Imagine it is two thousand three. You are a software developer at a mid-sized company, maybe forty engineers, working on a product that ships to real customers. All of your source code lives on one machine. A single server in a closet somewhere, probably running FreeBSD or Red Hat, humming away behind a locked door that someone lost the key to years ago. That server holds every line of code your company has ever written. Every version, every change, every branch. Fifteen years of institutional knowledge, sitting on one hard drive.

And one morning you walk in, coffee in hand, and the server is dead. Not slow. Not glitchy. Dead. The hard drive has failed. The backup, which nobody has tested since the previous system administrator left, turns out to be three months old. Three months of work by forty people, gone. Not just the current code; you can piece that together from people's laptops. The history is gone. Who changed what, when, and why. The branches people were working on. The release tags. The audit trail. All of it, vanished, because the entire world lived on one machine, and that machine stopped spinning.

This was not hypothetical. This was Tuesday for teams that built their lives around CVS, and later, Subversion. The cathedral model had a single, catastrophic flaw. It put all the eggs in one basket, and the basket was a server in a closet.

As we mentioned last episode, in two thousand two Linus Torvalds looked at this situation and made a decision that horrified half the open source community. He adopted BitKeeper, a proprietary, closed-source version control system, for the Linux kernel. BitKeeper was different. It was distributed. Every developer had a complete copy of the entire history. You could commit offline. You could branch for free. You could merge with the confidence of a system that actually tracked what had been merged and what had not.

BitKeeper showed the open source world what was possible beyond the cathedral. But it came with strings attached. Proprietary strings. And in April two thousand five, those strings were pulled.

That is where the next episode picks up. The crisis that killed the cathedral model for good, and the two weeks that changed software forever.

The word that defined an era of version control was checkout. cvs checkout, then the project name. svn checkout, then a URL. That word tells you everything about the model. You are checking something out, like a library book. The real copy lives on the shelf. You get a temporary version. When you are done, you check it back in. If someone else wants the same book, they wait, or they get a copy that might be out of date. The entire philosophy of centralized version control is captured in that one word. You do not own the code. You borrow it. And the server decides when you can have it back.