Git Good
The Copy-Paste Catastrophe
S1 E1 · 14m · Feb 21, 2026
In the 1980s, programmers filled folders with files named "project final version two John's edits"—until one wrong character in the wrong copy nearly crashed an airplane, sparking a two-decade hunt for a better way to track code.


The Folder of Shame

You know that moment when you are working on something important and you think "I should probably save a version of this before I break everything"? So you do what every reasonable person does. You save a copy. Project final dot doc.

But then you realize you need to make another change. So you save another copy. Project final version two. And then your colleague sends you edits. Project final version two, John's edits. And you merge those in but want to keep the old version just in case. Project final version three with John's stuff.

And then a week later you are looking at a folder with project final, project final version two, project final version two John's edits, project final version three with John's stuff fixed, project version four, project actually final, project actually final revised, project use this one, and project no seriously use this one.

And you have absolutely no idea which file has which changes, when they were made, why you made them, or which one you are supposed to actually use.

This is the copy-paste catastrophe. And if you have never experienced it, congratulations, you are either lying or you started your career after two thousand ten. For the rest of us, this was life. Every creative and technical field dealt with some version of this chaos, from novelists to architects to accountants. But nowhere did it hurt worse than in software, where a single wrong character in a single wrong file could crash an airplane, drain a bank account, or shut down a power grid. When the stakes are that high, "I think version four is the right one" is not a comforting answer.

This is the first episode of Git Good, a twenty-two episode journey through the history of version control. We are going to follow this problem from filing cabinets and magnetic tape all the way to a tool called Git that now tracks nearly every piece of software on Earth. Along the way we will meet the people who built these tools, the fights they had, the disasters that forced their hand, and the philosophical arguments that still divide the programming world today. It starts here, with a folder full of files nobody can tell apart.

The Code Librarian

Let us go back to the nineteen sixties and seventies. Programmers were building increasingly complex software, and the copy-paste problem was not just annoying. It was genuinely dangerous. Imagine you are working on software that controls a nuclear power plant. You and five other engineers are all editing the same codebase. How do you know who changed what? How do you prevent two people from editing the same file at the same time and overwriting each other's work? How do you go back to yesterday's version when today's changes broke everything? How do you keep track of which version is actually running in production?

The earliest solutions were not software at all. They were people. In nineteen sixty-nine, a company called Applied Data Research released a product called The Librarian for IBM mainframes. Its job was to manage punched card decks, the physical stacks of paper cards that represented a program. Each card had one line of code punched into it, and a complete program might be thousands of cards. Drop the deck and you had a very bad afternoon. The Librarian kept track of which deck was which, who had checked it out, and what version it represented.

The name was not a metaphor. In many organizations, a literal human being served as the code librarian. One engineer would walk up to the librarian and say "I need to work on the login module." The librarian would check the log, see that nobody else had checked it out, hand over a physical printout or magnetic tape, and write down the time and name. When you were done, you physically returned it. If someone else wanted to work on the same code, they had to wait. No parallel work allowed.
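The librarian's logbook is simple enough to sketch in a few lines of modern code. This is a toy model of the workflow described above, not any real product's interface; the class and method names are invented for illustration.

```python
# A minimal sketch of the human-librarian workflow: one exclusive
# checkout per module, every handoff logged with a name and a time.
from datetime import datetime


class Librarian:
    def __init__(self):
        self.checked_out = {}   # module -> engineer who has it
        self.log = []           # (time, action, module, engineer)

    def check_out(self, module: str, engineer: str) -> bool:
        """Hand the module over, unless someone else already has it."""
        if module in self.checked_out:
            return False        # checked out: wait your turn
        self.checked_out[module] = engineer
        self.log.append((datetime.now(), "out", module, engineer))
        return True

    def check_in(self, module: str, engineer: str) -> None:
        """Take the module back and record the return."""
        if self.checked_out.get(module) != engineer:
            raise ValueError(f"{engineer} never checked out {module}")
        del self.checked_out[module]
        self.log.append((datetime.now(), "in", module, engineer))
```

The second engineer asking for the login module simply gets told no until the first one checks it back in, which is exactly the "no parallel work allowed" rule in code.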

There is a detail about Applied Data Research that captures what this era felt like. In nineteen seventy, a fire destroyed their headquarters. The company's highest priority during the evacuation was saving the Librarian tapes. Not the furniture, not the filing cabinets. The magnetic tapes that held the code. Because if those tapes burned, the software they tracked was gone forever. There was no backup in some distant data center, no cloud, no redundant copy. The tapes were the only record.

This worked about as well as it sounds. Projects moved slowly. Experimentation was risky. Making a backup before trying something new meant another tape, another label, another entry in the logbook. The friction was enormous, and it was growing. Software was getting bigger, teams were getting larger, and the gap between what programmers wanted to build and what their tools allowed them to manage was widening every year.

Walter Tichy and the Difference Engine

In the nineteen eighties, software finally started eating its own dog food. Programmers realized they could use software to solve the version control problem.

The first tools were primitive. Bell Labs had released something called the Source Code Control System in nineteen seventy-two, but it was clunky and limited. The real breakthrough came a decade later. In nineteen eighty-two, Walter Tichy at Purdue University created the Revision Control System. His approach was brilliant in its simplicity. Instead of saving entire copies of files, it saved the differences between versions. Your first version of a file got stored completely. Your second version? Only the changes got saved. Third version? Just the changes from the second.

This was revolutionary. Instead of ten copies of a ten thousand line file eating up your hard drive, you had one copy plus nine small diff files. Storage was expensive in nineteen eighty-two. A ten megabyte hard drive cost over three thousand dollars. Saving space was not elegance, it was survival.

Why store ten copies of the same file when only a few lines changed? Store the original, then store the differences. That is all you need.
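The delta idea fits in a few dozen lines. Below is a toy delta store in Python: the first version is kept whole, and every later version is recorded only as the line ranges that changed. One caveat: real RCS actually stores reverse deltas, keeping the newest version whole so recent checkouts are fast; this sketch uses forward deltas for clarity, and all names here are invented.

```python
import difflib


class DeltaStore:
    """Toy RCS-style storage: first version whole, later versions as diffs."""

    def __init__(self, first_version: str):
        self.base = first_version.splitlines(keepends=True)
        self.deltas = []  # one list of change records per later version

    def commit(self, new_text: str) -> None:
        """Record only what changed relative to the latest version."""
        new = new_text.splitlines(keepends=True)
        old = self._lines(len(self.deltas))
        matcher = difflib.SequenceMatcher(a=old, b=new)
        # Keep just the non-equal regions: (tag, start, end, replacement).
        ops = [(tag, i1, i2, new[j1:j2])
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal"]
        self.deltas.append(ops)

    def _lines(self, version: int):
        # Rebuild a version by replaying deltas on top of the base.
        lines = list(self.base)
        for ops in self.deltas[:version]:
            for tag, i1, i2, repl in reversed(ops):  # right-to-left keeps
                lines[i1:i2] = repl                  # earlier indices valid
        return lines

    def checkout(self, version: int) -> str:
        """Version zero is the original; each commit adds one."""
        return "".join(self._lines(version))
```

Ten revisions of a ten thousand line file cost one full copy plus nine small change lists, which is the whole trick.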

But Tichy's system had a blind spot. It tracked files, not projects. If you wanted to know "what did my entire project look like on July fifteenth," you had to manually check out the right version of every single file. For a project with a hundred files, this was tedious. For a project with a thousand files, it was practically impossible. Tichy had solved the storage problem, but the coordination problem remained. You could see the trees, but you had lost sight of the forest.

One Server to Rule Them All

Then came CVS, the Concurrent Versions System. The original scripts were written in the mid nineteen eighties by a professor named Dick Grune in Amsterdam, but the version that conquered the world was a C rewrite that reached version one point zero in nineteen ninety. CVS wrapped Tichy's system with a layer that tracked entire projects. Now you could say "give me the state of everything on July fifteenth" and get it.

CVS also introduced the idea of a central repository. One canonical place where the real code lived. One server, one source of truth. But that safety came with a cost. Every morning, you would log in, cross your fingers, and run your update. If the server was down, you were stuck. If the network was slow, you would watch the progress crawl. And if someone else had edited the same file? You had just inherited a merge conflict that could take hours to untangle. The server was your lifeline, but it was also your bottleneck.
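What "inherited a merge conflict" means can be sketched with a toy three-way merge: compare your copy and your colleague's copy against the common base, and give up with conflict markers wherever both of you changed the same line. This is a deliberately naive sketch that assumes the files still have the same number of lines; real CVS leaned on the diff3 algorithm, which also handles insertions and deletions, but the failure mode it leaves you with looks just like this.

```python
def merge_lines(base, yours, theirs):
    """Naive line-by-line three-way merge (assumes equal line counts)."""
    merged = []
    for b, y, t in zip(base, yours, theirs):
        if y == t:            # both sides agree, nothing to decide
            merged.append(y)
        elif b == y:          # only they changed it: take theirs
            merged.append(t)
        elif b == t:          # only you changed it: take yours
            merged.append(y)
        else:                 # both changed the same line: conflict
            merged += ["<<<<<<< yours\n", y,
                       "=======\n", t,
                       ">>>>>>> theirs\n"]
    return merged
```

Every one of those conflict blocks is a decision a human had to make by hand, which is why a bad morning update could eat your whole afternoon.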

Centralization was not just a technical model. It was a philosophy of control. The server was not a convenience, it was the cathedral. It enforced a single, official history. For managers, it meant oversight. For teams, it meant order. You always knew where the truth was: in the closet, on the server. But the cathedral had one door.

CVS dominated through the nineteen nineties and into the early two thousands. By the year two thousand, it was the default for the vast majority of open source projects, from Apache to GNOME. Universities taught it. Companies mandated it. For an entire generation of developers, version control meant one thing: a central server in a closet somewhere, humming away, holding the truth.

But branching, while technically possible, was painful enough that most teams avoided it entirely. Creating a branch in CVS meant the server had to stamp a branch tag into every single file's history, one file at a time. It was slow on big projects, and merging branches back together was notorious for being agonizing and error-prone.

So teams worked directly on the main codebase, trying not to step on each other's toes. Or they would create a branch for a big feature, and some poor developer would spend days manually merging changes back, resolving conflicts by hand, praying they did not break anything. The tools that were supposed to enable creativity had become a constraint on it. Experimentation was expensive. "Want to try a radical new approach? Better be sure it will work, because branching and merging is going to hurt." That was the unspoken rule of the centralized era.

Dick Grune's story, and how CVS went from a few shell scripts to the backbone of professional software development worldwide, is where we are headed next episode. But first, there is one more piece of this puzzle. Because the centralized model was not just inconvenient. For one particular project, it was becoming an existential crisis.

The Ceiling Nobody Saw Coming

Meanwhile, something extraordinary was happening in the open source community. In the years since nineteen ninety-one, when Linus Torvalds first posted his hobby operating system to a newsgroup, the Linux kernel had grown into the most actively developed open source project in the world. Hundreds, then thousands of developers worldwide, contributing patches, experimenting with ideas, managing releases across different time zones and continents. And their version control system for all of this was Linus Torvalds's email inbox.

Yes, really. Until two thousand two, the most important open source project in the world was managed with patches sent via email and one person's willingness to read them all day, every day. Developers would write their changes, format them as text diffs, and email them to the Linux Kernel Mailing List. Linus would apply them to his copy of the source tree. If your patch got lost in the flood, that was your problem.
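A "text diff" here means a unified diff, the format you get from running diff with the minus u flag. Python's standard difflib can produce one, which gives a feel for what actually landed in that inbox; the file names and contents below are made up.

```python
import difflib

# Two versions of an imaginary file: the patch author added one line.
old = ["int main() {\n",
       "    return 0;\n",
       "}\n"]
new = ["int main() {\n",
       '    printf("hello\\n");\n',
       "    return 0;\n",
       "}\n"]

# unified_diff yields the familiar ---, +++, and @@ hunk lines.
patch = "".join(difflib.unified_diff(old, new,
                                     fromfile="a/hello.c",
                                     tofile="b/hello.c"))
print(patch)
```

The result starts with the two file headers, then a hunk header, then context lines prefixed with a space and the added line prefixed with a plus sign. Paste that into an email body and you have a nineteen nineties kernel contribution.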

CVS existed. Subversion existed. Linus rejected both. They were centralized, they were slow at the kernel's scale, and their branching was too painful for a project with thousands of parallel development streams. The centralized model had a ceiling, and the kernel had smashed into it.

In two thousand two, a man named Larry McVoy offered Linus a way out. McVoy was not an outsider. He was a kernel developer himself, had worked on Linux since the early nineties, and he ran a company called BitMover that made a version control system called BitKeeper. BitKeeper was fundamentally different from CVS. It was distributed. Every developer got a complete copy of the entire repository. You could commit offline, branch freely, experiment wildly, and merge with confidence. It was fast. It was powerful.

But it was not free, and it was not open source. McVoy offered free licenses to open source projects, and Linus, ever the pragmatist, accepted. The Linux kernel, the crown jewel of the free software movement, would be developed using a proprietary tool. Many kernel developers were furious. Alan Cox, one of the most respected contributors, refused to use BitKeeper on principle. Others felt the same. The philosophical tension between using the best available tool and supporting only open source software had been simmering in the community for years. BitKeeper brought it to a boil. It was the first major skirmish in a war that would play out again and again: who gets to own the tools everyone depends on?

The majority of this problem is an open source community problem. They simply do not want to play with non-open source. At least some of them do not, and they ruin it for the rest of us.

That was Larry McVoy, and he was not entirely wrong. But neither were the developers who saw a trap forming. The free license came with conditions. One of those conditions was that you could not work on any competing version control system while using BitKeeper, or for a year afterward. The kernel community was building its entire workflow on a tool controlled by someone else, under terms that could change at any moment.

For three years, it worked. BitKeeper transformed kernel development. Merging got faster. Branches became cheap. Linus could process patches at a rate that had been physically impossible before. The tool was genuinely excellent.

So far it is a gray and bleak world.

That was Linus Torvalds in April two thousand five, the month it all fell apart. A developer named Andrew Tridgell, the same person who created Samba and rsync, two of the most widely used open source tools in history, had reverse-engineered the BitKeeper protocol. He did it to build an open source tool that could read BitKeeper data. McVoy saw this as a violation of the license terms. On April sixth, two thousand five, BitMover revoked the free licenses.

Suddenly, the most important open source project in the world had no version control system. Thousands of developers. Millions of lines of code. Three years of workflow built around a tool they could no longer use. Everything the skeptics had warned about had come true. The strings attached to proprietary generosity had been pulled.

And that crisis, what happened next, and the tool that emerged from the wreckage, is where this series really begins. Next episode, we go to Amsterdam, to a bus stop, and to the quiet professor whose little shell scripts built the system that ruled the world for fifteen years.

This is Git Good. Twenty-two episodes. The story of how version control conquered the world. And we are just getting started.

Before there was Git, before there was CVS, before there was anything, there was cp. The copy command. cp my-project my-project-backup. That is the entire interface. One command, two names, a fresh copy. It has not changed since nineteen seventy-one. cp does not track what you changed. It does not know why you copied. It does not care that you already have fourteen copies with increasingly desperate names. It just makes another one, no questions asked. Every programmer alive has used cp as version control at some point. It is the oldest tool in this entire story, and it is still running on every machine that will ever run Git. The humblest ancestor. The one Git was built to replace. And honestly, on a bad enough day, the one you still reach for first.