The Migration

The Friday Afternoon Email

This is episode five of Git Good, Season Two. In the last episode, we watched how code review became a ritual and how pull requests became a bottleneck. Now we are going to look at the event that puts every team problem into motion at once. The moment someone decides: we are switching to Git.

Somewhere right now, a manager is composing an email. It says something about modernizing the toolchain. It mentions that everyone uses Git. It allocates two weeks for the transition. And it is about to ruin several people's month.

The email gets sent on a Friday afternoon because the manager wants the team to have the weekend to process the news. What the manager does not understand is that "processing the news" means the senior developer will spend Saturday calculating how many scripts they need to rewrite, the build engineer will spend Sunday trying to figure out what happens to the Jenkins pipeline, and the newest team member will spend both days feeling a strange mixture of relief and dread, because they learned Git in school but the version they learned and the version their company is about to need are not the same thing.

Migration is the most expensive thing a team can do that has no visible output. No new features. No bug fixes. No performance improvements. Just the same code, in a different box, with a team that has temporarily forgotten how to do their jobs. And yet organizations do it constantly, because the cost of not migrating has a name. Social gravity.

The First Migration

In two thousand eight, Google released the Android source code to the public. The codebase was eight and a half million lines, not counting the Linux kernel, and it had been developed using Subversion. To make it work as an open source project, Google needed it in Git. The problem was that a codebase that large, split across that many components, would choke any single Git repository.

So Jeff Bailey and the Android team built two tools that still exist today, almost two decades later. The first was repo, a Python wrapper that manages hundreds of Git repositories through a single XML manifest file. Instead of one impossible repository, Android became a constellation of manageable ones, each holding a subsystem, all coordinated by a manifest that describes how they fit together. The second tool was Gerrit, a code review system built specifically for this multi-repository world.
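The manifest idea can be sketched in a few lines. This is a minimal illustration, assuming a simplified manifest with only remote, default, and project elements. The project names, paths, and URL are invented for the example, and the real repo tool handles far more than this.

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical manifest. The element names follow the general
# shape of a repo manifest; the projects and the URL are made up.
MANIFEST = """
<manifest>
  <remote name="origin" fetch="https://example.com/git/" />
  <default remote="origin" revision="main" />
  <project name="platform/build" path="build" />
  <project name="platform/frameworks/base" path="frameworks/base" revision="stable" />
</manifest>
"""

def list_checkouts(manifest_xml):
    """Return (local_path, clone_url, revision) for each project in the manifest."""
    root = ET.fromstring(manifest_xml)
    remotes = {r.get("name"): r.get("fetch") for r in root.findall("remote")}
    default = root.find("default")
    checkouts = []
    for project in root.findall("project"):
        # Per-project attributes override the manifest-wide defaults.
        remote = project.get("remote", default.get("remote"))
        revision = project.get("revision", default.get("revision"))
        url = remotes[remote] + project.get("name")
        # The local path falls back to the project name when omitted.
        path = project.get("path", project.get("name"))
        checkouts.append((path, url, revision))
    return checkouts

for path, url, revision in list_checkouts(MANIFEST):
    print(f"{path}: {url} @ {revision}")
```

Each tuple describes one ordinary Git repository. The coordination lives entirely in the manifest, which is the point of the design.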

We do not want our most technical people to spend their time as patch monkeys.

The Android migration was a success, but it was also a warning. Google did not just install Git and push the code. They had to invent infrastructure. They had to build a meta-layer on top of Git because Git itself could not hold the thing. Most organizations migrating to Git do not have Google's resources. They have the same problems at a smaller scale, and no engineering team dedicated to building solutions.

The Bridge

The tool that was supposed to make migration painless is called git-svn. It is built into Git itself, a command that lets you clone a Subversion repository into a Git repository, converting each Subversion revision into a Git commit. In theory, it is the perfect bridge. Some developers switch to Git while others stay on Subversion. The bridge keeps both sides in sync. You can migrate incrementally, one person at a time.

In practice, git-svn is a test of human endurance.

Cody Casterline, an engineer at SmartBear, documented his team's migration and the numbers are grim.

It took around four days for git-svn to convert our repository. Four days of babysitting a process that would leak memory until it coredumped.

And that was for a repository with roughly fifteen thousand revisions, which is modest by any professional standard. The tool does not handle non-standard Subversion layouts well either. If your repository does not follow the classic trunk, branches, and tags directory structure, git-svn gets confused and produces something that resembles your history the way a funhouse mirror resembles your face.
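The layout problem shows up in how the clone has to be invoked. The sketch below only builds the argument list and does not run anything. The stdlayout, trunk, branches, tags, and authors-file options are git-svn options, while the helper name, URL, and directory names are hypothetical.

```python
def git_svn_clone_args(svn_url, target_dir, trunk=None, branches=None,
                       tags=None, authors_file=None):
    """Build the argument list for a `git svn clone` call (does not execute it)."""
    args = ["git", "svn", "clone"]
    if trunk is None and branches is None and tags is None:
        # Classic trunk/branches/tags layout: one flag covers all three.
        args.append("--stdlayout")
    else:
        # Non-standard layout: each location has to be spelled out.
        if trunk:
            args.append(f"--trunk={trunk}")
        if branches:
            args.append(f"--branches={branches}")
        if tags:
            args.append(f"--tags={tags}")
    if authors_file:
        # Maps Subversion usernames to Git author names and emails.
        args.append(f"--authors-file={authors_file}")
    args += [svn_url, target_dir]
    return args

# Hypothetical repository that follows the standard layout.
print(git_svn_clone_args("https://svn.example.com/repo", "repo-git"))

# Hypothetical repository with its own directory names.
print(git_svn_clone_args("https://svn.example.com/legacy", "legacy-git",
                         trunk="main", branches="dev", tags="releases",
                         authors_file="authors.txt"))
```

The second call is where migrations tend to go wrong: every directory the tool is not told about is history it may misread or skip.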

But the real problems are conceptual, not technical. Subversion and Git think about the world differently, and those differences surface in painful ways during translation.

Subversion tracks file renames explicitly. When you run svn move, the server records that a file was renamed, and the entire history follows the file to its new location. Git does not track renames. It detects them after the fact by comparing the content of deleted and added files. If the content is at least fifty percent similar, which is the default threshold, Git guesses it was a rename. If you renamed a file and also changed most of its contents in the same commit, Git sees a deletion and an unrelated addition. The history splits. The older history, under the old filename, is only visible if you pass the follow flag to git log, and even then Git can get it wrong.
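The threshold idea can be illustrated with a rough stand-in. This sketch uses Python's difflib to score similarity line by line. That is an assumption made for illustration and is not how Git computes its own similarity index. The file contents are invented.

```python
from difflib import SequenceMatcher

RENAME_THRESHOLD = 0.5  # the fifty percent figure described above

def similarity(old_text, new_text):
    """Rough line-based similarity between 0.0 and 1.0 (a stand-in, not Git's metric)."""
    return SequenceMatcher(None, old_text.splitlines(), new_text.splitlines()).ratio()

def looks_like_rename(old_text, new_text, threshold=RENAME_THRESHOLD):
    """Treat a deleted file and an added file as a rename if they are similar enough."""
    return similarity(old_text, new_text) >= threshold

original = (
    "def total(items):\n"
    "    result = 0\n"
    "    for item in items:\n"
    "        result += item.price\n"
    "    return result\n"
)
# Moved to a new filename with one line changed.
light_edit = original.replace("def total(items):", "def invoice_total(items):")
# Moved to a new filename and mostly rewritten in the same commit.
heavy_rewrite = (
    "def total(items):\n"
    "    return sum(item.price for item in items)\n"
)

print(looks_like_rename(original, light_edit))
print(looks_like_rename(original, heavy_rewrite))
```

The lightly edited file clears the threshold and would be reported as a rename. The heavily rewritten one falls below it, which is the case where the history appears to split.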

For a team whose Subversion repository has fifteen years of carefully maintained file history, watching that history fragment during migration is not a technical inconvenience. It feels like a loss.

Then there are the things that have no equivalent at all. Subversion has sequential revision numbers. Revision one thousand forty-two means the one thousand forty-second change, globally, across all files and branches. You can say "the bug was introduced in revision eight hundred" and everyone knows exactly where that is in the timeline. Git has commit hashes, forty characters of hexadecimal that are unique but meaningless to humans. You cannot look at a hash and know whether it came before or after another one. The shared clock disappears, replaced by a graph that only a computer can read.
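A short sketch makes the contrast concrete. Git names an object by hashing a small header plus the object's content with SHA-1. The commit bodies below are hypothetical, with an invented author and a placeholder tree id, and real commits also carry parent links that this example leaves out.

```python
import hashlib

def git_object_id(kind, body):
    """Hash a '<kind> <size>' header, a NUL byte, and the body, as Git does for object ids."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def fake_commit(timestamp, message):
    """Build a hypothetical commit body; the tree id and author are placeholders."""
    who = f"A Developer <dev@example.com> {timestamp} +0000"
    text = (
        "tree " + "0" * 40 + "\n"
        f"author {who}\n"
        f"committer {who}\n"
        "\n"
        f"{message}\n"
    )
    return text.encode()

# Subversion: revision numbers compare directly.
svn_revisions = [800, 1042]
print(svn_revisions[0] < svn_revisions[1])

# Git: three commits built in chronological order, named by hash.
history = [
    fake_commit(1700000000, "Add billing module"),
    fake_commit(1700003600, "Fix rounding bug"),
    fake_commit(1700007200, "Update invoice template"),
]
for body in history:
    print(git_object_id("commit", body))
```

The printed ids are stable for the same input, but nothing in them reveals which commit came first. Git recovers order by walking from each commit to its parents, not by reading the names.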

Subversion has directory-level permissions. You can restrict who can commit to the billing directory without restricting access to the rest of the codebase. Git itself has no directory-level permissions. Access control is left to whatever server hosts the repository, and it usually happens at the repository level. If you need fine-grained control, you split into multiple repositories, which creates a different set of problems.

And Subversion has one workflow. Update, change, commit. The server is truth. Git has a dozen workflows, and the one your team picks will shape everything. Do you merge or rebase? Feature branches or trunk? Squash before merging? Each choice has consequences, each choice sparks arguments, and a team migrating from Subversion needs to reach consensus on questions that did not exist in their previous system.

The Human Side

The hardest part of any migration is not the technology. It is the person who does not want to migrate.

Every team has one. The senior developer who has been writing software for twenty years and has used Subversion for most of them. They know every svn command by heart. They know the revision numbers of important changes the way you know phone numbers you have dialed a thousand times. They have scripts, aliases, muscle memory built over years. They are not resistant to change because they are stubborn. They are resistant because they have invested deeply in proficiency with a tool, and the migration asks them to become a beginner again. In an industry that measures people by their expertise, being a beginner is uncomfortable at any age.

The manager who mandated the migration is the other recurring character. They read that Microsoft and Google use Git, which is only partially true. They announce the switch in a quarterly planning meeting. They allocate two weeks. They do not budget for training. They do not account for the productivity dip. They do not realize that "everyone uses Git" means "everyone uses Git differently" and that their team will need to agree on a workflow nobody has experience with yet.

The organizations that handle migration well treat it as a project, not an event. They budget months. They run old and new systems in parallel. They train people before the cutover. They accept that productivity will drop for weeks after the switch.

The organizations that handle it badly discover the worst outcome, which is not a failed migration. It is a half-finished one.

The Half-Migration

The repository is converted but nobody has updated the build scripts. The history is there but nobody knows how to search it with Git instead of Subversion. Half the team has switched and the other half is still using git-svn as a bridge, creating a translation layer that adds confusion and merge headaches. The organization is paying the cost of two systems while getting the benefits of neither.

And sometimes, quietly, the team goes back. Not to Subversion usually. But to the workflow they had before. They use Git the way they used Subversion. One branch. Linear commits. Push and pull from a central server. No feature branches. No pull requests. No stashing, no rebasing, no interactive history editing. They have Git installed but they have not actually migrated. They have changed which command they type and nothing else.

This is the half-migration, and it is far more common than anyone admits. Surveys consistently show that most developers use fewer than ten Git commands regularly. The five-command crowd from episode two, the ones who know commands like init, add, commit, and push and not much else, are everywhere, and many of them arrived at those five commands through a migration that promised much more.

The Gap

Here is the thing nobody tells you in the conference talks about migration. The companies that make migration sound heroic are the ones at the extremes. At one end, the startup with twelve people and a year of Subversion history. Their migration takes a weekend. At the other end, companies that threw hundreds of engineers and years of effort at making Git work at scales it was never designed for. Season one told that engineering story.

But most companies are not at either extreme. They are in the middle. Fifty engineers. Five hundred thousand lines of code. Ten years of history. A monorepo that is starting to feel slow but is nowhere near the size that would justify building custom infrastructure. They are in the gap.

The gap is where the monorepo question lives, and it is not actually a technical question. It is an organizational one.

A monorepo says: we are one team. Boundaries between groups are soft. Any engineer can see any code. When one team changes an interface, they fix every caller. Coordination happens through the code itself. The cost is tooling. The cost is that when the build breaks, everyone is broken.

A polyrepo says: we are many teams. Each group owns their territory. You publish versioned interfaces and other teams consume them when ready. Some upgrade immediately. Some upgrade next quarter. Some never upgrade, and you end up with six versions of the same library running in production. The diamond dependency problem, where two libraries need different versions of the same third library, becomes a daily reality rather than a textbook example.
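The diamond shape is easy to detect once the requirements are written down. This is a minimal sketch, assuming each package pins exact versions of its dependencies. The package names and version numbers are invented.

```python
def find_version_conflicts(requirements):
    """Return dependencies that different packages pin to different versions.

    requirements maps package name -> {dependency name: required version}.
    The result maps dependency -> {version: [packages that require it]}.
    """
    wanted = {}
    for package, deps in requirements.items():
        for dep, version in deps.items():
            wanted.setdefault(dep, {}).setdefault(version, []).append(package)
    # A dependency is in conflict when more than one version is requested.
    return {dep: versions for dep, versions in wanted.items() if len(versions) > 1}

# Hypothetical application that pulls in two internal client libraries.
requirements = {
    "billing-client": {"json-lib": "1.4", "http-lib": "3.0"},
    "reporting-client": {"json-lib": "2.0", "http-lib": "3.0"},
}

for dep, versions in find_version_conflicts(requirements).items():
    print(dep, versions)
```

Both clients agree on http-lib, so only json-lib is reported. An application that needs both clients has to resolve that conflict before it can ship.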

The companies at the extreme ends of the scale have already made this choice and invested heavily in making it work. The company in the gap has to choose between a monorepo that will slowly degrade as they grow and a polyrepo that will slowly fragment as they grow. Both are correct. Both are painful. The question is which kind of pain matches your organization.

And here is what makes the gap treacherous. Git does not care which you pick. Git will store one monorepo or a thousand polyrepos. It was designed for the Linux kernel, which is one of the most successful monorepos in the world, managed not by custom infrastructure but by a mailing list, some scripts, and the judgment of one maintainer. That model works for Linux. It does not scale to a company of five hundred where the problem is not code management but team coordination.

The Quiet Upstream Battle

There is a human story inside this gap that is easy to miss. When large companies hit Git's limits, they do not just build internal tools. Some of them try to fix Git itself. And that means their patches land on the desk of Junio Hamano, Git's lead maintainer since two thousand five, who has spent two decades managing the tension between Git's origins and its enterprise ambitions.

When Microsoft showed up with patches to make Git handle a repository of three hundred gigabytes, those patches added complexity to a tool that prided itself on simplicity. Derrick Stolee, who led the effort to evolve Microsoft's original virtual filesystem approach into something Git could absorb, described the philosophy as preferring incremental changes over complete rewrites.

Each individual movement was relatively small compared to the entire system.

It took years. The partial clone patches went through extensive review. The sparse checkout redesign went through multiple iterations. The filesystem monitor integration, which lets Git ask the operating system what changed instead of checking every file, was controversial because it added a new dependency to a tool that had survived on minimal dependencies.

Each of these changes makes Git better for large repositories. Each one also makes Git more complex. The tool that started as a few thousand lines of C now has features that exist solely because companies with tens of thousands of engineers need them. Whether that is growth or bloat depends entirely on which end of the gap you sit at. If you are the twelve-person startup, you will never use sparse checkout. If you are the five-hundred-person company whose git status takes thirty seconds, sparse checkout is the difference between staying on Git and starting to look elsewhere.

Social Gravity

In April two thousand nineteen, the Apache Software Foundation completed its migration to GitHub. Apache had been one of the last major open source organizations still running its own Subversion infrastructure. Hundreds of projects, decades of history, thousands of contributors.

The migration was not driven by Git's technical superiority. It was driven by the fact that contributors expected Git and GitHub. Pull requests had become the universal language of open source contribution. A project hosted on Subversion was a project that was harder to contribute to, harder to discover, harder to integrate with the tools everyone else was using.

This is social gravity, and it is the force that drives most migrations. Not technical merit. Not performance benchmarks. Not feature comparisons. Gravity. Git is where the developers are. GitHub is where the pull requests are. A team that stays on Subversion is not wrong, but they are increasingly alone. Their job postings say "experience with Git" because candidates expect it. Their new hires arrive already knowing Git, or at least knowing the five-command incantation. The cost of not migrating is measured in recruitment friction, in contributor attrition, in the slow drift toward isolation.

The story of Git's rise usually sounds like technical merit. The right tool winning because it was the right tool. But there is a different story underneath. Adoption is also about gravity. About network effects. About the cost of being different in an industry that standardized whether you were ready or not.

The Wrapper Question

There is a question forming around migration now that did not exist five years ago. If an AI assistant handles the Git commands, resolves the merge conflicts, manages the branches, does a migrating team need to understand Git at all?

Think about that half-migrated team from a few minutes ago. The one using Git like Subversion. One branch, linear commits, none of the things that make Git worth the migration cost. What if that is fine now? What if the AI layer makes the underlying tool irrelevant? The team gets Git compatibility, which is what the social gravity demands. They get the green checkmark on the compliance form. And they never have to learn the difference between merge and rebase because something else handles it.

If that is the future, then migration is not adoption. It is a format change. The cost is real but the benefit is not Git's power. It is Git's compatibility. You migrate not because Git is better for what you do, but because Git is what everything else expects. The tool becomes invisible, a storage layer underneath whatever interface actually runs the show.

The great migration is not a single event. It is an ongoing process, happening right now, in thousands of organizations. Some of them are doing it well. Some of them are doing it badly. Some of them are staring at a Subversion server, knowing they need to switch, not because Git is better for what they do, but because Git is what everyone else uses.

That is the tax. And everyone pays it, one way or another.

That was episode five of Git Good, Season Two.

The git svn clone command is the bridge between two worlds. Point it at a Subversion repository and it will rebuild the entire history as Git commits, one revision at a time. The branches come along, the tags come along, and when it finishes you have a Git repository that remembers where it came from. You can push changes back to the Subversion server with git svn dcommit, which is how teams migrate incrementally, one developer at a time. What it will not tell you is that the initial clone can take days for a large repository, that merge history rarely survives the translation, and that file renames tracked explicitly in Subversion become heuristic guesses in Git. The bridge works. It is also the place where you first notice that these two systems do not think the same way.