Git Good
Git Beyond Code: The World Forks Back
S2 E26 · 19m · Apr 05, 2026
A novelist stuck with two endings in 2017 discovered Git wasn't just for code—branching could untangle her creative mess, sparking a quiet revolution in how writers, lawyers, and scientists now organize their work.


The Novelist Who Branched Her Ending

Somewhere around two thousand seventeen, a fiction writer named Jessi Shakarian sat down to solve a problem that had nothing to do with software. She had a novel with two possible endings. Not a vague sense that the story could go either way, but two fully drafted final chapters, each taking the protagonist in a different direction, and she needed to develop both without losing either. She had been a fiction editor for years. She knew the usual approach. You save a copy. You rename it. You end up with a folder full of files called final dash two, final dash real, final dash USE THIS ONE, and eventually you lose track of which draft contains the paragraph you rewrote at three in the morning that was actually good.

Then someone showed her Git. And the idea of a branch, a parallel timeline where you can develop an alternate version of your work without touching the original, clicked instantly. Not because she understood the underlying data model. Not because she cared about directed acyclic graphs or content-addressable storage. It clicked because branching is what novelists already do in their heads. They just never had a tool that made it concrete.
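The two-endings problem maps directly onto branches. A minimal sketch, in a throwaway repository; the file and branch names are illustrative, not Shakarian's actual setup:

```shell
# One manuscript, two endings, each on its own timeline.
mkdir novel && cd novel
git init --quiet
git config user.name "Author" && git config user.email "author@example.com"

echo "The original ending." > ending.md
git add ending.md
git commit --quiet -m "Draft the first ending"

# Develop the alternate ending on a parallel branch
git switch --quiet -c alternate-ending
echo "The other ending." > ending.md
git commit --quiet -am "Draft the alternate ending"

# The original is untouched on the first branch
git switch --quiet -    # back to the previous branch
cat ending.md           # prints "The original ending."
```

Both endings now exist in full, and neither overwrites the other; switching branches swaps the file's contents in place.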

The friction, of course, is everything else. Git was built by Linus Torvalds for kernel developers, and its vocabulary shows. Commit. Merge. Rebase. Checkout. Stash. These words mean something very specific to programmers and absolutely nothing to someone whose tools are Scrivener and a cup of tea. The command line is a wall. The error messages read like threats. And the mental model, that you are not saving a file but recording a snapshot of your entire project at a moment in time, requires a conceptual leap that most writing guides never attempt.

But the writers who make that leap tend to become evangelists. Chris Rosser, an Australian novelist, wrote about his workflow of drafting in Markdown files tracked by Git, each chapter its own file, each revision a commit with a message describing what changed. He could see exactly when he rewrote the opening of chapter seven. He could compare Tuesday's draft to Friday's. He could go back to the version from three weeks ago where the dialogue was sharper and pull just that scene forward. D. Moonfire, a programmer who has been writing fiction for decades, built an entire framework for managing novels with Git, complete with metadata in YAML headers and automated builds that compile Markdown chapters into ebooks and print-ready PDFs.
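The chapter-per-file workflow reduces to a few commands. A sketch under the same assumptions, one Markdown file per chapter, one commit per revision; the filenames and messages are invented for illustration:

```shell
# Each chapter is a file; each revision is a commit with a message.
mkdir draft && cd draft
git init --quiet
git config user.name "Author" && git config user.email "author@example.com"

echo "It was a dark night." > chapter-07.md
git add chapter-07.md
git commit --quiet -m "First pass at chapter 7 opening"

echo "Rain hammered the window." > chapter-07.md
git commit --quiet -am "Rewrite chapter 7 opening"

# When did the opening change, and what exactly changed?
git log --oneline -- chapter-07.md
git diff HEAD~1 HEAD -- chapter-07.md
```

Adding `--since` and `--until` to `git log` narrows the history to, say, Tuesday's and Friday's commits, which is how "compare Tuesday's draft to Friday's" works in practice.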

These are still edge cases. Most novelists will never touch a terminal. But the underlying idea, that creative work deserves the same rigorous version history that code gets, has started leaking into tools designed for normal humans. Google Docs has version history. Notion has page history. Every modern writing app now offers some pale reflection of what Git does natively. The irony is that Git, the tool nobody outside programming was supposed to care about, quietly defined what "undo" should really mean.

The Lab Notebook That Never Lies

In a research lab at the University of Chicago, a postdoctoral scholar named John Blischak had a familiar problem. He was writing R code to generate figures for a grant application, and every time he tweaked a parameter or tried a different statistical approach, he needed to keep the previous version. His solution, before Git, was what every scientist does. He created new files. Analysis dot R became analysis two dot R became analysis final dot R became analysis final two dot R.

I just keep overwriting it and changing it and saving the snapshots. That is what Git gives you. You do not need the numbered copies anymore.

This sounds like a convenience story, and it is, but underneath it is something much bigger. Science has a reproducibility crisis. Studies that cannot be replicated. Results that vanish when someone else tries to run the same analysis. And one of the reasons is that the code behind published research has historically been treated as disposable, a means to an end that gets lost on a graduate student's laptop after the paper is accepted. Git changes that equation. When your analysis lives in a version-controlled repository, every step is recorded. Every decision is traceable. The reviewer can see not just the final result but the path you took to get there.
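Retiring the numbered copies looks like this in practice. A minimal sketch, not Blischak's actual repository; the script contents are placeholders:

```shell
# One analysis.R, many snapshots, no analysis-final-2.R required.
mkdir analysis && cd analysis
git init --quiet
git config user.name "Researcher" && git config user.email "r@example.com"

echo 'threshold <- 0.05' > analysis.R
git add analysis.R
git commit --quiet -m "Initial analysis"

echo 'threshold <- 0.01' > analysis.R
git commit --quiet -am "Tighten significance threshold"

# Every previous version is still retrievable
git log --oneline -- analysis.R
git show HEAD~1:analysis.R    # prints the earlier version, threshold 0.05
```

The `commit:path` syntax of `git show` is the whole trick: any file, at any recorded moment, without keeping a copy by hand.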

Juan Antonio Vizcaino, who leads the proteomics team at the European Bioinformatics Institute in Cambridge, has been pushing for exactly this. Make the code available. Make the data available. Put it in a repository where anyone can inspect the pipeline from raw data to published figure. Titus Brown, a bioinformatician at the University of California Davis, runs his entire lab on GitHub. Collaborations, access control, automated quality checks on submitted code, canonical versions of every analysis pipeline. When forty researchers across multiple institutions collaborated on what they called "the deep review," a massive survey of artificial intelligence in medicine led by Casey Greene at the University of Pennsylvania and Anthony Gitter at the University of Wisconsin, they managed the entire paper on GitHub. Forty people. One repository. Every edit tracked, every contribution visible, every disagreement preserved in the issue tracker.

Journals have started to notice. The Journal of Open Source Software runs its entire peer review process through GitHub Issues. The rOpenSci community requires that submitted packages be hosted on GitHub, where reviewers can file issues, suggest changes, and track the full history of revisions. A twenty twenty-four survey published in Frontiers in Computer Science found that reproducibility policies across scientific journals are tightening steadily, with more and more requiring that analysis code be deposited in public repositories.

But Git has a fundamental limitation that science keeps bumping into. It was designed for text. Source code. Configuration files. Small, line-based documents where a diff shows you exactly what changed. Science deals in datasets. Genome sequences. Satellite imagery. Neural network weights. Files measured in gigabytes, sometimes terabytes. Git chokes on these. Every version of a large binary file gets stored in full, bloating the repository until it becomes unusable. Git Large File Storage, or LFS, appeared in two thousand fifteen as a partial fix, replacing large files with lightweight pointers while storing the actual data on a separate server. But LFS has its own limits. It ties your data to whichever server hosts it. GitHub caps individual LFS files at two gigabytes on its standard plans. And it offers nothing for the operations scientists actually need, like splitting datasets, tracking transformations, or comparing experiment results.
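The pointer mechanism is visible in the repository itself. After running `git lfs track "*.bam"` (the file pattern here is illustrative), LFS records the rule in `.gitattributes`, which is committed like any other file:

```
*.bam filter=lfs diff=lfs merge=lfs -text
```

Git then stores a small text pointer in place of each matching file, and the LFS filter swaps the real data in and out on checkout.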

That gap gave rise to DVC, Data Version Control, a tool built specifically for data scientists who wanted Git's philosophy applied to their world. DVC tracks large files, manages experiment pipelines, and works with any storage backend: Amazon S3, Google Cloud, Azure, or just a network drive down the hall. It is Git for everything Git cannot handle. The fact that it exists at all tells you something about how far Git's ideas have spread. Scientists did not adopt Git because someone told them to. They adopted it because the alternative, a folder full of files named results final real this one dot csv, was no longer acceptable.
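DVC borrows LFS's pointer idea and extends it. Running `dvc add data/genome.fa` moves the large file into DVC's cache and leaves behind a small pointer file, `genome.fa.dvc`, which Git tracks in the data's place while `dvc push` sends the data itself to remote storage. The fields below are typical of those pointer files; the hash, size, and filename are placeholders:

```
outs:
- md5: 22a1a2931c8370d3aeedd7183606fd7f
  size: 14445097
  path: genome.fa
```

Git versions the pointer; DVC versions the data. Checking out an old commit and running `dvc checkout` restores the dataset as it was.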

And now there is a new wrinkle. AI-generated text has started appearing in Git repositories alongside the human-written kind, and nobody is quite sure what that means for authorship. Novelists using language model assistants commit drafts where some sentences were written by a person and some were suggested by a machine, and the commit log attributes it all to one author. Scientists working with large language models track the model version, the prompt, and the output alongside their analysis code, trying to make the whole pipeline reproducible. If you run the same prompt against a different model version, you get different output. So the model itself becomes a dependency, logged in a requirements file like any other library.

What Git cannot easily capture is the question of who actually wrote the thing. When a bioinformatician commits a paragraph of discussion text and half of it came from a language model, the git blame command has no way to express that ambiguity. The metadata says one author. The reality is more complicated. Researchers have started proposing conventions, commit message tags, machine-readable headers, that would let authors declare AI involvement at the commit level. Whether those conventions will stick is an open question. But the problem they are solving is new. For the first time, Git's authorship model is being asked to track content whose origins are genuinely unclear.
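One shape those proposed conventions take is a commit-message trailer, the same machine-readable footer mechanism behind GitHub's `Co-authored-by` lines. A sketch; the `Assisted-by` trailer name is hypothetical, not an established standard:

```shell
# Declaring AI involvement per commit via a message trailer (hypothetical
# convention; "Assisted-by" is not a standardized trailer key).
mkdir paper && cd paper
git init --quiet
git config user.name "Researcher" && git config user.email "r@example.com"

echo "Discussion paragraph." > discussion.md
git add discussion.md
git commit --quiet -m "Add discussion section" \
  -m "Assisted-by: example-llm-v2"

# Trailers are machine-readable after the fact
git log -1 --format='%(trailers:key=Assisted-by)'
```

Because trailers survive in the permanent history and can be queried with `git log`, they are a plausible home for authorship metadata that `git blame` cannot express.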

Diffing a Law

In late two thousand twelve, a German developer named Stefan Wehrmeyer had an observation that seems obvious in hindsight. Legislation and source code, he noticed, share a surprising structural resemblance. Both are large bodies of text spread across multiple units. Both grow incrementally over time. Both change through a process of proposal, review, and approval. And both suffer from the same fundamental problem, that understanding what changed between version A and version B requires careful, line-by-line comparison.

Both are big bodies of text spread over multiple units. They grow incrementally over time. If software developers have tools that track every change to every line, why should lawmakers not have the same?

So Wehrmeyer built Bundes-Git. He took every German federal law and regulation, converted them to Markdown, the plain text format that programmers use for documentation, and committed them to a Git repository on GitHub. The result was extraordinary. Every amendment to every law became a commit. Every change was visible as a diff, the red and green highlighting that shows exactly which words were added, removed, or modified. You could look at a tax law and see not just what it says today, but the precise moment a clause was inserted, which session it was voted on, and what it replaced.

The project appeared on GitHub under the Bundestag organisation. Konstantin Käfer designed a mascot, an octo-eagle, half GitHub's octocat and half the German federal eagle. It was covered by Wired, Heise, and Golem. Hacker News threads filled with lawyers and developers debating whether this could work at scale. Wehrmeyer was clear about one thing. Only valid legislation, actually voted on by the Bundestag, would be merged into the main branch. This was not a wiki where anyone could propose changes. It was a mirror, a read-only record of democratic output, made transparent through a tool that was never designed for democracy.

The deeper idea, that laws should be diffable, has a seductive logic. Citizens could subscribe to a repository and get notifications when a law they care about changes. Journalists could compare draft legislation to the final version and see exactly what was negotiated away in committee. Legal scholars could trace the evolution of a concept across decades of amendments. The version history of a nation's laws, which was previously buried in parliamentary records and gazette archives, would be as accessible as checking git log.

Wehrmeyer chose Markdown over XML deliberately. XML would have been more structured, more machine-readable. But diffs of XML are unreadable to humans, a wall of angle brackets and closing tags where the actual change, the three words that were added to a clause, drowns in formatting noise. Markdown diffs are clean. You see the words. You see what changed. The tool serves the citizen, not the database.
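The Bundes-Git model in miniature: a law as a Markdown file, an amendment as a commit, the change as a diff. The statute text here is invented for illustration:

```shell
# A law as Markdown; an amendment as a commit on the mirror.
mkdir laws && cd laws
git init --quiet
git config user.name "Mirror" && git config user.email "m@example.com"

printf '## Section 1\nThe tax rate is 19 percent.\n' > tax-law.md
git add tax-law.md
git commit --quiet -m "Enact tax law"

printf '## Section 1\nThe tax rate is 7 percent.\n' > tax-law.md
git commit --quiet -am "Amendment: reduce rate"

# The amendment as a human-readable diff: one removed line, one added
git diff HEAD~1 HEAD
```

The diff output is exactly Wehrmeyer's argument: a citizen sees the old clause prefixed with a minus and the new clause with a plus, with no angle brackets drowning the change.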

The Vault Under the Mountain

On February second, two thousand twenty, GitHub took a snapshot. Every active public repository on the platform. Every open-source project with at least one star and recent activity. Every repository with two hundred and fifty or more stars, regardless of when it was last touched. Twenty-one terabytes of human knowledge, compressed, QR-encoded, and written to one hundred and eighty-six reels of archival film by a Norwegian company called Piql.

Then they buried it under a mountain in Svalbard.

The Arctic Code Vault sits two hundred and fifty metres deep inside a decommissioned coal mine in the Arctic World Archive, about a mile from the Global Seed Vault, where the world stores backup copies of crop seeds against the possibility of agricultural collapse. Svalbard was chosen for the same reasons it was chosen for seeds. It is one of the most geopolitically stable places on Earth, regulated by the Svalbard Treaty as a demilitarised zone. The permafrost keeps the mine naturally cold. The remoteness keeps it naturally safe. And the symbolism, a vault under a mountain on a frozen archipelago near the North Pole, is hard to beat.

The film is designed to last a thousand years. Not a hundred. Not two hundred. A thousand. The technology is deliberately simple, silver halide on polyester, readable with a magnifying glass and a light source if the QR codes prove too damaged for machines. No proprietary format. No dependency on a specific reader. Just light through film, the same principle that has worked since the eighteen hundreds.

Developers whose code was included in the vault received an Arctic Code Vault badge on their GitHub profiles. Millions of them. Most had no idea their weekend project, their half-finished todo app, their collection of dotfiles, was now preserved in a mine in the high Arctic alongside the Linux kernel and TensorFlow. The badge became a conversation piece. People joked about their code being immortal. But underneath the jokes was something genuinely strange, the idea that Git, a tool invented to track patches to an operating system, had become the vehicle for one of the largest preservation efforts in human history.

The Arctic Code Vault is part of a broader GitHub Archive Program that includes partnerships with the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, and Microsoft Research's Project Silica, which encodes data in quartz glass. The ambition is civilisational. If something goes catastrophically wrong, if knowledge is lost, if the internet fractures, the vault is there. A time capsule measured in terabytes, stored in a format that requires nothing more sophisticated than light.

The Taco That Got Forked

Not everything in Git is serious. Some of the most charming repositories on GitHub contain no code at all, and that is entirely the point.

Take tacofancy. A repository created by a developer named Dan Sinker, it describes itself as a community-driven taco repo. Inside are Markdown files containing taco recipes. Base layers. Seasonings. Toppings. Salsas. Each recipe is a file. Each improvement is a pull request. Someone forks the repository, adds their grandmother's guacamole recipe, and submits it for review. The maintainer merges it or requests changes. The entire social machinery of open-source software, the fork, the branch, the pull request, the code review, applied to the question of how much cumin goes in the carnitas.
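The contribution flow, stripped to its local half. On GitHub the fork and the pull request happen in the web interface; on your machine it reduces to a branch and a commit. The recipe and branch names are invented:

```shell
# Local half of a recipe contribution: branch, add the file, commit.
mkdir tacofancy-fork && cd tacofancy-fork
git init --quiet
git config user.name "Contributor" && git config user.email "c@example.com"
git commit --quiet --allow-empty -m "Initial commit"

git switch --quiet -c add-guacamole
mkdir -p toppings
echo "# Grandmother's Guacamole" > toppings/guacamole.md
git add toppings/guacamole.md
git commit --quiet -m "Add grandmother's guacamole"
# Next steps: git push, then open a pull request for the maintainer to review
```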

I submitted a pull request for a chipotle crema recipe. Someone left a review comment saying the lime ratio was off. It felt exactly like a code review, except the merge conflict was about cilantro.

Then there is the poetry. Taeyoon Choi, an artist and educator, maintains a repository of version-controlled poems. Each revision is a commit. You can diff a poem and see that the word "luminous" was replaced with "incandescent" on a Wednesday afternoon in November. The history of revision, which poets usually keep private in crossed-out notebooks and crumpled drafts, is laid bare in the commit log.
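A one-word revision is exactly what Git's word-level diff was built to show. A sketch with an invented line of verse:

```shell
# Seeing a single word change with git's word-level diff.
mkdir poems && cd poems
git init --quiet
git config user.name "Poet" && git config user.email "p@example.com"

echo "the luminous dark" > poem.md
git add poem.md
git commit --quiet -m "Draft"

echo "the incandescent dark" > poem.md
git commit --quiet -am "Revise adjective"

# Shows: the [-luminous-]{+incandescent+} dark
git diff --word-diff HEAD~1 HEAD
```

Where a line diff would mark the whole line as changed, `--word-diff` isolates the one swapped word, which is what revision history means for a poem.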

Buster Benson, a product manager, created a repository called "public" containing his entire belief system. A book of beliefs, version-controlled, with a full history of how his thinking changed over time. When he changed his mind about something, he committed the update. The diff showed exactly what he used to believe and what he believes now. It is autobiography as version history.

And then, inevitably, someone put the Bible on GitHub. Mark Jaquith, a WordPress core developer, created a repository called BetterBible with the tagline "The Bible has some issues. Let's make it better." The idea, half joke and half provocation, that sacred text could be improved through pull requests, captured something essential about what happens when you give people a tool designed for collaboration and tell them they can use it for anything.

Mimi Onuoha created a repository about missing datasets, a collection documenting the data that does not exist but should. The gaps in what we measure. The populations that are not counted. The phenomena that nobody tracks. It is social criticism in repository form, using the structure of a data project to highlight the absence of data. The README is the argument. The empty folders are the evidence.

These projects are playful, but they are not trivial. They demonstrate something that Linus Torvalds never intended when he spent ten days in two thousand five building a tool to manage Linux patches. Git's model, track changes, enable collaboration, preserve history, is not specific to code. It is specific to anything that can be written down. And once people realised that, they started writing down everything.

The Programmer's Tool and the Rest of the World

There is a pattern in all of these stories. A novelist discovers branching. A scientist discovers reproducibility. A lawyer discovers diffs. A chef discovers pull requests. A poet discovers commit messages. Each of them arrived at Git not because someone told them it was the right tool, but because the problem they were trying to solve, tracking changes, collaborating without overwriting each other's work, preserving history, turned out to be universal.

The friction is also universal. Git's interface was designed by and for people who think in abstractions. The staging area, the distinction between tracked and untracked files, the three-state model of modified, staged, committed, these concepts map naturally to a programmer's mental model and unnaturally to everyone else's. When a novelist hears "commit your changes," they hear ceremony. When a scientist hears "resolve the merge conflict," they hear trouble. The vocabulary is a barrier, and no graphical interface has fully solved it, because the underlying concepts are genuinely complex. The education problem and the hostile error messages that confronted programmers in Git's early years, the same walls described in the first episodes of this series, hit non-programmers twice as hard. At least developers have Stack Overflow. Novelists and lawyers have nobody to ask.
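The three-state model is easier to see than to describe. A minimal demonstration in a throwaway repository, watching one file move through modified, staged, and committed:

```shell
# One file, three states, observed through git status.
mkdir states && cd states
git init --quiet
git config user.name "Learner" && git config user.email "l@example.com"

echo "draft" > notes.txt
git add notes.txt
git commit --quiet -m "Commit notes"   # committed: recorded in history

echo "revised" > notes.txt             # modified, not yet staged
git status --short                     # prints " M notes.txt"

git add notes.txt                      # staged: queued for the next snapshot
git status --short                     # prints "M  notes.txt"

git commit --quiet -m "Revise notes"   # committed again
git status --short                     # prints nothing: working tree clean
```

The position of that `M` in the short status, second column versus first, is the entire staged-versus-modified distinction made visible.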

But the ideas keep spreading. Not Git itself, necessarily, but the principles it embodies. That every change should be recorded. That you should be able to see exactly what changed and when. That collaboration means working in parallel and merging, not passing a file back and forth and hoping nobody overwrites anything. That history matters, that the path you took to arrive at the final version is as valuable as the final version itself. The next place to watch is infrastructure: Git has quietly become a control plane for entire delivery systems, tracking not just the content being shipped but the pipelines that decide how and when it ships. And some of the uses in this episode, the genome datasets, the model weights, the terabyte-scale science archives, are approaching the hard limits of what Git was ever designed to hold, which is a story worth following.

Twenty-one terabytes of those ideas are sitting under a mountain in the Arctic right now, etched onto film that is designed to outlast every server, every hard drive, every cloud provider that exists today. Some of that film contains the Linux kernel. Some contains machine learning frameworks. And some contains taco recipes, poetry, belief systems, and German federal tax law. All of it tracked by a tool that a Finnish programmer built in ten days because he was angry about BitKeeper.

That might be the most unexpected thing about Git. Not that it works. Not that it scaled. But that a tool born from a licensing dispute in the Linux kernel community became the way humanity chose to organise, preserve, and share its knowledge, far beyond the world of people who know what a hash function is. The rest of the world did not adopt Git. The rest of the world forked it, adapted it, and made it their own. Which, if you think about it, is exactly how open source is supposed to work.

To compare two versions of anything, type git diff. With no arguments, it shows what you have changed since your last commit. To compare two specific commits, type git diff followed by the two commit hashes separated by a space. To compare two branches, type git diff followed by the branch names with two dots between them. The output highlights additions in green and deletions in red. Every change, whether it is a line of code, a paragraph of a novel, a clause in a law, or a step in a recipe, shows up the same way. What was there before, and what is there now.
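The three invocations described above, in a runnable sketch; the repository, file, and branch names are invented:

```shell
# The three forms of git diff: working tree, two commits, two branches.
mkdir diffs && cd diffs
git init --quiet
git config user.name "Reader" && git config user.email "r@example.com"

echo "one" > file.txt
git add file.txt && git commit --quiet -m "First"
git branch feature                  # a second branch, frozen at this commit
echo "two" > file.txt
git commit --quiet -am "Second"

echo "three" > file.txt
git diff                            # working tree vs last commit
git diff HEAD~1 HEAD                # two specific commits, by name or hash
git diff feature..HEAD              # two branches, two-dot form
```

Each form prints the same red-and-green output: lines prefixed with a minus are what was there before, lines prefixed with a plus are what is there now.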