Git Good
Git in Two Thousand Twenty-Five
S1 E21 · 16m · Feb 21, 2026
Junio C Hamano has reviewed every patch going into Git for twenty years—since Linus Torvalds handed him the project in July 2005—yet almost nobody knows his name.


Twenty Years, One Maintainer

On April seventh, two thousand twenty-five, the Git project turned twenty years old. No fanfare. No product launch. No keynote. Just another day of patches reviewed, mailing list threads answered, and release candidates tested. The person doing that work was the same person who had been doing it since July two thousand five. Junio C Hamano.

When interviewers tracked down Linus Torvalds for the anniversary, he deflected. Twenty years later, he said, you should talk to Junio, not to me.

Linus has always been generous about this. He takes credit for the initial design, the two-week sprint we covered in episode four, but he is blunt about what happened after.

I'll take credit for getting the core design right, and getting the project started, but it really is Junio who has led the project.

Think about what twenty years of maintenance means. Junio showed up during the first week of Git's existence in April two thousand five. By July of that year, Linus handed him the keys. Since then, Junio has reviewed every patch that goes into Git. He sends regular emails to the mailing list titled "What's cooking in git dot git," describing the state of every patch series in flight. He decides what gets merged, what gets sent back for revision, and what gets rejected. He does this with a quiet, steady temperament that is the polar opposite of the person who created the tool. We called him the quiet steward back in episode four, when he first appeared. Twenty years later, the title fits better than ever.

Linus is famous for his blunt, sometimes caustic communication style. Junio is famous for patience. For thoroughness. For never shipping something he was not confident in. Paul Eggert, the RCS maintainer, mentored Junio in open source work early in his career, and that careful, precise approach stuck. Before Git became his full-time job at Google, Junio balanced the project with a day job, reviewing patches in his evenings and weekends.

Most developers have never heard his name. They type git commit and git push dozens of times a day without knowing that one person has been steering the tool for two decades. The invisible maintainer of the infrastructure everyone depends on.

This is rare in open source. Projects usually follow one of two paths. The creator burns out and the project stagnates, or the creator stays too long and becomes a bottleneck. Git got a third option. Linus built it, then stepped away at exactly the right moment, and Junio carried it forward with the kind of consistent, unglamorous stewardship that does not make headlines but makes everything else possible.

Twenty years. The same person. Still reviewing patches on the mailing list.

The New Fingerprint

Back in episode five, we talked about how Git names every piece of content by running it through a mathematical function that produces a unique fingerprint. That function was SHA-1, a cryptographic algorithm designed in the early nineteen nineties by the United States National Security Agency. Every commit, every file, every tree in every Git repository on the planet is identified by a SHA-1 fingerprint. Forty characters of hexadecimal. The backbone of the whole system.
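You can watch that fingerprinting happen from the command line. A minimal sketch, assuming any recent Git install; `git hash-object` computes the SHA-1 name Git would assign to a piece of content, without storing anything:

```shell
# Ask Git for the SHA-1 name of the content "hello" plus a newline.
# Internally, Git hashes a header ("blob 6" and a NUL byte) followed
# by the bytes themselves, then runs SHA-1 over the result.
echo "hello" | git hash-object --stdin
# → ce013625030ba8dba906f756967f9e9ca394464a
```

Forty hexadecimal characters, and the same forty characters on every machine on Earth for that same content. That determinism is the whole trick.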

In two thousand seventeen, researchers at Google and CWI Amsterdam demonstrated a practical collision attack against SHA-1. They produced two different PDF files with the same SHA-1 fingerprint. The algorithm that Git trusted to guarantee uniqueness could be fooled. The theoretical weakness that cryptographers had warned about for years was now real.

The Git project had known this day was coming. The plan was to transition from SHA-1 to SHA-256, a stronger algorithm with sixty-four character fingerprints. Simple to describe. Extraordinarily difficult to execute.

This is not just an engineering problem. It is an identity problem. Every object in Git is named by its fingerprint. Change the fingerprinting algorithm and you change the name of everything. The commit you pushed yesterday gets a different identity. The tag you signed last year points to a different hash. It is the philosophical equivalent of renaming every person in a city and expecting the mail to still arrive.

Think about what needs to change. Every Git repository in the world stores objects named by SHA-1 fingerprints. Every commit references its parents by their SHA-1 fingerprints. Every tag, every tree, every packfile index is built around SHA-1. Changing the fingerprinting algorithm is not like updating a library. It is like changing the foundation of a building while people are still living in it.

A developer named brian m. carlson has done the bulk of the transition work. By two thousand twenty, Git could create SHA-256 repositories using a special flag. Most local operations worked. But the critical missing piece was interoperability. A SHA-256 repository could not talk to a SHA-1 repository. You could not push from one to the other. You could not clone between them.
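That special flag is real and usable today. A minimal sketch, assuming Git 2.29 or later; the paths and identity are throwaway demo values:

```shell
# Create a repository whose objects are named by SHA-256 instead of SHA-1.
tmp=$(mktemp -d)
git init -q --object-format=sha256 "$tmp/demo"
cd "$tmp/demo"
git config user.email demo@example.com
git config user.name "Demo User"
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "first commit"
# Sixty-four hex characters instead of forty:
git rev-parse HEAD
```

Everything local works: commits, branches, merges, logs. The trouble begins the moment this repository needs to talk to the SHA-1 world.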

And without interoperability, SHA-256 was, as one contributor put it, essentially useless to many people. No hosting platform, not GitHub, not GitLab, not any forge, supported SHA-256 repositories. The feature existed in the code but lived in practical isolation.

This stalled for years. The foundational work was roughly ninety percent done, but the remaining integration work, the part that required coordination across the entire Git ecosystem, lacked both developer enthusiasm and corporate funding. It is a pattern familiar in open source. The exciting core work gets done. The tedious interoperability plumbing does not.

Then, slowly, things started moving again. Git two point forty-two in two thousand twenty-three declared SHA-256 repositories no longer an experimental curiosity. Git two point forty-six in two thousand twenty-four updated its documentation to confirm that SHA-256 would become the default in Git three point zero. And Git two point fifty-one in two thousand twenty-five continued preparing the internal plumbing for the switch.

carlson estimates that the full transition needs somewhere between two hundred and four hundred patches, with about a hundred complete. The biggest remaining obstacle is not Git itself. It is the ecosystem. Every Git library, every hosting platform, every CI system, every tool that parses Git objects needs to understand SHA-256 before the default can change. Git three point zero is on the horizon, but nobody is putting a date on it.

This is what responsible infrastructure evolution looks like. Not a flag day. Not a forced migration. A careful, years-long process of making the new thing work alongside the old thing until the old thing can be quietly retired. Boring. Essential. Exactly the kind of work that Junio's maintenance style is built for.

Fetching on Demand

Episode eighteen covered what happens when Git hits its limits. The Windows repository at three hundred gigabytes. Google's billions of lines of code. Facebook choosing Mercurial because Git could not scale.

The Git project has been answering those challenges from within, and the two most important answers are partial clone and sparse checkout.

Here is the old model. You run git clone and Git downloads everything. Every file, every commit, every branch, the entire history of the project, onto your machine. This is Git's founding principle. Every clone is a kingdom. Every copy is complete. It is what makes Git resilient, and it is also what makes enormous repositories unworkable.

Partial clone breaks that principle, carefully. Instead of downloading every object, Git fetches only what you need right now. The rest stays on the server, available on demand. Need the history of a file you have never touched? Git fetches it when you ask. Need a blob from three years ago? Git retrieves it over the network, just in time.

Sparse checkout is the companion feature. Instead of checking out every file in the repository, you tell Git which directories you care about. A front-end developer working on the user interface does not need the machine learning models. A documentation writer does not need the test fixtures. Sparse checkout in cone mode, which became the recommended approach in Git two point twenty-seven, lets you define the subdirectories that matter to you and ignore the rest.

Combined, these features can transform a thirty-gigabyte, four-hour clone into a targeted two-minute operation. And they are no longer experimental curiosities. Teams working with monorepos in two thousand twenty-five treat partial clone and sparse checkout as first-class features, not obscure hacks.
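The two features compose on the command line. A sketch using a throwaway local repository standing in for a real server; the flags are the real ones in any recent Git, but the directory layout is invented for the demo:

```shell
# Build a small "server" repo with three top-level areas.
tmp=$(mktemp -d)
git init -q "$tmp/server" && cd "$tmp/server"
git config user.email demo@example.com && git config user.name "Demo User"
mkdir ui docs models
echo button  > ui/button.txt
echo guide   > docs/guide.txt
echo weights > models/weights.bin
git add . && git commit -q -m "initial layout"
git config uploadpack.allowfilter true  # let clients request partial clones

# Partial clone (--filter=blob:none: no file contents up front)
# plus sparse checkout (--sparse: start with almost nothing checked out).
cd "$tmp"
git clone -q --filter=blob:none --sparse "file://$tmp/server" client
cd client
git sparse-checkout set --cone ui  # only ui/ materializes in the worktree
ls  # ui — docs/ and models/ stay on the "server" until you ask
```

The blobs for ui/ are fetched on demand at checkout time; docs/ and models/ never cross the wire at all.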

The other quiet revolution is git maintenance. Before two point twenty-nine, repository optimization was something you did manually with git gc, or more likely, something you never did at all. Repositories slowly accumulated loose objects, fragmented packfiles, and stale data. Performance degraded gradually, like a car nobody takes in for service.

git maintenance changed that. You run git maintenance start once, and Git schedules background tasks: repacking objects, updating the commit-graph file, prefetching from remotes, running incremental garbage collection. It does this hourly, daily, or weekly, depending on the task, using your operating system's scheduler. You never think about it. Your repository stays fast.
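In practice that is one opt-in command. Because `git maintenance start` also installs entries in your operating system's scheduler, the sketch below uses `register` and an on-demand run, which do the same bookkeeping without touching cron or launchd:

```shell
# Throwaway repo for the demo.
tmp=$(mktemp -d)
git init -q "$tmp/repo" && cd "$tmp/repo"

# 'git maintenance start' = 'register' + schedule with the OS.
# 'register' alone records this repo in your global config so that
# scheduled maintenance will pick it up.
git maintenance register

# Every task can also be run on demand, useful on servers and in CI:
git maintenance run --task=gc
git maintenance run --task=incremental-repack
```

After `start`, the hourly prefetch, daily repack, and the rest happen in the background with no further input from you.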

The commit-graph file deserves a moment of its own. Walking commit history in a large repository used to require reading each commit object individually from disk. The commit-graph file pre-computes and stores the commit topology in a flat file that Git can traverse without parsing individual objects. For repositories with millions of commits, this turns operations like git log and git merge-base from multi-second waits into instant responses.
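Background maintenance normally writes this file for you, but you can build it by hand and see where it lives. A sketch in a throwaway repo:

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo" && cd "$tmp/repo"
git config user.email demo@example.com && git config user.name "Demo User"
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# Precompute the commit topology into a flat binary file.
git commit-graph write --reachable
ls .git/objects/info/commit-graph  # the file git log now consults
```

With three commits the gain is invisible. With three million, history walks stop parsing individual objects and start reading one memory-mapped file.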

These features share a philosophy. Git's core design, the content-addressed store, the distributed model, the directed acyclic graph, remains untouched. But the implementation has grown smarter about when to fetch, what to store locally, and how to keep performance sharp. The architecture Linus sketched in two thousand five still holds. The engineering around it has matured.

When the Machine Writes the Code

Here is a question that did not exist in two thousand five, or two thousand fifteen, or even two thousand twenty. When an AI generates a line of code and a human commits it, who is the author?

Git's metadata model is simple. Every commit has an author field and a committer field. A name and an email address, embedded in the commit object, stored forever in the history. This was designed for a world where a human being wrote code, a human being reviewed it, and a human being committed it. The author was the person who created the change. The committer was the person who applied it. Clear, unambiguous, human.

That world is fading. AI coding assistants generate code that developers accept, modify, or reject. Some tools add co-authorship trailers to commits. Others generate entire pull requests. The volume of AI-assisted code is growing fast, and Git's metadata was not built for this.

The problems are concrete. Run git blame on a file and you see a name next to every line. That name is supposed to tell you who to ask when something breaks. But if an AI suggested the line and a developer accepted it without fully understanding it, the name in git blame is the person who pressed enter, not the person, or thing, that wrote the logic.

Some teams have started tagging AI-assisted commits with markers in the commit message, an AI prefix or a co-authored-by trailer naming the model. But this is convention, not infrastructure. Git has no field for "this commit was AI-assisted." There is no structured way to record which model was used, what prompt produced the code, or whether a human reviewed every line.

The implications go beyond debugging. In regulated industries, in finance, in healthcare, in government contracting, there are legal requirements about who authored code and who reviewed it. Code provenance is not a curiosity. It is a compliance requirement. And Git's authorship model, two fields, two names, two email addresses, does not capture the complexity of a workflow where an AI drafts, a human edits, and a bot merges.

Some tools are trying to fill the gap from the outside. Git AI, an extension, stores AI attribution data in Git Notes, a metadata layer that sits alongside commits without modifying them. Other teams are building browser extensions that annotate pull requests with AI contribution data. These are patches on the social layer, not changes to Git itself.

The deeper question is whether Git's metadata model needs to evolve. Adding a new field to the commit object would be a fundamental change, the kind of thing that breaks every tool that parses commits. It would make the SHA-256 transition look simple. Nobody is seriously proposing it. But the gap between what Git records and what actually happened is widening with every AI-assisted commit.

This is not a crisis. Not yet. But it is the kind of slow-building tension that reshapes tools. The same way that the gap between centralized version control and distributed workflows eventually produced Git itself.

Stability Versus Evolution

To replace Git you have to be not just slightly better, you have to be enormously better. I would expect Git to stay relevant for the foreseeable future.

Torvalds is probably right. Network effects in developer tools are brutal. Every CI system integrates with Git. Every hosting platform speaks Git. Every developer's muscle memory is git commit, git push, git pull. Switching costs are astronomical, not because Git is expensive but because everything around it is built on the assumption that Git exists.

But stability is not the same as stagnation. The Git of two thousand twenty-five is meaningfully different from the Git of two thousand five, even though the core data model is identical. Partial clone, sparse checkout, the commit-graph, background maintenance, the ongoing SHA-256 transition. These are not minor tweaks. They are the kind of careful, additive engineering that lets a twenty-year-old tool meet demands its creator never imagined.

The people doing this work are not hobbyists. The largest contributions to Git now come from engineers at Microsoft, Google, GitHub, and GitLab. Companies that depend on Git at enormous scale and invest engineering time in making it faster and more capable. The Git mailing list in two thousand twenty-five reads like a who's who of Big Tech employers. The tool that was born from one person's frustration with a proprietary vendor is now maintained by the engineering departments of the largest technology companies on the planet.

Linus predicted this too. He said the improvements would happen around Git rather than as replacements for it. That the ecosystem built on Git is so deep and so broad that even genuinely better ideas, Pijul's patch theory, Fossil's integrated bug tracking, Jujutsu's cleaner interface, cannot overcome the gravitational pull.

And through all of it, Junio Hamano reviews patches. He has seen Git go from a kernel tool to the foundation of an industry. From a few hundred users to tens of millions. From a mailing-list project to a corporate-funded ecosystem. He has navigated the transition from pure volunteer effort to a project where most contributors are paid by companies with their own interests. He has kept the project honest, releasing on a predictable schedule, maintaining backward compatibility, never rushing a feature that was not ready.

Twenty years is ancient in software. Most tools from two thousand five are forgotten. The ones that survive, the ones that become infrastructure, do so because they were well designed and well maintained. Git had both. The design came in two weeks. The maintenance has taken twenty years and counting.

Next episode, we bring the whole story together. From the filing cabinets and magnetic tapes of episode one, through the crisis that created Git, the platform that consumed it, and the questions that face it now. Everything you now understand that most developers never will.

git maintenance start is one command you run once and never think about again. Git registers the repository for automatic background housekeeping: repacking objects, refreshing the commit-graph, cleaning up loose data. For small projects the difference is invisible. For anything with years of history or thousands of files, it is the difference between a tool that gradually slows down and one that stays sharp. Maintenance is not glamorous. Neither is Junio Hamano. Both are the reason Git still works.