This is episode twenty-seven of Git Good, and before we talk about the future, we need to talk about something that happens millions of times a day on software teams around the world. Something so routine that most developers barely think about it. Something that shapes careers, ships products, breaks friendships, and determines who gets to call themselves a real programmer.
Code review. The practice of having another human being look at your work before it ships. A practice so universal that it feels inevitable, as if software has always been made this way. It has not.
Before GitHub, before pull requests, before the green button that says "Approve," there was a man at IBM with a clipboard and a process. It was nineteen seventy-six. Michael Fagan, a software engineer at IBM's Kingston facility in New York, published a paper with a dry title that would influence decades of software development. "Design and Code Inspections to Reduce Errors in Program Development." The paper described what became known as a Fagan inspection, and if you have ever sat in a code review that felt like a trial, you can trace the feeling back to this room.
The process was formal. You gathered a team. You assigned roles. There was a moderator, who ran the meeting and kept it on track. There was a reader, whose job was to paraphrase the code out loud, line by line, explaining what it was supposed to do. There was a recorder, who wrote down every defect found. There was the author, who sat there and listened while strangers picked apart their work. And there were the inspectors, everyone else in the room, whose job was to find errors. Only errors. The rules were explicit: questions were allowed only to the point at which an error was recognized. You were not there to discuss design. You were not there to suggest improvements. You were there to find bugs, and then you were done.
It worked. In IBM's trials, Fagan inspections caught eighty-two percent of errors before the code ever reached testing. Other large companies reported similar numbers, eighty to ninety percent defect detection. The process spread. By the early two thousands, more than a hundred organizations had been formally trained in the method. Code inspection was proven, rigorous, and effective.
It was also something almost nobody actually wanted to do. The method was effective at finding defects. It was terrible at being something humans would voluntarily participate in.
The meetings took hours. Scheduling them was a logistical nightmare. The social dynamics were brutal. Imagine sitting in a conference room while a colleague reads your code aloud and four other people take notes on everything wrong with it. The method required that nobody discuss the author's intentions, only the defects in the output. The human being who wrote the code was explicitly separated from the code itself, a principle that sounds reasonable in a paper and feels like an interrogation in practice.
So what actually happened at most companies was one of two things. Either they did formal inspections for a while, found them draining and slow, and quietly stopped. Or they never did them at all. The developer finished their code, checked it in, and hoped for the best. Between the bureaucratic extreme of Fagan inspections and the other extreme of "nobody looks at anything," most of the software industry lived in a muddy middle. Over-the-shoulder reviews. Hallway conversations. A senior developer glancing at a diff and saying "looks fine." These were the real code review practices for most teams for most of the history of software development.
There was one notable exception, one community that built a code review culture so thorough and so scalable that it survived every trend. And it ran on email.
The Linux kernel has been reviewed by email since the beginning. Not because email is a good code review tool. It is not. But because Linus Torvalds built Git to support the workflow he already had, and the workflow he already had was mailing lists.
Here is how it works, and how it still works today. A developer writes a patch. They format it using git send-email, which produces an email with the patch as inline text in the body. Not an attachment. Linus made this preference clear early and often: attachments make it harder to quote specific lines in a reply, and quoting specific lines is the entire point. The email goes to the relevant mailing list, which for the kernel is LKML, the Linux Kernel Mailing List. That list receives roughly fourteen hundred messages every single day, most of them patches. Other developers read the patch in their email client, hit reply, and type their comments between the lines of code they are reviewing. Then someone else replies to the reply, and a thread forms.
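If you want to see the shape of that flow for yourself, here is a minimal sketch you can run in a throwaway repository. The file name and commit message are invented for illustration, but git format-patch and git send-email are the real commands; format-patch renders a commit as email-shaped text, and send-email is what would actually mail it to a list.

```shell
# Minimal sketch of the kernel-style patch flow in a throwaway repo.
# (File contents and the commit message are invented for illustration.)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'int answer(void) { return 42; }' > answer.c
git add answer.c
git -c user.name=Dev -c user.email=dev@example.com \
    commit -q -m "example: add answer()"
# format-patch renders the commit as an email-shaped text file; on a
# real list, git send-email would mail it with the diff inline in the
# body, not as an attachment, so reviewers can quote specific lines.
git format-patch -1 --stdout | head -n 15
```

The output is literally an email: headers, a subject line prefixed with [PATCH], and the diff inline in the body, ready for a reviewer to hit reply and type between the lines.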
The Linux kernel accepts roughly eight changes per hour, from over four thousand developers sponsored by more than four hundred companies. All of it flows through email. Not GitHub. Not any web interface. The kernel's GitHub repository is a read-only mirror that does not accept contributions.
The system is fast. It is brutally direct. The feedback can be harsh, sometimes legendarily so. But it scales in a way that no meeting ever could, because every patch and every comment exists as a permanent, searchable, public record. Anyone can read any review. Anyone can learn from any argument. The entire history of how the kernel was built is there in the mailing list archives, not just the code but the reasoning behind the code.
This is important for what comes next, because when GitHub built the pull request, they were not starting from nothing. They were taking the mailing list model and giving it a web interface. And in doing so, they changed who could participate.
On February twenty-third, two thousand eight, Chris Wanstrath, co-founder of GitHub, published the platform's third blog post. The title was casual, almost an afterthought.
Last night I pushed out a feature Tom and I have been talking about since day one. Pull requests.
That was it. A few sentences explaining that you could now ask someone to pull your changes, or tell someone they should pull yours. It was a notification system, not a review tool. There was no commenting. No inline feedback. No approval workflow. Just a message that said "I have changes you might want."
Two years and two hundred thousand pull requests later, GitHub realized what they had accidentally built was not a notification system. It was a conversation. In August of two thousand ten, Ryan Tomayko published a blog post called "Pull Requests Two Point Oh" that redesigned the feature into something recognizable as what we use today.
Pull requests are now living discussions about the code you want merged. When you send a pull request, you are starting a discussion.
The key insight was in that word: "living." Before the redesign, pull requests were ephemeral notifications that existed temporarily in someone's inbox and then disappeared. After the redesign, they were permanent, public records with their own pages, their own comment threads, their own history of commits. You could link to a pull request. You could refer back to the discussion months later. The code, the conversation about the code, and the decision to merge or reject it all lived in one place.
Six months later, in February of two thousand eleven, GitHub added the ability to comment directly in a pull request's diff view. Inline comments. The thing that the Linux kernel mailing list had been doing for twenty years by quoting code in email replies, GitHub finally made possible with a click. And with that, the pull request became a code review tool.
It is hard to overstate what happened next. The pull request became the default workflow for collaborative software development. Not because it was mandated. Not because anyone proved it was better than the alternatives. But because GitHub made it so easy and so social that it just became the way things were done. Today, if you work on a software team, you almost certainly use pull requests. If you contribute to open source, you almost certainly use pull requests. The practice is so normalized that many developers who started their careers after two thousand twelve cannot imagine working any other way.
The pull request solved real problems. It gave every proposed change a home. It created a record of why decisions were made. It made review accessible to anyone with a browser, not just people who could parse inline email diffs. It lowered the barrier to participation in open source from "figure out how to email a patch to a mailing list" to "click a button."
But the pull request also created problems that nobody anticipated. And those problems are what most teams actually struggle with today.
Here is a number that should alarm you. The average pull request takes five days to get through review and merge. Five days. Not because the review itself takes five days. Because nobody picks it up for four of those days.
This finding comes from an analysis of four million review cycles across roughly twenty-five thousand developers. The top quarter of engineering organizations get pull requests reviewed in under four hours. Everyone else waits. And waits.
The reason is something psychologists call the bystander effect. When a pull request is assigned to the whole team, everyone assumes someone else will handle it. When it is assigned to a specific person, that person has other things to do and review is never the most urgent thing. Writing code feels productive. Reviewing someone else's code feels like a chore. Nobody budgets review time into their sprint. Nobody gets promoted for thorough reviews. The incentive structure is broken, and the pull request just makes the brokenness visible.
Meta tracked this problem with a metric they called Time In Review, the duration a diff waits for a reviewer to act. Their median was a few hours, which sounds reasonable until you look at the tail. The slowest twenty-five percent of reviews took over a day longer than the median. And they found something that should be obvious but apparently was not: the longer someone's slowest reviews took, the less satisfied they were with the entire review process. The frustration was not about average speed. It was about the worst cases.
Meta built a bot. They called it Nudgebot. It poked reviewers when diffs had been waiting too long. It reduced Time In Review by seven percent overall and cut the number of diffs waiting more than three days by twelve percent. A seven percent improvement from a bot that does nothing more than send reminder messages. That is how low the bar was.
The bottleneck creates a cascade. When developers know reviews take days, they batch their changes into larger pull requests to avoid the overhead of waiting multiple times. Larger pull requests take longer to review. Longer reviews are more likely to get rubber-stamped. Rubber-stamped reviews miss bugs. The entire system optimizes for throughput at the expense of the thing reviews are supposed to provide: someone else actually thinking about your code.
One analysis found that when a ten-minute change waits an hour for review, the change spends eighty-six percent of its lifetime idle. For typical teams with longer waits, changes spend ninety-two to ninety-nine percent of their lead time in review queues. The code is done. It is just sitting there, waiting for a human to look at it.
The bottleneck is a time problem. What happens during the review is a different kind of problem, and it has two faces.
The first face is the nitpick. You submit a pull request that restructures the authentication system, and the first comment is about whether you should have named a variable "user_data" or "userData." You rewrite a database query that was causing timeouts in production, and someone asks why you used double quotes instead of single quotes. The term for this is bikeshedding, from a parable about a committee reviewing plans for a nuclear power plant that spends most of its time arguing about what color to paint the employee bike shed. The big decisions are too hard to have opinions about. The small decisions are easy.
Bikeshedding is not malicious. It is a cognitive bias. The brain latches onto what it can process quickly, and formatting is faster to process than architecture. But the effect is corrosive. Developers learn that submitting a pull request means subjecting themselves to a gauntlet of trivial objections. Some reviewers leave forty comments on variable names while missing a race condition in the business logic. The fix is straightforward: automate everything that can be automated. Let linters enforce formatting. Let style checkers enforce naming conventions. Free human reviewers to focus on logic, design, and intent. Most teams know this. Many teams still argue about semicolons.
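Git can do some of that automation itself. Here is a sketch, in a throwaway repository, of the kind of check that takes a nitpick off the reviewer's plate entirely; the file contents are invented, but git diff --check is a real flag that flags whitespace errors in a change.

```shell
# Sketch: let the machine catch the nit before any human sees it.
# (Run in a throwaway repo; file contents are invented.)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=Dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "base"
printf 'clean line\nline with trailing space \n' > patch.txt
git add patch.txt
# git diff --check reports whitespace errors in the staged change and
# exits nonzero. Wired into a pre-commit hook or a CI job, it means no
# reviewer ever has to leave a comment about trailing whitespace.
if ! git diff --staged --check; then
  echo "machine found the nit; no reviewer comment required"
fi
```

The same principle extends to linters and formatters: anything a tool can object to, a tool should object to, before the pull request is ever opened.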
The second face is the rubber stamp. Surveys suggest that roughly sixty-five percent of pull requests get approved with nothing more than "LGTM," which stands for "Looks Good To Me." A study of developer behavior found that forty-five percent of developers say the real obstacle to reviewing code is lack of time, and thirty-four percent cite pressure to ship. When those forces combine, you get approval theater: the reviewer opens the pull request, scrolls through the diff without really reading it, types four letters, and clicks approve. The compliance requirement is met. The review did not happen.
There is a social dynamic that makes this worse. Reciprocity. "I approved your pull request in thirty seconds. Now I expect you to approve mine in thirty seconds." A culture of fast approvals feels efficient. It feels like the team trusts each other. It feels like everyone is moving fast and shipping. What it actually means is that nobody is reviewing anything, and the pull request process has become a bureaucratic gate that consumes time without providing value.
The cruel irony is that these two pathologies, the nitpick and the rubber stamp, often coexist on the same team. The senior developer who leaves forty comments on a junior's pull request will rubber-stamp a peer's pull request without reading it. The review is not about the code. It is about the power dynamic between the author and the reviewer. And that observation leads to something uncomfortable.
In two thousand seventeen, a team of researchers from North Carolina State University and Cal Poly published a study that analyzed over three million pull requests from roughly three hundred thirty thousand GitHub users, including about twenty-one thousand women. They wanted to know whether the gender of a contributor affected whether their code was accepted.
The headline finding was counterintuitive. Overall, women's pull requests were accepted at a rate of seventy-eight point seven percent, compared to seventy-four point six percent for men. Women appeared to write better code. But the researchers dug deeper, and the picture changed.
When they looked only at contributors who were outsiders to a project, people submitting to repositories they had never contributed to before, a different pattern emerged. For outsiders whose gender was identifiable from their profile, their name, or their picture, women's acceptance rate dropped to fifty-eight percent. Men's stayed at sixty-one percent. The gap reversed.
And here is the finding that stops you cold. For outsiders with gender-neutral profiles, where reviewers could not tell the contributor's gender, women's acceptance rate was seventy percent. Men's was sixty-five percent. Women's code was accepted at higher rates when nobody knew a woman wrote it.
Our results indicate that gender bias does exist in open-source programming. The study also tells us that, in general, women on GitHub are strong programmers.
The researchers attributed part of the gap to self-selection: the women who contribute to open source tend to be highly skilled, because the barriers to participation filter out casual contributors more aggressively for women than for men. But self-selection does not explain why identifiable women were rejected more often than identifiable men. That part is bias, and it operates in the space that code review creates: the space between writing code and having it accepted.
This is the meritocracy problem that code review makes visible. The entire premise of open-source contribution is that the code speaks for itself. Your background does not matter. Your credentials do not matter. The quality of the code is what gets judged. And then you discover that the quality of the code is not, in fact, what gets judged. The person reviewing the pull request is a human being with biases, and those biases affect whose code gets merged.
The pull request did not create this bias. The bias was there when Fagan's inspectors sat around a table, and it was there when kernel developers read email patches. But the pull request made it measurable, because suddenly there was data. Millions of reviews, each with a binary outcome: accepted or rejected. Enough data to see the pattern.
In two thousand eighteen, Nicole Forsgren, Jez Humble, and Gene Kim published a book called "Accelerate" that changed how the software industry measures itself. The book introduced four metrics, known as DORA metrics after the research group that produced them. Deployment frequency. Lead time for changes. Mean time to recovery. Change failure rate. The research was based on surveys of more than twenty-three thousand respondents from over two thousand organizations, and its central finding was provocative: speed and stability are not tradeoffs. Teams that ship faster also ship more reliably.
This finding rippled through the industry. If speed and stability are correlated, not opposed, then anything that slows you down is making you worse at both. And the thing that slows most teams down is code review.
Lead time for changes, the metric that tracks how long it takes code to go from commit to production, includes review time. Teams that want to improve their DORA metrics have a strong incentive to make reviews faster. Some teams responded well. They set service-level agreements for review time. They designated specific reviewers so pull requests did not sit in a shared queue. They kept pull requests small so they could be reviewed quickly.
Other teams responded poorly. They optimized for the metric instead of the thing the metric was supposed to measure. Make reviews faster. How? Approve faster. How? Stop reading the code. The DORA research explicitly found that top performers excel at both speed and quality. But on the ground, in the daily pressure of sprint cycles and deployment targets, speed won. Review time went down. Review quality went down with it.
The two thousand nineteen State of DevOps Report found something that should have caused more soul-searching than it did. Formal change management processes, the category that includes mandatory code review, had a negative impact on software delivery performance. Not a neutral impact. A negative one. Teams with heavyweight approval processes shipped slower, recovered slower, and failed more often. Not because review is bad, but because the way most teams practice review adds friction without adding insight.
The uncomfortable conclusion is that a bad review process is worse than no review process. At least with no process, you ship fast and find the bugs in production. With a bad process, you ship slowly and still find the bugs in production, because the reviewer rubber-stamped the pull request.
Not everyone uses pull requests. And the alternatives are not marginal experiments. They are practiced by some of the most successful software organizations in the world.
In nineteen ninety-nine, Kent Beck published "Extreme Programming Explained" and described a radical idea: what if code review happened continuously, as the code was being written? This was pair programming. Two developers, one computer, one keyboard. One person writes the code. The other watches, thinks, and catches mistakes in real time. Beck's argument was simple. If code reviews are good, we should review code all the time.
Pair programming eliminates the bottleneck entirely. There is no pull request to wait for. There is no context switching. The reviewer has full context because they watched the code being written. Teams that adopted it found that it improved quality without actually taking twice as long, because two people working together solve problems faster and produce less unnecessary code. The Chrysler C3 payroll project, where Beck developed extreme programming, delivered successfully using this approach.
Woody Zuill took it further. Mob programming is pair programming scaled to the whole team. Everyone works on the same thing, at the same time, in the same space, at the same computer. One person drives. Everyone else navigates. They rotate the driver regularly. It sounds absurd. It also eliminates an entire category of problems.
The main issues a team faces while developing software just fade away with mobbing. There is no more communication problem. The whole team is there, necessarily exposed.
No communication problem. No waiting for review. No knowledge silos. No merge conflicts. No pull requests sitting in a queue. The code is reviewed by everyone as it is written. The cost is obvious: you are paying five or six salaries for one stream of code. The teams that practice it insist the math works out, because the code they produce needs less rework, less debugging, and less time spent on reviews that never happen.
Then there is Google. Every change at Google must be reviewed before it can be committed. No exceptions. Their internal tool, Critique, enforces a three-part approval system. First, someone must give an LGTM, confirming the core logic is sound. Second, a code owner must approve, someone who is responsible for the files being changed. Third, a readability reviewer must sign off, someone certified in the language's style standards. Three separate approvals. Every change. In a company where thousands of developers commit code every day.
The system works because Google invested heavily in making review fast. Critique integrates tightly with their monorepo, their IDE, their code search, and their automated analysis tools. The "attention set" feature tells you whose turn it is, eliminating the bystander problem. And there is an escape hatch: in genuine emergencies, a developer can force-commit and have the code reviewed after the fact. The principle is that review is mandatory, not that review must block progress in a crisis.
The most provocative alternative is post-commit review. Ship first. Deploy to production. Then review the code and fix anything that needs fixing in a follow-up. This sounds reckless until you realize it requires something that most teams should have anyway: thorough automated testing that catches bugs before they reach users, and the ability to roll back quickly when something slips through. If your tests are good enough and your deployment is fast enough, the question of when a human reviews the code becomes less urgent, because the automated systems have already validated that the code works.
Each of these models answers the same question differently: when should someone else look at your code? Before you commit it. While you write it. After you ship it. Before you merge it. The pull request answers "before you merge it," and it won. But winning does not mean it is the only good answer.
Last episode, we talked about how workflows shape the way teams work. This episode, we have been talking about what happens inside those workflows when code gets reviewed. Now here is where the two stories collide.
In April of two thousand twenty-five, GitHub launched Copilot code review. Within a year, it had performed sixty million reviews. It now handles more than one in five code reviews on GitHub. Twelve thousand organizations run it automatically on every pull request.
Copilot code review handles pull request reviews and summaries, allowing teams to focus on more complex tasks.
The numbers are staggering. CodeRabbit, the other major AI review tool, has processed over thirteen million pull requests across two million connected repositories. By the end of two thousand twenty-five, thirty percent of enterprises with more than a thousand developers had deployed at least one AI code review tool.
AI review is good at specific things. It catches missing dependencies. It finds infinite loops. It identifies patterns that violate established conventions. It does not get tired. It does not bikeshed. It does not rubber-stamp because it is in a hurry. Seventy-one percent of Copilot's reviews surface actionable feedback. In the other twenty-nine percent, it stays silent, because, as GitHub put it, silence is better than noise.
But AI review cannot do the things that make human review valuable at its best. It cannot ask "why did you choose this approach instead of that one?" and learn from the answer. It cannot notice that a junior developer is struggling and pair with them for an hour. It cannot say "this works, but the team decided six months ago to go a different direction, and here is why." It cannot build the shared understanding that turns a group of individuals into a team.
There is a deeper question lurking here. If AI writes more and more of the code, and AI reviews more and more of the code, what exactly is the human doing? The optimistic answer is that the human becomes the architect, the decision maker, the person who sets direction while machines handle implementation and verification. The pessimistic answer is that the human becomes the person who clicks "approve" on a pull request written by one AI and reviewed by another, rubber-stamping a process they no longer understand.
That second scenario is not hypothetical. It is happening right now. And the question of whether it matters, whether human understanding of the code is actually important or just a relic of a time when humans were the only ones who could write it, is one of the central tensions of the next few years.
I think code review, done well, is one of the most valuable practices in software development. I also think most teams do it badly, and the pull request is part of the reason.
The pull request works beautifully for open-source projects where contributors are strangers and trust must be established through the code itself. It works for distributed teams where synchronous collaboration is impossible. It works as a historical record of decisions. These are real benefits that justify the practice.
But for a co-located team of people who trust each other, the pull request is often overhead disguised as process. The review happens too late, when the code is already written and the author is emotionally invested in their approach. The feedback is asynchronous, which means context gets lost and tone gets misread. The bottleneck is structural, because nobody is incentivized to review promptly. And the bias research tells us that the reviewer is not the neutral judge the process pretends they are.
The best code review is the one that happens while the code is being written. Pair programming, mob programming, or just pulling someone over to your desk and saying "does this look right to you?" The pull request is a safety net for when that is not possible. But too many teams have made the safety net the primary practice, and they wonder why they keep hitting the ground.
In the last episode, we saw how workflows shape teams. The code review is where those workflows become personal, where your code becomes visible, where judgment happens. In the next episode, we will see what happens when the industry tries to quantify all of this. The green squares on your GitHub profile, the contribution graph that promises to measure your value as a developer. It measures something. Whether it is the right thing is another question entirely.
That was episode twenty-seven.
Git diff with the staged flag shows you exactly what you are about to commit. Not everything you have changed. Not everything in your working directory. Just the changes you have explicitly staged with git add. This is the moment of self-review, the pause before you commit where you look at your own work and ask whether this is really what you meant to save. Most developers skip it. They run git add, then git commit, without stopping to look. The ones who run git diff --staged first catch the debug print statement they forgot to remove, the file they staged by accident, the change they meant to split into two commits. It takes five seconds. It is the cheapest code review you will ever get, and the only one where you are guaranteed a reviewer who actually reads the code.
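Here is that five-second self-review as a sketch you can run in a throwaway repository. The accidentally staged debug file is invented for illustration, but the habit is real: one command between git add and git commit.

```shell
# The five-second self-review, sketched in a throwaway repo.
# (The accidental debug file is invented for illustration.)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=Dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "base"
echo "the change you meant to make" > feature.txt
echo 'print("debug")' > leftover_debug.py   # staged by accident
git add feature.txt leftover_debug.py
# The pause before committing: show only what is staged,
# not everything in the working tree.
git diff --staged --stat
```

The stat output lists both files, and the stray debug file jumps out before it ever reaches the history, let alone a reviewer.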