The Power Tools

The Other Ninety Percent

Here is something that will sound familiar. You use Git every day. You know git add, git commit, git push, git pull, git branch, git merge. Maybe git stash on a good day. And that is your Git. That is the whole tool, as far as you are concerned.

You are using maybe ten percent of what Git can do.

The other ninety percent sits there quietly, documented in pages that read like legal contracts, waiting for the day you desperately need it. And that day always comes. Maybe it is three in the morning and your production application is broken and you know the bug was not there two weeks ago, but somewhere in the last four hundred commits, something went wrong. Or maybe a line of code is doing something bizarre and nobody on your team remembers writing it, and you need to know who did and why. Or you need to review a colleague's pull request but you are in the middle of your own work and switching branches would blow away hours of uncommitted changes.

Git has answers for all of these. Precise, powerful answers that most developers never discover.

This is the episode about those answers. The tools that separate someone who uses Git from someone who understands it. We are going to talk about hunting bugs with binary search, reading the history of individual lines, working in multiple branches simultaneously, and the automation layer that most continuous integration systems are secretly built on. These are the power tools. And once you know they exist, you will wonder how you ever worked without them.

Binary Search for Bugs

Picture this. You are a kernel developer in two thousand eight. The Linux kernel has tens of thousands of commits per release cycle. Somewhere in the last seven thousand commits, something broke. Suspend and resume stopped working on a specific laptop. You know it worked in the last release. You know it is broken now. Somewhere between those two points, someone changed something.

You could start reading commits one by one. Seven thousand of them. That would take weeks, assuming you even knew what to look for. You could ask around, hope someone remembers touching the relevant code. Or you could let Git do something clever.

Git bisect takes a problem that sounds impossible and makes it trivial. You tell Git two things: a commit where everything worked, and a commit where it is broken. Git picks a commit roughly halfway between them and checks it out for you. You test it. Works? The bug was introduced after this point. Broken? The bug was introduced at this point or before it. Either way, you have just eliminated half the suspects. Git picks a new halfway point in the remaining range. You test again. Half the suspects gone again.

Seven thousand commits. After the first test, three thousand five hundred remain. After the second, seventeen fifty. After the third, eight seventy five. In about thirteen steps, thirteen tests out of seven thousand commits, you are staring at the exact commit that broke everything.
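The halving arithmetic above can be sketched in a few lines of Python. This is a toy model, not Git's implementation: it assumes a straight-line history with no merges, and the function name, the commit labels, and the position of the bug are all made up for illustration.

```python
from typing import Callable, Sequence, Tuple


def find_first_bad(commits: Sequence[str],
                   is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search a linear history, oldest commit first.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns the first bad commit and how many commits were tested.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid    # the bug is at mid or earlier
        else:
            good = mid   # the bug came in after mid
    return commits[bad], tests


# Hypothetical history: c0 is the last good release and 7,000 commits
# follow it. We pretend the regression landed in c4321.
history = [f"c{i}" for i in range(7001)]
culprit, tests = find_first_bad(history, lambda c: int(c[1:]) >= 4321)
print(culprit, "found after", tests, "tests")
```

With seven thousand suspects, this loop needs twelve or thirteen tests depending on where the bad commit sits, and never more than thirteen.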

This is binary search, the same algorithm computer science students learn in their first year. But applied to version control history, it becomes something almost magical. Linus Torvalds built bisect into Git because kernel development demanded it. When you have thousands of developers contributing to the same codebase, regressions are inevitable. The question is not whether bugs get introduced, but how fast you can find them.

I will happily brute-force bug-finding even if it might take a little longer, if it is guaranteed to find it.

That word, guaranteed, matters. Other debugging approaches rely on intuition, on guessing which subsystem might be responsible, on hoping someone remembers the relevant change. Bisect does not guess. It narrows. Mechanically, mathematically, inevitably. Even if the bug is in a part of the codebase you have never looked at, bisect will find it.

But here is where it gets truly powerful. You can automate the whole thing. Write a test script that returns zero if the build works and one if it does not, then hand it to git bisect run. Git will check out commits, run your test, read the result, and keep bisecting. No human in the loop. Ingo Molnar, one of the most prolific Linux kernel developers, described running automated bisections that would compile and boot entire kernels on their own, finding the guilty commit while he stepped away from the keyboard.

A lot of the time, in the best case, I can get a fifteen step kernel bisection done in twenty to thirty minutes, fully automated.

Think about what that means. Fifteen steps. Each step compiles an entire operating system kernel, boots it, and tests whether a specific feature works. The whole thing runs unattended. At the end, Git points at one commit out of thousands and says this one. This is where it broke.

The algorithm itself is subtler than simple binary search, because Git history is not a straight line. It is a directed acyclic graph, full of branches and merges. Linus designed the bisection to work on this graph by finding commits that split the remaining search space as evenly as possible, accounting for the shape of the graph. He described it using an analogy from physics, talking about light cones of reachability through the commit graph. A commit halfway through the calendar is not necessarily a commit that bisects the history efficiently. Git finds the one that does.

There are complications, of course. Sometimes a commit in the middle of your range does not even compile, because someone introduced a build failure that was fixed two commits later. Git bisect skip handles this, marking a commit as untestable so Git works around it. Sometimes you hit a cluster of broken commits and Git has to get creative, using weighted random selection biased away from the boundaries of what you have already tested.
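Here is a minimal sketch of the kind of test script you could hand to git bisect run, written in Python. Git reads the script's exit code: zero means good, 125 means skip this commit, and any other code from 1 to 127 means bad. The function names are invented for this example, and the build and test commands are whatever your project uses.

```python
import subprocess
import sys
from typing import List, Optional

GOOD, BAD, SKIP = 0, 1, 125  # exit codes that git bisect run understands


def verdict(build_rc: int, test_rc: Optional[int]) -> int:
    """Translate build and test results into a bisect exit code."""
    if build_rc != 0:
        return SKIP  # cannot build, so this commit is untestable
    return GOOD if test_rc == 0 else BAD


def check_commit(build_cmd: List[str], test_cmd: List[str]) -> int:
    """Build the checked-out commit, then test it only if the build worked."""
    build_rc = subprocess.run(build_cmd).returncode
    test_rc = subprocess.run(test_cmd).returncode if build_rc == 0 else None
    return verdict(build_rc, test_rc)
```

To use it, you would end the script with a line like sys.exit(check_commit(["make"], ["./run_tests.sh"])), where both commands are stand-ins for your own. Then mark the two endpoints with git bisect start, git bisect bad, and git bisect good, and launch git bisect run followed by python3 and the script's path.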

But the core promise holds. If you can define good and bad, Git will find the boundary between them. And in a world where codebases grow larger every year and the cost of a regression shipped to production grows with them, that promise is worth its weight in gold.

Bisect is a design philosophy as much as a debugging tool. Linus did not want developers guessing which subsystem was responsible. He wanted Git to guarantee the answer. The tool does not care about your hunches. It cares about math.

Every Line Has a Story

Git blame sounds hostile. The name conjures finger-pointing and office politics. But what it actually does is beautiful. It annotates every single line in a file with the commit that last changed it, who made that change, and when.

Imagine opening a file and seeing not just the code, but a timeline layered on top of it. This function was written by Maria in January. This comment was added by James three years ago. This one line, this strange conditional that nobody understands, was changed by someone who left the company in two thousand nineteen, and the commit message says "fix edge case for negative timestamps."

Suddenly that mysterious line has context. You know when it appeared. You know who wrote it. You know, from the commit message, what problem they were solving. The code stops being an anonymous artifact and becomes a conversation between people across time.
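To make the idea concrete, here is a toy version of blame in Python. It is a sketch of the concept, not Git's actual algorithm: it walks a list of file snapshots from oldest to newest, uses the standard library's difflib to find lines that survived unchanged, and credits every other line to the commit that introduced it. The commit labels and file contents are invented.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def toy_blame(history: List[Tuple[str, List[str]]]) -> List[Tuple[str, str]]:
    """history holds (commit_id, file_lines) pairs, oldest first.

    Returns one (commit_id, line) pair per line of the newest version.
    """
    first_id, previous = history[0]
    owners = [first_id] * len(previous)
    for commit_id, lines in history[1:]:
        # Assume every line is new until a match proves it was carried over.
        new_owners = [commit_id] * len(lines)
        matcher = SequenceMatcher(a=previous, b=lines, autojunk=False)
        for block in matcher.get_matching_blocks():
            for offset in range(block.size):
                new_owners[block.b + offset] = owners[block.a + offset]
        owners, previous = new_owners, lines
    return list(zip(owners, previous))


history = [
    ("a1", ["def f(x):", "    return x"]),
    ("b2", ["def f(x):", "    if x < 0:", "        x = 0", "    return x"]),
    ("c3", ["def f(x):", "    if x < 0:  # negative timestamps",
            "        x = 0", "    return x"]),
]
for commit_id, line in toy_blame(history):
    print(commit_id, line)
```

Each line of the final file ends up tagged with the last commit that touched it, which is the same question the real command answers, along with the author and the date.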

Linus Torvalds himself lists git blame among the five Git commands he uses most frequently. Not merge. Not rebase. Blame. For someone who created the tool and could use any part of it, he reaches for the one that answers "who changed this, and why."

The reason is practical. When you are maintaining a codebase with thousands of contributors and millions of lines, and something is wrong, the first question is almost never "what does this code do." The first question is "who touched this last, and what were they thinking." Git blame answers both in one command.

But blame has a deeper trick. By default, it shows you the last person who changed each line. Sometimes that is misleading. Maybe someone reformatted the entire file, changing every line without changing any logic, and now blame shows them as the author of everything. Git handles this. You can tell blame to look through formatting changes, through moved code, through copied code, peeling back the layers of superficial changes to find the commit that actually introduced the logic you are looking at.

And then there is the pickaxe. Git can search through history not just for who changed a line, but for when a specific string first appeared or disappeared in the codebase. You can ask Git: when did this function name first show up? When was this error message removed? The pickaxe digs through every commit, every diff, and finds the moment of introduction or deletion.

Between blame and the pickaxe, Git gives you archaeology tools for code. You can trace any line back to the moment it was born, understand why it exists, and follow the chain of decisions that led to the current state. The entire intellectual history of a codebase is there, encoded in the commit graph, waiting to be read by anyone who knows to look.
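In Git itself, the pickaxe is the S option of git log, written as git log -S followed by the string you are hunting for. It reports commits where the number of occurrences of that string changes. The Python below is a toy model of that rule over invented snapshots, not Git's own code.

```python
from typing import List, Tuple


def pickaxe(history: List[Tuple[str, str]], needle: str) -> List[str]:
    """history holds (commit_id, file_text) pairs, oldest first.

    Returns the commits where the count of needle went up or down.
    """
    hits = []
    previous_count = 0
    for commit_id, text in history:
        count = text.count(needle)
        if count != previous_count:
            hits.append(commit_id)
        previous_count = count
    return hits


snapshots = [
    ("a1", "def load(): pass"),
    ("b2", "def load(): pass\ndef parse_timestamp(): pass"),
    ("c3", "def load(): pass\ndef parse_timestamp(): pass\n# tidy up"),
    ("d4", "def load(): pass\n# tidy up"),
]
print(pickaxe(snapshots, "parse_timestamp"))
```

In this made-up history the function is born in one commit, survives untouched through the next, and is deleted in the last, so only the birth and the deletion are reported.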

What blame reveals is that code is not just instructions. It is a conversation between people across time. Every line has an author, a date, and a reason. The tool exists to make that conversation legible.

Two Desks, One Repository

Here is a scenario every developer knows. You are deep in a feature branch, files changed everywhere, half-finished work scattered across a dozen files. Your colleague posts a pull request and asks for a review. To review it properly, you need to check out their branch and run the code. But checking out their branch means your half-finished work either gets stashed, committed in a messy state, or potentially lost.

For years, the standard advice was "just stash it" or "commit your work in progress." Both solutions work but feel clumsy. Stashing is fragile and easy to forget about. Work-in-progress commits clutter the history. Some developers resorted to cloning the entire repository a second time, which wastes disk space and means maintaining two separate copies.

In two thousand fifteen, Git two point five introduced a feature that solved this cleanly. Git worktree. The idea is simple and powerful. Instead of one working directory per repository, you can have multiple. Each one checks out a different branch, and they all share the same underlying repository, the same object database, the same history.

Think of it as having two desks in your office instead of one. Your feature work stays on desk one, exactly as you left it. You walk over to desk two, which has your colleague's pull request checked out and ready to review. No stashing. No messy commits. Both branches are just there, simultaneously, in different directories on your disk.
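If you wanted to script the second desk, a small Python helper might look like the sketch below. The three subcommands it builds, git worktree add, git worktree list, and git worktree remove, are real. The helper functions, the directory name, and the branch name are assumptions made for this example, and by default nothing is executed, only printed.

```python
import subprocess
from typing import List


def worktree_add(path: str, branch: str) -> List[str]:
    """Command that checks out an existing branch in a new directory."""
    return ["git", "worktree", "add", path, branch]


def worktree_list() -> List[str]:
    """Command that lists every worktree attached to this repository."""
    return ["git", "worktree", "list"]


def worktree_remove(path: str) -> List[str]:
    """Command that removes a worktree once the review is finished."""
    return ["git", "worktree", "remove", path]


def run(cmd: List[str], dry_run: bool = True) -> str:
    """Print the command, and execute it only when dry_run is False."""
    line = " ".join(cmd)
    print(line)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return line


# Hypothetical names: a sibling directory and a colleague's branch.
review = run(worktree_add("../review-desk", "colleague-feature"))
```

Your feature work stays where it is, the review happens in the new directory, and when you are done the remove command clears the second desk.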

The person behind this feature was Nguyễn Thái Ngọc Duy, a long-time Git contributor who had been working on the concept for years before it shipped. When it landed in Git two point five, it was marked experimental, with warnings about potential bugs and a specific caution against using it alongside submodules. A decade later, most of those rough edges have been smoothed away, though Git's own documentation still advises caution when combining worktrees with submodules. And worktree remains one of Git's least known features.

The real power shows up at scale. Build servers can maintain multiple worktrees to test different branches in parallel without cloning the repository multiple times. Each worktree shares the object database, so disk space stays reasonable. And because worktrees are lightweight, spinning one up takes a fraction of the time a fresh clone would need.

For individual developers, worktrees change the mental model. Instead of thinking "I can only be in one place at a time," you start thinking "I can be in as many places as I need." Long-running feature branches, urgent hotfixes, code reviews, experimental prototypes, each in its own directory, all sharing the same history. It is Git's cheap branching philosophy, the idea we talked about back in episode seven, extended from the version history into your actual filesystem.

Repositories Inside Repositories

Now for the feature everyone loves to complain about. Every team that has tried to share code between projects has a submodule story. The junior developer who cloned the repository but not its submodules, then spent three hours wondering why nothing compiled. The senior engineer who switched branches and left the submodule pointing at a commit that does not match what the branch expects.

Git submodules let you embed one repository inside another. Your main project can include a library, a shared component, or a set of configuration files, each tracked in its own repository with its own history, pinned to a specific commit. The idea is sound. Large organizations often have shared code that multiple projects depend on. Instead of copying that code and maintaining it in multiple places, submodules let each project reference the original repository. Changes to the shared code happen in one place and propagate when projects choose to update their pinned commit.

The practice is a different story. The mental model is not complicated in theory, a pointer to a specific commit in another repository. But the number of ways things can go sideways has made submodules the butt of more Git jokes than any other feature. Cloning does not automatically pull submodules unless you remember the extra flag. Updating can introduce changes you did not anticipate. And the mismatched states, where a submodule is checked out at a commit other than the one your current branch has pinned, are the kind of silent failure that wastes entire afternoons.

The friction is real enough that alternatives have emerged over the years. Git subtree takes the opposite approach, merging the external repository's files directly into your project, trading the two-repository complexity for a simpler workflow and a messier history. Package managers like npm and pip let you treat shared code as installed dependencies rather than embedded repositories. And monorepos, the approach Google and Facebook championed, sidestep the problem entirely by putting everything in one enormous repository, eliminating the need for cross-repository references altogether.

Still, submodules have their defenders. For organizations that genuinely need shared repositories with independent version histories and selective pinning, nothing else in Git does quite the same thing. The power is real. The user experience is just unfortunate. It is the Git philosophy in miniature: extraordinarily capable, poorly explained.

The Hooks

Everything we have talked about so far is something you run manually. You decide to bisect. You decide to blame. You choose to create a worktree. But Git has one more category of power tool, and this one runs itself.

Git hooks are scripts that execute automatically at specific moments in the Git workflow. Before a commit is created. Before changes are pushed to a remote. After a merge completes. After a checkout. Git defines more than twenty different hook points, each corresponding to a specific event in the version control lifecycle.

The mechanism is deliberately simple. Inside every Git repository, there is a hooks directory. Drop a script in it with the right name, make it executable, and Git will run it at the right moment. A script named pre-commit runs before every commit. One named pre-push runs before every push. If one of these pre hooks exits with an error, Git aborts the operation. If it exits cleanly, Git proceeds. Hooks that run after the fact, like post-merge or post-checkout, can only react, because the operation has already happened.

This simplicity enables extraordinary things.

A pre-commit hook can run your code formatter and linter. Write code with a style violation, try to commit, and the hook catches it before the commit is even created. No sloppy formatting enters the history. A pre-push hook can run your test suite. Try to push code that breaks a test, and the hook blocks you. The remote never sees broken code.

Teams use hooks to enforce commit message conventions. Every message must reference a ticket number. Every message must follow a specific format. The hook checks the message and rejects anything that does not comply. What used to be a paragraph in a style guide that people gradually forgot about becomes an automated gate. A local gate can still be skipped on purpose, because git commit accepts the --no-verify option, so teams that need a hard guarantee repeat the check on the server.
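A ticket-number check like that could be sketched as a commit-msg hook in Python. Git calls this hook with one argument, the path of the file holding the proposed message, and a non-zero exit aborts the commit. The ticket pattern, letters then a dash then digits, is an assumed convention, and the function names are made up for the example.

```python
import re
import sys
from typing import List

# Assumed convention: ticket IDs look like ABC-123.
TICKET = re.compile(r"\b[A-Z]+-\d+\b")


def problems_in(message: str) -> List[str]:
    """Return a list of complaints; an empty list means the message passes."""
    # Ignore the comment lines Git adds to the message template.
    lines = [line for line in message.splitlines()
             if not line.startswith("#")]
    subject = lines[0].strip() if lines else ""
    found = []
    if not subject:
        found.append("empty subject line")
    if not TICKET.search("\n".join(lines)):
        found.append("no ticket reference such as ABC-123")
    return found


def main(argv: List[str]) -> int:
    """Git passes the path of the message file as the first argument."""
    with open(argv[1], encoding="utf-8") as handle:
        found = problems_in(handle.read())
    for problem in found:
        print("commit-msg:", problem, file=sys.stderr)
    return 1 if found else 0  # non-zero makes Git abort the commit
```

To install it, you would save this as an executable file named commit-msg in the hooks directory, with a Python shebang line at the top and a final line that calls sys.exit(main(sys.argv)).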

The JavaScript world embraced this wholeheartedly through a tool called Husky, which makes hooks easy to share across a team. Instead of each developer manually configuring their hooks directory, Husky installs the hooks automatically from a configuration file checked into the repository. When a new developer clones the project and runs the setup, they get the hooks for free. Formatting checks, lint rules, test requirements, all enforced from their very first commit.

But hooks extend far beyond local development machines. Most continuous integration systems are, at their core, hooks wired to remote events. When you push to GitHub, a webhook fires. That webhook triggers a build server. The build server runs your tests, your linter, your security scans, your deployment scripts. The entire modern continuous integration pipeline, GitHub Actions, GitLab CI, Jenkins, CircleCI, all of it is an elaboration on the same idea that Git hooks introduced: something happens in the repository, and a script runs in response.

Server-side hooks are particularly powerful. A pre-receive hook on your Git server can reject pushes that do not meet certain criteria. Force pushes to the main branch? Blocked. Commits without proper sign-off lines? Rejected. Binary files over a certain size? Stopped at the gate. These server-side hooks enforce policy across an entire organization without relying on individual developers remembering to do the right thing.
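As a rough sketch of one such server-side rule, here is a pre-receive hook in Python that refuses to let anyone delete a protected branch. Git feeds this hook one line per updated ref on standard input: the old commit id, the new commit id, and the ref name. An all-zero new id means the ref is being deleted. The protected branch name is an assumption, and blocking force pushes would need an extra ancestry check against the repository, which this sketch leaves out.

```python
import sys
from typing import Iterable, List

PROTECTED = {"refs/heads/main"}  # assumed branch name


def rejected_updates(lines: Iterable[str]) -> List[str]:
    """Return reasons to refuse the push; an empty list accepts it.

    Each input line has the form: <old-id> <new-id> <ref-name>
    """
    reasons = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines in this sketch
        old, new, ref = parts
        is_delete = set(new) == {"0"}  # an all-zero id means deletion
        if ref in PROTECTED and is_delete:
            reasons.append(f"refusing to delete {ref}")
    return reasons


def main() -> int:
    """Read ref updates from standard input and decide on the push."""
    reasons = rejected_updates(sys.stdin)
    for reason in reasons:
        print("pre-receive:", reason, file=sys.stderr)
    return 1 if reasons else 0  # non-zero rejects the whole push
```

Installed as an executable named pre-receive in the server repository's hooks directory, with a final line that calls sys.exit(main()), a script along these lines runs for every push, no matter what the developer has configured locally.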

The beauty of hooks is that they transform Git from a passive recorder of what you did into an active participant that helps you do it correctly. Every hook point is a chance to catch a mistake before it propagates, to enforce a standard before it is violated, to run a check that a busy developer would forget. The hooks have been there since the very beginning of Git. And like everything else in this episode, most developers have never written one.

Depth Versus Discoverability

Here is the paradox of Git. Everything we have talked about in this episode, bisect, blame, worktrees, hooks, these are not obscure experimental features. They are not hidden behind flags or special builds. They have been part of Git for years, some of them nearly since the beginning. Bisect has been there since two thousand five. Blame is one of the oldest annotation commands in the tool. Hooks have existed since Git's first year. Even worktrees, the youngest of the bunch, have been shipping for a decade.

And yet most developers have never used them.

Surveys consistently show that over ninety percent of professional developers use Git. But talk to those developers and you will find that most of them know maybe ten or fifteen commands. The daily workflow: add, commit, push, pull, branch, merge, maybe stash and log. They know enough to get through the day. They do not know that bisect exists. They do not know that blame can trace authorship through file renames and code movement. They have never heard of worktrees. They might know that hooks exist in theory but have never written one.

This is not a failing of the developers. Git's documentation is famously dense. The manual page for git bisect alone runs thousands of words and assumes familiarity with directed acyclic graphs and binary search algorithms. The command names are inconsistent and sometimes actively misleading. Checkout used to mean five different things depending on which flags you passed. Reset has three modes that behave so differently from each other that they could be three separate commands. The learning curve is not a curve. It is a cliff with a reference manual at the bottom.

Linus designed Git to be powerful, not approachable. The kernel community that was Git's first audience consisted of some of the most technically sophisticated developers on the planet. They did not need gentle onboarding. They needed tools that were fast, correct, and flexible, and that is what Linus gave them.

But Git outgrew that audience. It became the version control system for the entire software industry. Junior developers, designers touching code for the first time, data scientists who just need to track their notebooks, project managers running documentation repositories, they all use Git now. And they all hit the same wall. The basics are learnable through tutorials and Stack Overflow answers. But the advanced features, the tools that would save them hours, require a kind of dedicated study that most working developers simply cannot justify.

This is the tension underneath everything in this episode. Git is extraordinarily deep. The features we covered today are just the surface of what the power tools can do. Bisect can be scripted to test arbitrary conditions across any definition of good and bad. Blame can be configured to ignore whitespace changes, follow code across renames, and trace through bulk reformats to find the original author. Worktrees can be locked, listed, pruned, and managed as first-class citizens. Hooks can integrate with anything that speaks the language of exit codes.

But none of this is surfaced to the user. There is no prompt when you manually check out ten commits in a row that says "did you know about git bisect?" No suggestion when you open a file for the hundredth time that blame might answer the question you are about to ask a colleague. Git rewards the investment of learning it deeply. But it does not invite that investment. It just sits there, powerful and patient, waiting.

That wraps up Act Five of this series. We have spent four episodes in the world of Git in practice, the workflows that teams build, the safety nets that catch mistakes, the release conventions that communicate trust, and now the power tools that most developers never discover.

But we have been looking inward, at Git itself and how people use it. In the next act, we turn outward. What happens when Git meets the wider world? When companies try to scale it to repositories the size of entire operating systems. When the alternatives that lost the version control wars are examined for what they got right. And when the trust model that the entire open source world depends on gets exploited, when someone uses Git's own social infrastructure to slip a backdoor into a compression library that ships with nearly every Linux distribution on the planet. The power tools are impressive. But the world they operate in is about to get more complicated.

Git bisect turns "the bug is somewhere in these four hundred commits" into "the bug is in this one commit." You give it a known good point and a known bad point, and it binary searches the space between them. Thirteen steps for seven thousand commits. The math is relentless and the answer is guaranteed, which is exactly why Linus built it. Most debugging is educated guessing. Bisect is the rare tool that does not guess at all.