Somewhere, right now, a developer is pushing code to a public repository on GitHub. Mixed in with the function calls and the config files, there is a line that looks like a random string of characters. It is an AWS access key. The developer does not notice. They might not notice for hours, or days, or ever.
But someone else will notice in about four minutes.
In two thousand twenty-three, researchers at Palo Alto Networks ran an experiment. They deliberately planted an AWS credential in a public GitHub repository and watched what happened. Within four minutes, automated scanners had found the key. Within seven minutes, the account behind that key had received over four hundred API calls across multiple regions. The attackers were not humans browsing GitHub. They were bots, programmatically cloning every new public repository, scanning the contents for anything that looked like a credential, and testing what they found against cloud provider APIs. The entire chain from accidental exposure to active exploitation took less time than it takes to make a cup of coffee.
The researchers called the campaign EleKtra-Leak. It had been running since at least December two thousand twenty, quietly siphoning exposed keys from GitHub and spinning up Amazon EC2 instances to mine cryptocurrency. In one two-month stretch they tracked, the operation used four hundred and seventy-four unique cloud instances across multiple AWS regions, all funded by keys that developers had accidentally made public.
And here is the thing that makes this story specifically about Git, not just about careless developers. Those keys were not sitting on a web server. They were not posted to a forum. They were buried in Git commits. In the version history. In a system whose entire philosophy is that nothing should ever be lost. Git was designed so that every change is permanent, every snapshot is preserved, every version is recoverable. That is its greatest strength. And for secrets, it is its most dangerous quality.
Every year since two thousand twenty-one, a company called GitGuardian has published a report called the State of Secrets Sprawl. The company was founded in Paris in two thousand seventeen by Eric Fourrier and Jérémy Thomas, and their core product does one thing. It scans Git repositories for credentials that should not be there.
The numbers in their reports have gotten worse every single year. In two thousand twenty-two, they found roughly ten million new secrets exposed in public GitHub repositories. In two thousand twenty-three, that number rose to twelve point eight million, a twenty-eight percent increase. And in two thousand twenty-four, the count hit twenty-three point eight million. Nearly double in a single year.
Let those numbers sink in. Twenty-three point eight million secrets. API keys, database passwords, OAuth tokens, private certificates, cloud credentials. Exposed on public GitHub in a single calendar year. That is roughly forty-five secrets pushed to public repositories every single minute of every single day.
Fourrier put the problem bluntly in the two thousand twenty-five report.
Unlike zero-day vulnerabilities, attackers do not need sophisticated tools. Just one leaked credential can grant unrestricted access.
He is right. A zero-day exploit requires deep technical knowledge, custom tooling, and often months of research. A leaked credential requires a search query and a copy-paste. The asymmetry is staggering. Companies spend millions on firewalls, intrusion detection, vulnerability scanning, penetration testing. And then a junior developer commits a config file with a database password and all of it becomes irrelevant.
The GitGuardian report contains another number that is arguably worse than the headline figure. Seventy percent of secrets that were detected in two thousand twenty-two were still valid two years later, in two thousand twenty-four. Not rotated. Not revoked. Just sitting there, live credentials in public Git history, waiting.
If you want to understand why leaked credentials matter, the Uber story is the one to tell. Not because it was the most technically sophisticated attack. It was almost embarrassingly simple. But because of what happened after.
In two thousand sixteen, two hackers named Brandon Charles Glover and Vasile Mereacre hired a freelancer to write a script. The script took stolen credentials and tested them against GitHub accounts, looking for private repositories that belonged to interesting companies. It found Uber. Inside Uber's private GitHub repositories, the hackers discovered an AWS access key. The key had been created in two thousand thirteen and should have been rotated years earlier, but never was. Using that key, they accessed an Amazon S3 storage bucket containing personal data on fifty-seven million Uber users and drivers. Names, email addresses, phone numbers, and for drivers, license plate numbers.
Fifty-seven million people. Because of one credential that sat unrotated in a private repository for three years.
On November fourteenth, two thousand sixteen, the hackers contacted Uber and demanded money. The message landed with Joe Sullivan, Uber's Chief Security Officer. Sullivan was not some mid-level manager. He was a former federal prosecutor who had served as an Assistant United States Attorney in the Northern District of California. He had prosecuted cybercrime cases. He knew exactly what this situation meant, legally.
The next day, November fifteenth, Sullivan texted the company's CEO, Travis Kalanick. The message was brief. Something sensitive I would like to update you on if you have a minute. What followed was a series of phone calls and FaceTime conversations that would later become central evidence in a federal criminal case.
Here is where the story turns from a data breach into a cover-up. At the time, Uber was already under investigation by the Federal Trade Commission for a separate, earlier data breach from two thousand fourteen. Sullivan himself had testified before the FTC about Uber's security practices. He knew the company was under scrutiny. He knew there were legal obligations to report breaches affecting customer data.
Instead of reporting the breach, Sullivan authorized a payment of one hundred thousand dollars to the two hackers. Fifty thousand each, routed through HackerOne, a legitimate bug bounty platform that companies use to pay security researchers who responsibly disclose vulnerabilities. The hackers were required to sign non-disclosure agreements promising to delete the stolen data and keep quiet.
Think about what happened there. A legitimate bug bounty is when a researcher finds a vulnerability, reports it to the company, and gets paid for helping fix it. What Sullivan did was pay the criminals who had already stolen the data, disguise the payment as a bounty, and use a legal contract to buy their silence. The line between paying for security research and paying ransom gets very thin when the researcher already has your data.
For over a year, nobody outside Uber's inner circle knew. The fifty-seven million affected users were not notified. The FTC was not told, even though Sullivan had been testifying before the agency about Uber's security posture. It was as if the breach had never happened.
Then, in the fall of two thousand seventeen, Uber got a new CEO. Dara Khosrowshahi replaced Kalanick, and his team discovered what had been hidden. On November twenty-first, two thousand seventeen, more than a year after the breach, Uber publicly disclosed it. Sullivan was fired.
In August two thousand twenty, the Department of Justice charged Joe Sullivan with obstruction of justice and misprision of a felony, which is the legal term for knowing about a crime and actively concealing it. It was an extraordinary case. A sitting Chief Security Officer of a major technology company, criminally charged not for the breach itself but for covering it up.
The prosecution argued that Sullivan's ego drove the cover-up. He had built his reputation on security. He had testified before federal regulators about how well Uber protected its users. Admitting that hackers had walked in through a three-year-old unrotated key in a GitHub repository would have been humiliating, both for him personally and for the company he was supposed to be protecting.
On October fifth, two thousand twenty-two, after a trial in the Northern District of California, the same district where Sullivan had once served as a prosecutor, the jury found him guilty on both counts. It was the first time a corporate security executive had been criminally convicted for covering up a data breach.
The sentencing, in May two thousand twenty-three, was unusual. Federal Judge William Orrick gave Sullivan three years of probation rather than prison time. But the judge made it very clear that this was not a precedent.
If I have a similar case tomorrow, even if the defendant had the character of Pope Francis, they would be going to prison.
Then the judge turned to Sullivan directly and said something that every security executive in the country should hear.
When you go out and talk to your friends, to your CISOs, you tell them that you got a break not because of what you did, not even because of who you are, but because this was just such an unusual one-off.
As for the hackers themselves, Glover and Mereacre both pleaded guilty and cooperated with the prosecution against Sullivan. The tool they built, a script that tested stolen credentials against GitHub accounts, was not sophisticated. It did not require a team of nation-state hackers or a zero-day exploit chain. It required a list of stolen passwords and an afternoon of coding.
If you have ever accidentally committed a password to a Git repository, your first instinct was probably the obvious one. Delete the file, commit the deletion, push. Problem solved.
Except it is not. And this is the part of the story that is specifically, uniquely about Git.
When you delete a file and commit, Git does not erase the old version. It creates a new snapshot that no longer includes that file. But the old snapshot, the one with the password, is still in the repository's history. Anyone who clones the repository gets every snapshot. Every version of every file that has ever existed. Including the one you deleted.
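You can watch this happen in a throwaway repository. A minimal sketch, with every value fake and purely illustrative:

```shell
# Create a scratch repo and commit a fake secret (illustrative values only)
git init -q demo-repo && cd demo-repo
git config user.email "dev@example.com" && git config user.name "dev"

echo "AWS_KEY=AKIAFAKEFAKEFAKE0000" > config.env
git add config.env && git commit -qm "add config"

# The naive fix: delete the file and commit the deletion
git rm -q config.env && git commit -qm "remove secret"

# The working tree is clean, but the previous snapshot still holds the file
git show HEAD~1:config.env   # prints AWS_KEY=AKIAFAKEFAKEFAKE0000
```

Anyone who clones this repository can run that same git show command and read the key, no matter how long ago the deletion happened.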
This is not a bug. This is the core design principle of Git. Linus Torvalds built Git so that every clone is a complete backup, every commit is immutable, every piece of history is preserved forever. When you are tracking the Linux kernel's source code across thousands of contributors, this is exactly what you want. When you accidentally commit your production database password, it is a nightmare.
The only real fix is to rewrite history. Tools like BFG Repo-Cleaner and git filter-repo can walk through every commit in your repository and scrub the sensitive data from each one. But rewriting history means every commit hash changes, which means every clone, every fork, every open pull request is now pointing at commits that no longer exist. You have to force-push the rewritten history and every collaborator has to re-clone. For a personal side project, that is annoying. For a repository with hundreds of contributors, it is a logistical nightmare.
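The scrubbing itself is only a few commands. The following is a sketch, not a recipe: it assumes git filter-repo is installed (it ships separately from Git), that you are working in a fresh clone as the tool requires, and the filename and key shown are placeholders.

```shell
# Remove config.env from every commit in history (this rewrites every hash)
git filter-repo --invert-paths --path config.env

# Alternatively, scrub the string itself wherever it appears in any file
echo 'AKIAEXAMPLEKEY000000==>***REMOVED***' > replacements.txt
git filter-repo --replace-text replacements.txt

# The rewritten history must then be force-pushed,
# and every collaborator has to re-clone
git push --force --all
```

The commands are short. The coordination problem they create, every fork and open pull request now pointing at commits that no longer exist, is the expensive part.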
And even if you do all of that perfectly, there is still a window. From the moment you pushed the secret to the moment you finished rewriting history and every clone was updated, the credential was exposed. If a scanner found it in those four minutes before you even realized your mistake, rewriting history changes nothing. The credential is already in someone else's hands.
This is why every security guide says the same thing. If you push a credential to a public repository, even for a second, assume it is compromised. Rotate the credential immediately. Do not waste time trying to scrub history first. Rotate, then clean up.
The good news, such as it is, is that the defenders have not been sitting still. The arms race between accidental leakers and malicious scanners has produced an entire category of security tooling that did not exist a decade ago.
On the scanning side, there is TruffleHog, an open source tool that can crawl through a repository's complete Git history, not just the current state, looking for anything that resembles a credential. It checks over eight hundred different credential patterns and can verify whether a detected key is actually live by testing it against the relevant API. If TruffleHog tells you it found an active AWS key in a commit from two thousand nineteen, you know it is real because it already tested it.
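The core trick, scanning every blob Git has ever stored rather than just the current checkout, can be sketched with plain Git plumbing and a single regex. This is a toy illustration of the idea, not a substitute for TruffleHog's hundreds of patterns and live verification; all values are fake.

```shell
# Build a repo whose only secret lives in a deleted file (values are fake)
git init -q scan-demo && cd scan-demo
git config user.email "dev@example.com" && git config user.name "dev"
echo "token=AKIAFAKEFAKEFAKE1111" > creds.txt
git add creds.txt && git commit -qm "add creds"
git rm -q creds.txt && git commit -qm "delete creds"

# Scan every object Git has ever stored, not just the current checkout,
# for AWS-access-key-shaped strings (AKIA plus sixteen uppercase/digits)
git rev-list --objects --all | while read -r oid path; do
  if [ "$(git cat-file -t "$oid")" = "blob" ]; then
    git cat-file -p "$oid" | grep -Eo 'AKIA[0-9A-Z]{16}' | while read -r key; do
      echo "found $key in historical blob ($path)"
    done
  fi
done
```

The deleted file never appears in the working tree, but its blob is still reachable from the first commit, and the scan finds it.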
On the prevention side, there is git-secrets, originally built by the AWS team, which installs as a pre-commit hook. Every time you try to commit, it scans your staged changes for patterns that look like credentials and blocks the commit if it finds anything. The idea is simple. Catch the secret before it enters the history, and you never have to scrub anything.
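The mechanism is worth seeing because it is so simple. A pre-commit hook is just an executable script in .git/hooks, and if it exits nonzero, Git aborts the commit. A toy version with a single hand-rolled AWS-key pattern (git-secrets ships a much richer pattern set, plus allowlisting for false positives):

```shell
# In a throwaway repo, install a minimal secret-blocking pre-commit hook
git init -q hook-demo && cd hook-demo
git config user.email "dev@example.com" && git config user.name "dev"

cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Reject any staged change containing an AWS-access-key-shaped string
if git diff --cached | grep -Eq 'AKIA[0-9A-Z]{16}'; then
  echo "pre-commit: staged changes look like they contain an AWS key" >&2
  exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

echo "key=AKIAFAKEFAKEFAKE2222" > app.cfg
git add app.cfg
git commit -qm "oops" || echo "commit blocked"   # prints: commit blocked
```

The secret never enters the history, so there is nothing to scrub. The catch, as always, is that the hook only protects repositories where someone installed it.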
And then there is GitHub itself. In two thousand eighteen, GitHub launched its secret scanning partner program. The concept is straightforward. When someone pushes a commit containing something that looks like an API key for a partner service, say AWS, Slack, or Stripe, GitHub automatically notifies that service provider. The provider can then revoke or quarantine the credential before anyone has a chance to abuse it.
In two thousand twenty-three, GitHub went further with push protection. Rather than notifying the provider after the push, push protection blocks the push entirely. If GitHub detects a known credential pattern in your commit, it refuses to accept the push and tells you to remove the secret first. You can override the block, but you have to explicitly acknowledge that you are pushing a credential on purpose.
The EleKtra-Leak researchers at Palo Alto Networks actually observed this system working in real time. When they planted their test credential, GitHub's secret scanning detected it and notified AWS within minutes. AWS automatically applied a quarantine policy that restricted what the key could do. But here is the catch. The attackers started their reconnaissance just four minutes after the quarantine policy was applied. They were already probing the key before the automated defenses had fully kicked in. The bots are that fast.
You could look at all of this and conclude that it is a tooling problem. Better scanners, stricter pre-commit hooks, more aggressive push protection. And you would be partly right. The tooling has gotten dramatically better in the last five years.
But the numbers keep going up. Twenty-three point eight million secrets in two thousand twenty-four, up from twelve point eight million the year before. The tools are better and the problem is worse. That is not a tooling failure. That is a human behavior problem.
Developers know they should not hardcode credentials. Every bootcamp teaches environment variables in the first week. Every company has a security policy that says use a secret manager. And then the deadline hits, or the local dev setup needs a quick test, or the config file is easier to read with the real values in it, and someone types a real API key into a real file and commits it without thinking. Not because they do not know better. Because knowing better and doing better are two very different things under pressure.
The people most likely to make that mistake are the people newest to the craft. The education problem that shapes every beginner's Git experience — the confusion, the skipped steps, the habits formed before anyone explains why they matter — shows up here as a security problem. A developer who does not fully understand what "public" means on GitHub, or what Git history actually preserves, is also the developer most likely to commit a real credential and not realize what they have done.
The GitGuardian report found that four point six percent of all public repositories on GitHub contain at least one secret. More striking, thirty-five percent of private repositories contain plaintext secrets. Developers are even more careless when they think nobody is watching. And if those private repositories are ever made public, or breached, or accessed through a stolen credential like the Uber case, every secret in the history becomes exposed at once.
There is one more number from the report that deserves attention. Repositories where developers used AI coding assistants like GitHub Copilot showed forty percent more secret leaks than repositories without them. The tool that is supposed to make developers more productive is also making them more careless with credentials. Copilot does not know what a real API key looks like versus a placeholder. It autocompletes confidently either way.
That last finding should worry anyone thinking about the next five years of software development. AI-generated code is growing exponentially. If that code comes with a forty percent higher rate of credential exposure, the twenty-three point eight million number from two thousand twenty-four is going to look quaint.
GitHub has responded with AI-powered secret scanning that goes beyond simple pattern matching. The system now uses machine learning to identify credential-shaped strings even when they do not match a known format — catching custom internal tokens, self-hosted service keys, and novel credential types that a regex would miss. On the attacker side, the same AI tools introduce a new angle. A developer asks an AI assistant to write a deployment script. The assistant generates something plausible, fills in placeholder values that look exactly like real credentials, and the developer pastes it without checking whether the values are genuine. Worse, there are documented cases of AI models reproducing strings from their training data — real API keys that appeared in public code repositories before they were rotated, now surfacing again in AI-generated output. The model does not know the key is real. The developer does not know the key is real. But the scanner bots know within four minutes.
There is a philosophical dimension here that goes beyond any specific breach or any annual report. Git was built on the principle that history matters. That every change should be preserved. That you should always be able to go back to any previous state. It is a principle that serves software development extraordinarily well.
But it also means that Git is a permanent record of every mistake you have ever made. Every credential you accidentally committed. Every config file you pushed before remembering to add it to your gitignore. Every moment of carelessness is preserved with a timestamp and your name attached to it.
The defenders are getting better. Push protection, pre-commit hooks, automated scanning. The tools exist to prevent most leaks before they happen. But they require developers to install them, configure them, and not override them when they get in the way. And the numbers suggest that most developers do not do any of those things.
The trust model underneath all of this is worth naming. When you push a credential to a repository, you are implicitly trusting everyone who can read that repository — which, for a public repo, means everyone on the internet. That trust is not verified. Git has no identity layer, no way to confirm that the person cloning your repository is who they claim to be. The credential becomes a master key handed to a room full of strangers, and you do not even know who was in the room. That unverified identity model threads through Git's entire architecture, and it shapes what happens when something goes wrong. When you push a secret to a supply chain package that thousands of projects depend on, the blast radius is not one repository — it is every project that trusted yours. The XZ incident showed exactly how that trust model can be turned into a weapon deliberately, by someone who spent years building the credibility to exploit it.
So the arms race continues. Automated scanners watch every public push, searching for patterns that look like credentials. Automated defenders try to catch those credentials before the scanners do. And in between, millions of developers push code every day, some percentage of them carrying secrets they did not mean to share. Git preserves everything. The bots never sleep. And the clock starts the moment you hit push.
Remember those four minutes? That is the window. That is the margin between a careless commit and a five-figure cloud bill, a data breach affecting millions, or a criminal prosecution. Four minutes.
Git never forgets. Sometimes, that is the problem.
One command to take with you: git rm --cached. This removes a file from Git's tracking without deleting it from your disk. The file stays right where it is, but Git stops watching it.
This is your first step when you accidentally committed something sensitive. Run git rm --cached followed by the filename, add that file to your .gitignore, then commit the change. The file disappears from future commits. But remember what this episode taught you. It is still in the history. For a public repository, you will need a tool like git filter-repo to truly scrub it. But for catching mistakes before you push, this command is the one you want.
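In a scratch repository, the sequence looks like this (the filename and value are illustrative):

```shell
git init -q untrack-demo && cd untrack-demo
git config user.email "dev@example.com" && git config user.name "dev"

echo "password=hunter2" > config.env
git add config.env && git commit -qm "add config"

# Stop tracking the file without deleting it from disk
git rm -q --cached config.env
echo "config.env" >> .gitignore
git add .gitignore
git commit -qm "untrack config.env and ignore it"

ls config.env    # prints config.env: the file is still on disk
git ls-files     # lists only .gitignore: config.env is no longer tracked
```

Note that the first commit still contains config.env. Untracking protects future commits only; the history keeps what it already has.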