The Security Reckoning: The Code That Works but Is Not Safe

The Confidence Gap

This is episode nineteen of Git Good, Season Two.

In the spring of two thousand twenty-five, Veracode published the results of the largest security audit of AI-generated code ever conducted. They tested over one hundred large language models on common programming tasks in Java, Python, C sharp, and JavaScript. Not toy prompts. Real tasks. The kind of code that ends up in production. Authentication handlers. Database queries. File parsers. The stuff that, if it goes wrong, lets someone walk right through the front door.

Forty-five percent of the code failed security tests. Nearly half. Not obscure edge cases, not theoretical weaknesses. OWASP Top Ten vulnerabilities. Cross-site scripting. Log injection. SQL injection. The same categories of flaws that security teams have been fighting for twenty years, generated fresh and fast by machines that write code with absolute confidence and zero understanding of what the code does.

Here is the number that defines this episode. Eighty percent of developers surveyed said they believe AI-generated code is at least as secure as code they would write themselves. The actual security pass rate, across all models and languages, was fifty-five percent. Eighty percent confidence. Fifty-five percent reality. That is not a gap. That is a canyon. And the code is pouring into Git repositories at a rate that would have been unimaginable three years ago.

The security researchers at Georgia Tech and the Cloud Security Alliance gave it a name. They call it the AI security debt. Not technical debt, the familiar problem of shortcuts accumulating over time. Security debt. The specific, measurable accumulation of vulnerabilities introduced faster than anyone can find and fix them. The deficit is growing every day, in every repository where AI-generated code is being committed, and it is being committed almost everywhere.

This episode is about two things that are happening at the same time. The first is the accumulation of security debt at a pace nobody has seen before. The second is the institutional response, the regulations and requirements and compliance frameworks that governments are building to contain it. Both stories run through Git. The vulnerabilities live in Git repositories. The audit trail regulators need is Git metadata. The tension between the two is the story of this episode.

The Numbers That Should Scare You

The Cloud Security Alliance has been tracking this since May of two thousand twenty-five, when they launched something called the Vibe Security Radar in partnership with Georgia Tech. The project does one thing. It systematically tracks vulnerabilities that can be attributed to AI coding tools in open source repositories. Not speculation. Not surveys. Actual CVEs, the numbered vulnerability entries that the security industry uses to catalog confirmed flaws.

In January of two thousand twenty-six, they found six. In February, fifteen. In March, thirty-five. A near sixfold increase in two months. The total across the project's tracking lifetime reached seventy-four confirmed CVEs directly attributable to AI coding tools.

Our tracking captures the visible fraction. The true count of AI-generated code vulnerabilities in the open-source ecosystem is estimated at four hundred to seven hundred cases, roughly five to ten times the detected figure, because most AI coding tools do not leave identifiable metadata in their commits.

That last point is the one that matters most for this series. Git records who committed the code. Git records when. Git records the message. But Git has no field for "this was generated by a machine." The author line says a human name. The signature, if there is one, belongs to a human key. The vulnerability hides behind a fiction of authorship, and the tools tracking it have to rely on indirect signals to attribute it at all.
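To make that concrete, here is a minimal sketch in Python that reads the header fields of a raw commit object, the text that git cat-file -p prints for a commit. The sample commit, the names in it, and the helper parse_commit_headers are all invented for illustration.

```python
# Sketch: list the header fields of a raw Git commit object.
# SAMPLE_COMMIT is a made-up example in the format that
# `git cat-file -p <commit>` prints; the names and hashes are invented.

SAMPLE_COMMIT = """\
tree 9c3f1e0b7a5d4c2e8f6a1b0d3e5c7f9a2b4d6e80
parent 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author Dana Example <dana@example.com> 1767225600 +0000
committer Dana Example <dana@example.com> 1767225600 +0000

Add login handler
"""


def parse_commit_headers(raw: str) -> dict:
    """Return header name -> list of values for the lines before the first blank line."""
    header_block, _, _message = raw.partition("\n\n")
    headers = {}
    for line in header_block.splitlines():
        if line.startswith(" "):
            # Continuation line (used by multi-line headers such as gpgsig).
            continue
        name, _, value = line.partition(" ")
        headers.setdefault(name, []).append(value)
    return headers


headers = parse_commit_headers(SAMPLE_COMMIT)
print(sorted(headers))
# The object has slots for tree, parent, author and committer.
# Nothing in it says which tool produced the change.
```

A signed commit would add a gpgsig header to that list, but that header says which key signed, not which tool typed.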

The individual numbers paint a picture that is worse than the aggregate. AI-assisted developers commit code three to four times faster than their peers. That sounds like a productivity miracle. But they introduce security findings at ten times the rate. Not three to four times, matching the speed increase. Ten times. The security debt accumulates faster than the code itself.

And the type of vulnerability is shifting. Syntax errors, the mistakes that stop code from compiling at all, are down seventy-six percent in AI-assisted repositories. The code runs. It passes basic tests. It looks clean. But privilege escalation paths, the flaws that let an attacker move from a low-access foothold to full system control, are up three hundred and twenty-two percent. The code is more polished and more dangerous at the same time. A developer scanning through a pull request sees clean syntax, consistent style, no obvious bugs. The vulnerability is not in what the code does wrong. It is in what the code quietly allows.

Episode eleven of this season told the story of secrets in Git history, the millions of credentials exposed on GitHub every year, the bots that find them in four minutes. That problem is getting worse too. AI-assisted commits leak secrets at twice the baseline rate. But leaked credentials are at least detectable. A scanner can find a pattern that looks like an API key. A privilege escalation vulnerability looks like normal code, because it is normal code: a permission check that was never written, a role granted a little too broadly. There is no pattern to match. That is what makes it so hard to catch.

The false confidence is the real danger. Syntax errors are the mistakes you notice. They stop the build. They throw red warnings. They force you to look. When those go down by seventy-six percent, the developer's experience improves dramatically. The code compiles on the first try. The tests pass. The pull request looks clean. Everything feels more professional, more polished, more secure. And that feeling is a lie. The code that compiles cleanly and passes tests while containing a privilege escalation path is more dangerous than the code that fails to compile at all. The broken code gets fixed. The working-but-vulnerable code gets shipped.

The Phantom Dependency

In April of two thousand twenty-five, a security researcher named Seth Larson coined a word. Larson was the Python Software Foundation's developer-in-residence, which meant he spent his days thinking about the security of the Python package ecosystem. The word he invented was slopsquatting.

AI models hallucinate package names. They recommend libraries that do not exist, with names that sound plausible, and they do it consistently. If you ask the same question again, you get the same fake package. That consistency is what makes it exploitable.

The name is a portmanteau. AI slop, the industry term for confidently wrong machine output, combined with typosquatting, the old trick of registering a package name that is one letter off from a popular library. Slopsquatting is what happens when the AI invents the fake name for you.

Researchers analyzed over half a million code samples generated by sixteen different AI models, including GPT-4, CodeLlama, DeepSeek, and Mistral. Roughly one in five recommended at least one package that did not exist. Not a deprecated package. Not an alternative spelling. A name the model invented from nothing, for a library that had never been written.

That alone would be a nuisance, not a security threat. What turns it into a weapon is the consistency. When the researchers ran the same prompts ten times each, forty-three percent of the hallucinated packages appeared every single time. Not random noise. A reliable, repeatable recommendation for something that does not exist. And thirty-eight percent of the fake names had moderate string similarity to real packages, meaning they sounded like they could be real, the kind of name you would not think twice about adding to your requirements file.

The attack writes itself. An attacker prompts a dozen AI models with common coding questions. They collect the fake package names that appear consistently. They register those names on PyPI or npm. They fill them with code that works, mostly, plus a small payload that phones home with credentials or installs a backdoor. Then they wait. They do not need to find victims. The AI sends the victims to them.

It already happened. A package called huggingface-cli, a name that sounds like it should be a real tool from the Hugging Face machine learning platform, was hallucinated by multiple AI models, registered on PyPI by someone, and downloaded over thirty thousand times in three months. Another, called react-codeshift, a hallucinated mashup of two real JavaScript tools, spread across two hundred and thirty-seven GitHub repositories through AI-generated code before anyone noticed.

Think about what this means for a Git repository. A developer asks an AI to help with a task. The AI suggests a dependency. The developer adds it to their requirements file, commits, pushes. The dependency gets installed in CI. If it is malicious, it now has access to the build environment, the deployment credentials, the secrets that episode eleven described leaking. And the commit that introduced it looks completely normal. A human name in the author field. A reasonable commit message. A requirements file with one new line. Nothing to flag.
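One defensive habit is to check every new dependency name before it reaches CI. The sketch below, in Python, flags requirement lines whose package name is not on a trusted list and notes which trusted names it resembles. KNOWN_PACKAGES is a tiny hypothetical stand-in for a real allowlist or registry lookup, and the helper names are assumptions made for this example.

```python
# Sketch: flag requirement lines whose package name is not on a trusted list.
# KNOWN_PACKAGES is a tiny stand-in for a real allowlist or registry lookup.
import difflib
import re
from typing import Optional

KNOWN_PACKAGES = {"requests", "numpy", "flask", "huggingface-hub"}


def package_name(line: str) -> Optional[str]:
    """Extract the normalized project name from a simple requirements line."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None  # blank lines, comments, and pip options
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    if match is None:
        return None
    # Normalize as PEP 503 does: lowercase, with runs of . _ - collapsed to -.
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def vet_requirements(text: str, known: set) -> list:
    """Return (unknown_name, similar_known_names) for every unrecognized package."""
    findings = []
    for line in text.splitlines():
        name = package_name(line)
        if name is not None and name not in known:
            similar = difflib.get_close_matches(name, sorted(known), n=3, cutoff=0.6)
            findings.append((name, similar))
    return findings


requirements = "requests==2.32.3\nhuggingface-cli>=0.1\nnumpy\n"
findings = vet_requirements(requirements, KNOWN_PACKAGES)
for name, similar in findings:
    print(f"review before installing: {name} (similar to: {similar})")
```

An unknown name that closely resembles a trusted one is exactly the case worth a second look, whether it came from a typo or from a model's imagination.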

The traditional supply chain attack required effort. You had to compromise an existing package, or find a maintainer to social-engineer, or register a typosquatting name close enough to a popular library that someone might mistype it. Slopsquatting inverts the economics completely. The AI does the social engineering for you. It recommends fake packages to real developers, repeatedly, consistently, without anyone having to trick anyone. The attacker just needs patience and a package registry account.

And the scale is different from anything that came before. A typosquatter might catch a few hundred developers who mistype a package name. A slopsquatter catches every developer who asks an AI for help with a common task and trusts the answer. As AI coding tools become the default way people discover and install dependencies, the attack surface grows with every prompt. The more people use AI to write code, the more valuable slopsquatting becomes.

The Four Perspectives

For the solo developer working alone on a side project, security has always felt like someone else's problem. Who would hack my personal website? Who would target my weekend Python script? The answer, it turns out, is nobody, and also everybody. Nobody is targeting you specifically. But the bots scanning every public repository do not know or care that your project has three users. They scan everything. And if your AI-generated code introduced a dependency that phones home, or committed a cloud credential that the AI helpfully included as an example, you are in exactly the same four-minute window as everyone else.

For the vibe coder, the person building with AI who has never written code by hand, the security problem is even more fundamental. This is the developer from episode eighteen who types a description of what they want and lets the machine handle everything. They do not know what a vulnerability is. Not in the sense that they have not studied security. In the sense that the concept is not part of their mental model. Code either works or it does not. If it works, it is done. The idea that code can work perfectly for its intended user while simultaneously being wide open to an attacker is foreign. And nothing in the AI tool's interface tells them otherwise. The code is generated. The tests pass. The commit is made. The push succeeds. Every signal says success.

For the team developer inside an organization, this is becoming a compliance nightmare. The tooling that was supposed to make them faster is generating code that requires more review, not less. More security scanning. More dependency auditing. More time spent verifying that the AI did not introduce a vulnerability or a phantom dependency. The three to four times speed increase comes with a ten times increase in security findings that someone has to triage. The math does not work.

And for the open source maintainer, already drowning in AI-generated pull requests as episode seventeen described, the security dimension adds another layer of exhaustion. Every PR from an unknown contributor might contain AI-generated code with vulnerabilities the submitter does not understand. The maintainer has to review not just for correctness and style, but for security flaws that are invisible to casual inspection. The privilege escalation vulnerability that looks like clean code. The dependency that looks legitimate but does not exist. The credential that looks like a placeholder but is live.

The maintainer cannot ask "did AI write this?" and get an honest answer. Nothing in the commit requires disclosure. Nothing in the pull request template forces it. Some contributors mention it. Most do not. The maintainer is left to guess based on subtle signals. Perfect formatting. Slightly generic variable names. A commit message that reads like a prompt response. Comments that explain obvious things and miss the non-obvious ones. But none of these signals are reliable, and reviewing code for security vulnerabilities you cannot see is exhausting in a way that reviewing code for bugs is not. A bug breaks something. A vulnerability waits. The burden falls on the person least resourced to carry it.

The Letter of the Law

In the spring of two thousand twenty-three, while AI coding tools were still a novelty and slopsquatting had not been named yet, Deb Nicholson sat down to write a blog post about a proposed European law, a post she hoped would not be necessary. Nicholson was the executive director of the Python Software Foundation, the nonprofit that oversees the language used by millions of developers, the foundation that also runs PyPI, the Python Package Index. Over four hundred thousand packages. Billions of downloads per month. A small team on a modest budget.

The story did not start in Europe. It started in the United States, in May of two thousand twenty-one, when President Biden signed Executive Order fourteen zero two eight. The title was Improving the Nation's Cybersecurity, and it was a direct response to two incidents that had rattled the government. The SolarWinds breach, in which Russian intelligence compromised a widely used network management tool and used it to infiltrate federal agencies. And the Colonial Pipeline ransomware attack, which shut down fuel supplies across the southeastern United States for nearly a week.

The executive order required that any company selling software to the federal government provide a software bill of materials. An SBOM. A formal record of every component inside the software, every library, every dependency, all the way down. Think of it as a nutrition label for code. Not calories and sodium, but every piece of software baked into the product, who made it, what version, where it came from.

Europe looked at that and said: we will go further. We will require it from everyone.

The European Union proposed the Cyber Resilience Act in September of two thousand twenty-two. Where the American executive order applied only to government procurement, the CRA would apply to every product with digital elements sold in the European market. Not just government vendors. Everyone. Hardware manufacturers, software companies, anyone putting a connected device or a software product into European hands. Companies would be liable for vulnerabilities. Fines up to fifteen million euros or two and a half percent of global annual turnover.

Nicholson read the draft and saw a problem. The text drew no clear line between a corporation selling a product and a nonprofit hosting a public repository.

The risk of huge potential costs would make it impossible in practice for us to continue providing Python and PyPI to the European public.

She was not exaggerating. If the Python Software Foundation could be held liable for any product that included Python code, the rational response was not to write more secure software. It was to stop distributing software to Europe entirely. The law would not create better security. It would create less software.

The fight over the open source exemption lasted eighteen months. The Open Source Initiative, the Eclipse Foundation, the Linux Foundation, and dozens of smaller organizations mobilized. They did not argue against regulation. They argued that regulation needed to understand how open source works. A company that bundles open source code into a commercial product is the one making the commercial decision. The liability should follow the money, not the code.

The European Parliament listened. Slowly. The final text of the Cyber Resilience Act, adopted in late two thousand twenty-four, included a significant concession. Free and open source software that is not monetized by its developers is generally exempt from the regulation.

Generally. That word carries enormous weight. Because the exemption has edges, and those edges are where the problems live. The concession also created a category that had never existed in any regulatory framework anywhere in the world. The open source steward. A foundation. Not a manufacturer. Not an uninvolved bystander. Something in between. The Apache Software Foundation. The Python Software Foundation. The Eclipse Foundation. The Rust Foundation. Organizations that do not sell software but systematically support the development of software used commercially. Until December of two thousand twenty-four, there was no legal name for what they do.

Stewards get a lighter burden than manufacturers. But they still have obligations. They must document a cybersecurity policy. They must cooperate with authorities. And starting in September of two thousand twenty-six, they must report actively exploited vulnerabilities and severe incidents.

The Compliance Engine Nobody Planned

This is where the story circles back to Git. Because when regulators ask for an audit trail, a verifiable record of who wrote what code, when it was reviewed, whether it was approved, they are describing something that already exists. They are describing git log.

Every commit records an author, a date, a message, and a cryptographic hash chaining it to every commit before it. Every pull request, on the platform hosting the repository, records who reviewed the code, what comments were made, when it was merged. The entire development history of a project is already there, in commit metadata and platform records that most developers never think about.
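The chaining is easy to demonstrate. In its default SHA-1 mode, Git names a commit by hashing the word commit, the size of the commit's text, a zero byte, and then the text itself, which includes the parent's ID. The Python sketch below rebuilds that calculation for invented commits; make_commit and the names in it are hypothetical.

```python
# Sketch: how a Git commit ID is derived, and why it chains to its parent.
# Git (in its default SHA-1 mode) hashes the bytes "commit <size>\0" + body.
# The commit contents below are invented for illustration.
import hashlib
from typing import Optional


def commit_id(body: str) -> str:
    data = body.encode("utf-8")
    header = b"commit " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def make_commit(parent: Optional[str], message: str) -> str:
    lines = ["tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904"]  # the empty tree
    if parent is not None:
        lines.append(f"parent {parent}")  # root commits have no parent line
    lines.append("author Dana Example <dana@example.com> 1767225600 +0000")
    lines.append("committer Dana Example <dana@example.com> 1767225600 +0000")
    return "\n".join(lines) + f"\n\n{message}\n"


first = commit_id(make_commit(None, "Add login handler"))
second = commit_id(make_commit(first, "Fix typo"))

# Rewrite the first commit's message: its ID changes, so the child that
# names it as parent gets a different ID too.
tampered_first = commit_id(make_commit(None, "Add login handler (edited)"))
tampered_second = commit_id(make_commit(tampered_first, "Fix typo"))

print(first != tampered_first, second != tampered_second)
```

The second commit's text is identical in both histories except for the parent line, and that one line is enough to change its ID.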

The SBOM requirement makes this concrete. A software bill of materials is a formal inventory of every component inside a piece of software. Every library, every dependency, every transitive dependency. The American executive order from two thousand twenty-one required them for government procurement. The European CRA will require them for everything sold in Europe.

And an SBOM is not something you bolt on after the fact. It is something you extract from files that already live in a git repository. The package dot json. The requirements dot txt. The Cargo dot toml. The go dot mod. Each one is a versioned file with a commit history showing exactly when each dependency was added, who added it, and what version they pinned. GitHub already provides an API endpoint that generates an SBOM directly from a repository's dependency graph. Atlassian built an internal platform that generates SBOMs from incoming commits, and within a year it had produced over a million of them covering one point eight billion package entries. The infrastructure was there before anyone asked for it.
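As a rough illustration of that extraction, the Python sketch below turns pinned lines from a requirements file into a small inventory shaped like a CycloneDX document. It is a simplification, not a complete or validated SBOM: it only understands lines of the form name==version, it ignores transitive dependencies, and the function names are assumptions made for this example.

```python
# Sketch: turn pinned requirements.txt lines into a CycloneDX-shaped inventory.
# This is a simplified illustration, not a complete or validated SBOM:
# it handles only "name==version" lines and ignores transitive dependencies.
import json
import re

PINNED = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9.!+*_-]+)$")


def components_from_requirements(text: str) -> list:
    components = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        match = PINNED.match(line)
        if match is None:
            continue  # unpinned or unsupported line: skipped in this sketch
        name, version = match.group(1).lower(), match.group(2)
        components.append({
            "type": "library",
            "name": name,
            "version": version,
            "purl": f"pkg:pypi/{name}@{version}",  # package URL identifier
        })
    return components


def build_sbom(text: str) -> dict:
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components_from_requirements(text),
    }


sample = "requests==2.32.3\nflask==3.0.3  # web framework\nnumpy\n"
sbom = build_sbom(sample)
print(json.dumps(sbom, indent=2))
```

The unpinned numpy line is skipped here on purpose. An inventory cannot name a version that the file never pinned, which is one reason real SBOM tools work from lock files and resolved dependency graphs.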

What makes Git's role in the SBOM story so interesting is that it was never designed for this. Dependency files are just files. Git tracks them the way it tracks any other text file, with the same content-addressed snapshots, the same immutable history, the same cryptographic hashes. But the information regulators need, the when and who and what-version of every dependency change, falls out of git log as a side effect of how Git works. The audit trail is accidental. It is also, as of two thousand twenty-six, exactly what the law requires.
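Here is a sketch of pulling the when and who out of that history, in Python. DEPENDENCY_LOG_CMD is the kind of command one might run against a repository; the parser works on its output. SAMPLE_OUTPUT, the file name, and the people in it are invented so the example runs without a repository.

```python
# Sketch: who touched a dependency file, and when, according to git log.
# DEPENDENCY_LOG_CMD is the command one might run; parse_log works on its output.
# SAMPLE_OUTPUT is invented so the example runs without a repository.
from datetime import datetime

# %H = commit hash, %an = author name, %aI = author date (strict ISO 8601),
# %x09 = a tab character used here as the field separator.
DEPENDENCY_LOG_CMD = [
    "git", "log", "--format=%H%x09%an%x09%aI", "--", "requirements.txt",
]

SAMPLE_OUTPUT = (
    "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678\tDana Example\t2026-01-15T10:30:00+00:00\n"
    "0f1e2d3c4b5a69788796a5b4c3d2e1f001234567\tSam Sample\t2025-11-02T16:05:00+01:00\n"
)


def parse_log(output: str) -> list:
    entries = []
    for line in output.splitlines():
        commit, author, when = line.split("\t")
        entries.append({
            "commit": commit,
            "author": author,
            "date": datetime.fromisoformat(when),
        })
    return entries


entries = parse_log(SAMPLE_OUTPUT)
for entry in entries:
    print(entry["date"].date(), entry["author"], entry["commit"][:8])
```

In a real repository the text would come from running the command, for example with subprocess.run and capture_output=True. Adding the -p flag to git log would also show which lines of the file changed in each commit.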

Then there are signed commits, the feature episode thirteen explored in the context of identity. Only about ten percent of commits are signed today. Most developers never bothered because there was no reason to. A signed commit is a cryptographic attestation that the holder of a specific key vouched for a specific change. It is not just a log entry. It is evidence. The feature that almost nobody used is becoming the feature regulators might require. The motivation is not security consciousness. It is compliance.

Mike Milinkovich, the executive director of the Eclipse Foundation, saw this coming. In April of two thousand twenty-four, he brought seven foundations together. Apache. Blender. OpenSSL. PHP. Python. Rust. Eclipse. Their goal was to build common specifications for secure software development before the regulation became fully enforceable.

The regulation comes into force in two thousand twenty-seven, and the open source community needs to build the processes and specifications that will allow us to comply without destroying the collaborative development model that makes open source work.

The argument these foundations are making is subtle but important. Open source already does most of what the CRA requires. Coordinated vulnerability disclosure. Peer review. Release processes. It just does it informally. The challenge is not to invent new security practices. It is to document existing ones in a way that satisfies a regulator who needs paper.

The Gray Area and the Small Project

Both sides of this story are right, and that is what makes it hard.

The regulations are necessary. The SolarWinds breach happened because nobody checked what was inside the software. Supply chain attacks are accelerating. Episode eleven documented the scale of credential exposure. Episode thirteen showed how easily Git identities can be spoofed. The internet runs on software that nobody audits, maintained by people nobody pays, with security practices that range from rigorous to nonexistent. Something had to change.

But the regulations were designed for Microsoft and Siemens and Samsung. Companies with legal departments and compliance teams and budgets measured in billions. The open source exemption and the steward category were supposed to shield smaller projects. And for projects under a major foundation, they probably will.

But thousands of widely used projects sit in a single developer's GitHub account. That developer might accept fifty dollars a month through GitHub Sponsors. They might do consulting related to the project. They might have a day job at a company that uses it commercially. Any of those activities could, depending on how implementation guidance is written, push their project from the exempt category into a commercial activity and trigger the full compliance requirements.

A solo developer with no legal team cannot generate SBOMs, document cybersecurity policies, file vulnerability reports with European authorities, and undergo conformity assessments. The cost is not just financial. It is time. Hours spent on compliance are hours not spent fixing bugs or writing the code that made the project valuable in the first place.

The irony of scale cuts in both directions. The regulations exist because software is critical infrastructure. But the people maintaining much of that critical infrastructure are solo developers and small teams who cannot absorb regulatory overhead. The same person the CRA is trying to protect, the European consumer using software with unknown vulnerabilities, is also depending on software that might disappear from European package registries if the compliance burden is too high. Regulation intended to make the supply chain more secure could shrink it instead.

The implementation guidance, still being drafted as of early two thousand twenty-six, will have to answer these boundary questions. And the answers will determine whether thousands of small projects continue to operate or stop distributing in Europe entirely.

And then there is the question that nobody in two thousand twenty-one, when the American executive order was signed, could have anticipated. What happens when the code is not written by a person at all?

The Liability Gap

The CRA requires traceability. Who wrote this code? When? Was it reviewed? For human-authored code, git provides clear answers. The author field. The committer field. The signed key. But for AI-generated code, those fields become fiction. The author line says the developer's name. The signature belongs to the developer's key. But the developer did not write the code. They prompted it. They may not have read every line. They approved the pull request.

If that code contains a vulnerability, and at a forty-five percent failure rate it often will, who is liable under the CRA? The developer whose name is on the commit? The company that employed them? The AI company whose model generated the code? The platform that trained the model on open source repositories without paying the maintainers who wrote the training data?

The regulation does not answer this. It was written before AI code generation became widespread. The American NIST Secure Software Development Framework, published in two thousand twenty-two, organizes secure development into four practice groups. Prepare the organization. Protect the software. Produce well-secured software. Respond to vulnerabilities. Every federal vendor must attest that they follow these practices. The version one point two draft acknowledges that AI-generated code exists, but it does not assign liability differently. The audit trail that git provides, the very trail that makes git valuable for compliance, assumes that the author field means what it says. That assumption is eroding faster than any regulation can adapt.

We built package ecosystems on trust. Trust that the package name refers to real software. Trust that the author is who they claim to be. Trust that the code does what it says. AI is eroding all three simultaneously, and we do not have a replacement for any of them.

For the solo developer, this is abstract. For the vibe coder, it is invisible. For the team developer, it is a legal risk they cannot quantify. And for the maintainer, it is yet another weight added to shoulders that episode seventeen showed are already breaking.

The tool built to avoid bureaucracy has become the bureaucracy's most important artifact. The commit history that was designed so Linus Torvalds could trust kernel contributors is now the audit trail that European regulators will use to verify compliance. The signed commits that almost nobody bothered with may soon be mandatory infrastructure. The SBOM that nobody had heard of five years ago is being extracted from files that have been sitting in git repositories since the beginning. Git was supposed to be the opposite of process. It was supposed to be freedom.

And here is the deepest irony. The security debt being generated by AI coding tools is creating the very conditions that justify the regulation. Every phantom dependency, every privilege escalation path, every vulnerability disguised as clean code strengthens the case for oversight. The regulation then creates compliance burdens that AI tools promise to help with. Generate the SBOM. Automate the vulnerability report. Scan the dependencies. And those same AI tools, helping with compliance, are generating the same kinds of vulnerabilities that triggered the regulation in the first place. It is a circle, and Git is at the center of it, recording everything, proving nothing, holding it all together because nobody built anything better.

The eighty percent who believe AI code is secure are not stupid. They are responding to real signals. The code compiles. The tests pass. The syntax is clean. The commit looks professional. Every visible indicator says this is good code. The invisible indicators, the ones that matter, the privilege escalation paths and the phantom dependencies and the credentials baked into examples, do not announce themselves. They sit in the git history, technically auditable, practically invisible, waiting for a bot or a regulator or an attacker to find them.

That was episode nineteen of Git Good, Season Two. The security reckoning is not coming. It is here. The code works and it is not safe, the regulations are necessary and they are crushing, the liability gap is widening, and somewhere in between, in the metadata of a commit that nobody will read, the future of software is being decided. Next episode, we follow the money.

Git log with the show-signature flag. It checks the signature on every commit it lists, the same check that git verify-commit runs on a single commit, and prints the full GPG output beneath each signed commit. Run it on any repository and count the signed commits versus the unsigned ones. Right now that ratio probably embarrasses you. In a year, your compliance team might require it to be one hundred percent. The irony: a feature built so that projects like the kernel could verify where their code came from is about to be mandated by the same kind of bureaucracy Linus designed Git to escape.
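For the counting itself, a one-letter status per commit is easier to tally than full GPG output. The Python sketch below counts the codes that git log prints with the %G? format placeholder. SAMPLE_OUTPUT is invented, and signature_summary is a hypothetical helper, not part of Git.

```python
# Sketch: tally signature status codes from `git log --format=%G?`.
# SAMPLE_OUTPUT is invented; in a real repository the text would come from
# subprocess.run(SIGNATURE_LOG_CMD, capture_output=True, text=True).stdout
from collections import Counter

SIGNATURE_LOG_CMD = ["git", "log", "--format=%G?"]

# "N" means no signature. Any other code means a signature is present,
# but only "G" means a good, valid signature. "E", for example, means the
# signature could not be checked, often because the key is missing.
SAMPLE_OUTPUT = "G\nN\nN\nN\nG\nE\nN\nN\nN\nN\n"


def signature_summary(output: str) -> dict:
    codes = Counter(line.strip() for line in output.splitlines() if line.strip())
    total = sum(codes.values())
    signed = total - codes["N"]
    return {
        "total": total,
        "signed": signed,
        "verified_good": codes["G"],
        "signed_share": signed / total if total else 0.0,
    }


summary = signature_summary(SAMPLE_OUTPUT)
print(summary)
```

Signed and verified are different numbers. A commit can carry a signature that your machine cannot check, so a compliance rule would need to say which of the two it means.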