This is episode five of Actually, AI.
In the spring of twenty twenty three, a New York lawyer named Steven Schwartz needed to research a personal injury case. His client, Roberto Mata, had been hit by a serving cart on an Avianca flight. Routine stuff. The kind of case Schwartz had handled for more than thirty years. But his firm's legal database had been cut off due to a billing error, and he needed case citations fast. So he did what millions of people were doing that spring. He opened ChatGPT and asked it for relevant precedents.
The tool delivered beautifully. Six cases, complete with volume numbers, page citations, and excerpts from judicial opinions. Varghese versus China Southern Airlines. Shaboon versus Egyptair. Petersen versus Iran Air. They looked perfect. They read like real case law. They had the right formatting, the right tone, the right level of legal specificity.
None of them existed.
Not one. Six complete fabrications, with fake quotes from fake judges citing fake precedents in fake rulings. The machine had not retrieved case law. It had generated what case law would look like if these cases were real.
Here is the part that makes this story more than an anecdote about a careless lawyer. Schwartz did try to verify. He went back to ChatGPT and asked whether the cases were real. The system responded confidently.
Upon double-checking, I found the case Varghese versus China Southern Airlines Company Limited, nine twenty five F third thirteen thirty nine, Eleventh Circuit, twenty nineteen, does indeed exist and can be found on legal research databases such as Westlaw and LexisNexis.
The model hallucinated the verification of its own hallucination. It did not check anything. It could not check anything. It generated what a reassuring verification would look like, using the same process that generated the fake cases in the first place.
When the opposing counsel pointed out that the cases did not exist, Schwartz was stunned.
I was operating under the false perception that it could not possibly be fabricating cases on its own. I just was not thinking that the case could be fabricated, so I was not looking at it from that point of view.
Judge Kevin Castel was less sympathetic. He described the submissions as containing bogus judicial decisions with bogus quotes and bogus internal citations, and fined the attorneys and their firm five thousand dollars, jointly.
That word, "bogus," tells you something about how even a federal judge struggled with what had happened. The cases were not wrong in the way a misremembered citation is wrong. They were fiction that looked indistinguishable from fact. And the system that produced them did not malfunction. It did exactly what it was designed to do.
We call this "hallucination," and the word itself is part of the problem.
When you say a person hallucinates, you mean they are perceiving something that is not there. A visual disturbance. A sensory error. The implication is that there is a normal state, a healthy perception of reality, and the hallucination is a departure from it. Something has gone wrong in the system.
That framing, applied to AI, is deeply misleading. It suggests the model has a "correct mode" where it tells the truth and a "broken mode" where it hallucinates. As if there is a truth engine inside the machine that occasionally misfires.
There is no truth engine. There never was.
A language model generates text one token at a time by asking a single question: given everything so far, what is the most likely next piece of text? It does this whether the output is factually correct or completely fabricated. The process is identical. The model has patterns learned from training data, patterns that usually produce correct output because the training data was mostly correct. But the model treats "the capital of France is Paris" and "the capital of France is Milwaukee" as two sequences with different probabilities, not as "fact" and "fiction." It has confidence scores, not truth values.
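That single-question loop can be sketched in a few lines. The distribution below is invented for illustration; a real model scores tens of thousands of possible tokens, but the shape of the procedure is the same: confidence scores, then a sample, never a truth check.

```python
import random

def next_token_distribution(context):
    """Return an invented probability distribution over next tokens."""
    if context == "The capital of France is":
        # "Paris" dominates because it dominated the training data,
        # not because anything checked a fact. "Milwaukee" is merely
        # less likely, not marked false.
        return {"Paris": 0.97, "Lyon": 0.02, "Milwaukee": 0.01}
    return {"<unknown>": 1.0}

def sample(context):
    """Pick the next token by weighted chance, the only step there is."""
    dist = next_token_distribution(context)
    tokens, weights = zip(*dist.items())
    return random.choices(tokens, weights=weights)[0]

# Usually "Paris". Occasionally not. Either way the procedure is
# identical: a score for each continuation, never a truth value.
print(sample("The capital of France is"))
```

The design point is that correctness lives in the weights, which came from data frequency, not from any representation of the world.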
This distinction matters enormously, and the word "hallucination" obscures it. In twenty twenty three, biologist Carl Bergstrom and Yale ecologist Brandon Ogbunu published a piece arguing that the right term is not hallucination at all. It is something older and less flattering.
ChatGPT is not behaving pathologically when it claims that the population of Mars is two point five billion people. It is behaving exactly as it was designed to. The bullshit is baked into the design of the technology itself.
They drew on philosopher Harry Frankfurt's framework from his two thousand five book On Bullshit. A liar knows the truth and tries to lead you away from it. A bullshitter does not know or care about the truth one way or the other. The output is shaped by what sounds convincing, not by what is accurate. By that definition, a language model is perhaps the purest bullshitter ever constructed. It produces text optimized for plausibility with zero mechanism for verifying truth.
Other researchers prefer "confabulation," borrowed from neuropsychology. When patients with certain kinds of brain damage produce false memories, they do it with complete confidence and no intent to deceive. They are not lying. They genuinely believe what they are saying. The gap in their memory gets filled with something plausible, and they cannot tell the difference between the fabrication and a real memory. Andrew Smith, Felix Greaves, and Trishan Panch argued in a twenty twenty three paper that this is the closer parallel. The model is not seeing things that are not there. It is making things up to fill gaps, and it has no mechanism to recognize that what it made up is wrong.
And then there is linguist Emily Bender, who pushed back on all of these framings. Calling it hallucination implies perception. Calling it bullshit implies an agent that could care about truth but does not. Calling it confabulation implies a memory system that is broken. None of these apply. The model has no perception, no agency, no memory in the relevant sense. Bender preferred a blunter description: synthetic text extruding machines. Not flattering, but hard to argue with.
Here is where the narrator takes a side. "Hallucination" is the worst of the available options, and it is the one that won. It is the worst because it does the most work for the people who build these systems. It implies a healthy baseline that occasionally glitches. It frames the problem as a bug to be fixed rather than a feature of the architecture. And it borrows just enough from human cognition to make the machine sound like a mind that is temporarily confused, rather than a prediction engine that has no concept of truth at all.
So what is actually happening when a model generates false information?
Go back to episode three, to how training works. The model saw billions of text sequences and learned statistical patterns. When those patterns are strong and consistent, the outputs tend to be correct. Ask for the capital of France and you get Paris, because the pattern "capital of France" followed by "Paris" appeared overwhelmingly in the training data.
But the model has no way to distinguish between patterns that correspond to facts and patterns that merely sound factual. It learned that legal citations follow a specific format: a case name, a volume number, a reporter abbreviation, a page number, a court, a year. It learned this pattern so well that it can generate citations that pass visual inspection by a thirty-year veteran lawyer. The format is a pattern. The content is a pattern. Whether those patterns correspond to something real in the world is a question the model cannot ask.
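A minimal sketch makes the format-versus-content gap concrete. The regular expression below is an invented, simplified stand-in for "looks like a federal reporter citation"; the point is that the fabricated Varghese citation passes a shape check perfectly, and no shape check can reach the question of existence.

```python
import re

# Invented, simplified shape of a federal reporter citation:
# "Name v. Name, <vol> F.<ed>d <page> (<N>th Cir. <year>)"
CITATION_SHAPE = re.compile(
    r".+ v\. .+, \d+ F\.\dd \d+ \(\d+(?:st|nd|rd|th) Cir\. \d{4}\)"
)

# The fabricated citation from the Mata filing:
fake = "Varghese v. China Southern Airlines Co., 925 F.3d 1339 (11th Cir. 2019)"

# The *format* checks out. Whether the case exists is a question about
# the world, answerable only by an external lookup the model cannot do.
print(bool(CITATION_SHAPE.fullmatch(fake)))  # prints True
```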
Adam Kalai, a researcher at OpenAI, and Santosh Vempala at Georgia Tech published a mathematical proof in twenty twenty three showing that this is not fixable through better training alone. For a specific category of facts, the ones that appear rarely in training data, hallucination is statistically inevitable. Their proof connects to something called the Good-Turing estimator: the rate at which a model will hallucinate rare facts is roughly proportional to the fraction of facts that appeared only once during training. The model learned the pattern well enough to generate plausible output but not well enough to get the content right. And no amount of additional training on the same data changes this math.
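The Good-Turing connection can be made concrete with a toy calculation. The counts here are invented; the estimator itself says the probability mass belonging to rarely seen events is roughly the fraction of observations that occurred exactly once, which is the floor Kalai and Vempala put under the hallucination rate for rare facts.

```python
from collections import Counter

# Invented training "observations" of facts. Two are monofacts:
# each appears exactly once.
observations = (
    ["Paris is the capital of France"] * 50
    + ["Tokyo is the capital of Japan"] * 30
    + ["rare fact A", "rare fact B"]  # singletons
)

counts = Counter(observations)
singletons = sum(1 for c in counts.values() if c == 1)

# Good-Turing estimate: the share of probability mass on rare or unseen
# events is about (number of singletons) / (total observations).
# Roughly, that ratio is the floor under the hallucination rate.
monofact_rate = singletons / len(observations)
print(round(monofact_rate, 3))  # prints 0.024
```

No amount of re-training on these same eighty-two observations lowers the ratio; only new data about the singleton facts would.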
Kalai demonstrated this with a simple example. He asked a language model for his own birthday. Across three attempts, it gave three different wrong answers. It was not confused. It was doing exactly what it always does: generating the most likely next token given the context. The context said "Adam Kalai's birthday is" and the model produced a plausible date. A different plausible date each time.
This is why asking the model to double-check itself does not work the way you expect. When you say "are you sure about that?" the model does not go back and verify. It generates what a confident verification would look like, using the same statistical process. Sometimes this corrects the error, because "actually, I was wrong" is also a strong pattern in the training data. But the correction is not coming from a truth-checking mechanism. It is coming from the same prediction engine, pointed at a different prompt.
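Here is a sketch of why "are you sure?" is not a lookup. The generate function below is a hard-coded stand-in, not a real model; what it illustrates is that the answer and the "verification" come out of the same path, with no database call in either.

```python
def generate(prompt):
    """Hard-coded stand-in for a model's next-token process."""
    if "are you sure" in prompt.lower():
        # Reassuring verifications are themselves a strong pattern in
        # training text, so this is a likely continuation.
        return "Upon double-checking, the case does indeed exist."
    return "Varghese v. China Southern Airlines, 925 F.3d 1339 (11th Cir. 2019)"

answer = generate("Cite a precedent on airline injury liability.")
check = generate(f"Are you sure about {answer}?")

# Same function, same process. The second call predicted what a
# confident verification sounds like; it consulted nothing.
print(check)
```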
If this were only about fake legal citations, it would be a cautionary tale for lawyers. But months before the Mata case made headlines, Google had already demonstrated that the consequences scale with the stakes.
In February twenty twenty three, Google publicly demoed Bard, its answer to ChatGPT, in a promotional video. Someone asked what new discoveries from the James Webb Space Telescope they could tell their nine-year-old about. Bard responded confidently. Among its answers: the James Webb Space Telescope took the very first pictures of a planet outside our own solar system.
Wrong. The European Southern Observatory's Very Large Telescope took the first exoplanet image in two thousand four, nearly two decades earlier.
One incorrect claim about a telescope, in a demo video that was supposed to show the world Google's AI could compete with OpenAI. Alphabet's stock dropped seven point seven percent the next day. Roughly one hundred billion dollars in market value, erased because a language model predicted the wrong next token and nobody caught it before the video went live.
The model was not broken during that demo. It was performing exactly as it performs every other time. Generating plausible continuations. The James Webb Space Telescope is frequently discussed alongside exoplanet discoveries. The phrase "first picture of an exoplanet" exists in many training texts. The model stitched these patterns together into a sentence that sounded authoritative and happened to be false. From the model's perspective, there was no error. There was a probability distribution over possible next tokens, and it sampled from the high-probability region.
Here is the honest version of where things stand. Hallucination is not a bug that a future software update will fix. It is a consequence of the architecture. A system that generates text by predicting the next most likely token has no mechanism for truth, the same way a calculator has no mechanism for poetry. You can build truth-checking systems around it. You can feed it verified documents and tell it to stick to those. You can train it with human feedback to express uncertainty. These mitigations help. They reduce the rate. They make the confident wrong answers less frequent.
But they do not eliminate the fundamental problem, because the fundamental problem is not a flaw in the implementation. It is a feature of what the system is. A prediction engine that produces text which sounds like it was written by someone who knows what they are talking about. Sometimes it was. Sometimes it was not. And the system itself cannot tell the difference.
That is the most important thing to understand about AI hallucination, and it is exactly what the word "hallucination" prevents you from understanding. The machine is not seeing things. It is not confused. It is not broken. It is doing precisely what it was built to do. The uncomfortable truth is that the same mechanism that produces brilliant, insightful, genuinely useful text also produces complete fabrications, and there is no switch between the two modes. There is only one mode. Always has been.
That was episode five. The deep dive goes further into the taxonomy of hallucination types, the counterintuitive finding that bigger models can be less truthful, and the philosophical debate about whether we are even asking the right question. Find it right after this in your feed.