PärPod Tech
Zero Percent: The AI That Could Not Say Yes or No
11m · Mar 16, 2026
LiquidAI's 1.2B thinking model was supposed to route expensive AI calls for free—until it got asked to pick a number between 1 and 6 and returned a Swedish furniture catalog instead.

The Plan

Picture a heist movie. A crew sits around a table in a dimly lit room, blueprints spread out, coffee going cold. The target is not a bank vault. The target is the AI bill.

Here is the setup. You run multiple AI projects. Each one uses a different model at a different price point. The heavy hitter, Claude Opus, costs about four point four cents per call. The mid-range options run a cent or two. The budget model, Meta Maverick, comes in at three hundredths of a cent. And right now, a human decides which model handles which task. Every single time. Manually. Like a switchboard operator in nineteen forty seven.
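
For a sense of scale, here is the arithmetic in a few lines of Python. The per-call prices are the ones quoted above; the ten-thousand-call volume is invented for illustration.

```python
# Per-call prices quoted above; the call volume is a made-up example.
price_per_call = {
    "claude-opus": 0.044,     # about four point four cents
    "mid-range": 0.015,       # a cent or two
    "meta-maverick": 0.0003,  # three hundredths of a cent
}

bulk_calls = 10_000  # hypothetical monthly bulk workload
saved = bulk_calls * (price_per_call["claude-opus"] - price_per_call["meta-maverick"])
print(f"Routing {bulk_calls:,} bulk calls to the budget model saves ${saved:,.2f}")
# Routing 10,000 bulk calls to the budget model saves $437.00
```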

The idea is beautiful in its simplicity. What if you could replace the human router with a tiny AI model? One that runs locally, costs nothing, and makes the decision in milliseconds. A free traffic cop directing expensive workers. The savings would compound with every call. The human goes home. The system runs itself.

The model they chose for the job was LiquidAI's LFM two point five, the one point two billion parameter variant. A "thinking" model, meaning it reasons through problems step by step before answering. One point two eight gigabytes of memory. Small enough to fit on a laptop. And it runs at ninety seven tokens per second on Apple Silicon. That is fast. That is very fast.
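
The write-up does not say which runtime ran the model, but a local setup along these lines is typical. This sketch uses llama-cpp-python; the GGUF filename and the quick tokens-per-second check are assumptions, not the experiment's actual harness.

```python
import time

from llama_cpp import Llama

# Load a local GGUF build of the model. The filename here is a guess.
llm = Llama(model_path="lfm-1.2b-q8_0.gguf", n_ctx=4096, verbose=False)

# Rough throughput check: completion tokens divided by wall-clock seconds.
start = time.perf_counter()
out = llm("Say hello.", max_tokens=64)
elapsed = time.perf_counter() - start
print(f"{out['usage']['completion_tokens'] / elapsed:.0f} tokens per second")
```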

The budget for this operation? Zero dollars. Everything local. No API calls. No cloud. No credit card on file. The crew is assembled. The plan is drawn. The getaway car is running.

Test One: The Speed Check

Every heist starts with something easy. A test run. Case the joint. Make sure the hardware works.

Ninety seven tokens per second. The model loaded in under two seconds. Inference was nearly instant. On a laptop, without a GPU, this tiny model processed text faster than most people can read it. First checkpoint cleared. Green light across the board.

Confidence was high. If the model could think this fast, surely it could handle a simple routing decision. The question is never whether you can get into the vault. The question is whether you can get out.

Test Two: Yes or No

The second test was the simplest possible task. Binary classification. Given a piece of content, answer one question. Is this user-facing? Yes or no. That is it. One word. One bit of information. The most basic decision a router could make.

Eighty attempts. Eighty chances to say yes or no. The instructions were explicit. Answer only yes or no. Nothing else. Do not explain. Do not elaborate. Just the word.
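
A harness for this test plausibly looked something like the sketch below. The prompt is a paraphrase of the instructions above, `llm` is the model handle from the loading sketch, and the two sample items and their labels are invented stand-ins for the real eighty.

```python
PROMPT = (
    "Is the following content user-facing? Answer only yes or no. "
    "Nothing else. Do not explain.\n\n{content}"
)

def classify(llm, content: str) -> str | None:
    """Return 'yes' or 'no', or None if the model never produced a clean answer."""
    out = llm(PROMPT.format(content=content), max_tokens=8)
    text = out["choices"][0]["text"].strip().lower()
    return text if text in ("yes", "no") else None

test_items = [
    ("Welcome to our spring sale!", "yes"),
    ("cron: nightly index rebuild OK", "no"),
]  # ... 78 more pairs in the real run
correct = sum(classify(llm, content) == label for content, label in test_items)
print(f"{correct}/{len(test_items)} correct")
```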

The model began thinking. It reasoned about the nature of user-facing content. It considered edge cases. It weighed the philosophical implications of the term "user-facing." It explored the relationship between content and audience. It thought and thought and thought. And while it was still thinking, the token limit ran out.

Zero percent. Not a single correct answer out of eighty attempts. Not because the model got the answers wrong. Because it never reached an answer at all. The thinking model was so busy thinking that it forgot to speak. Like a safecracker who spends so long studying the lock that the police arrive before the dial moves.

The chain-of-thought reasoning, the very feature that made this a "thinking" model, consumed fifty to one hundred tokens of deliberation before the answer could even begin. Those thinking tokens count against the same completion budget as the answer itself. For a task where the entire answer is one token long, the model spent its whole budget on the prelude.
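
One common mitigation is to strip the reasoning block before parsing the answer, as in the sketch below. It assumes the model wraps its deliberation in <think> tags, which is a guess for this particular model, and it only helps if the thinking actually finishes inside the token budget. In these runs it never did, so there was nothing left to salvage.

```python
import re

def strip_reasoning(raw: str) -> str:
    """Drop a <think>...</think> block, if the model emitted one, and return
    whatever answer remains. The tag pair is an assumption, not documented
    behavior for this model."""
    return re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
```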

Test Three: Pick a Model

Surely, the crew thought, a more structured task would help. Instead of yes or no, give the model options. Here are six AI models. Here is a task description. Which model should handle it? Multiple choice. Even rolling a die would hit roughly seventeen percent.

Twenty tasks. Six possible answers. The model got three right. Fifteen percent accuracy. Below random chance, and every correct answer was the same easy case. Bulk processing tasks routed to the cheapest model. The one obvious answer that required no nuance at all.
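
An eval for this test might look like the sketch below. The six model names and the two sample tasks are invented, `llm` is again the local model handle, and the article only reports the aggregate score.

```python
MODELS = ["claude-opus", "sonnet", "haiku", "vision-model", "mid-range", "meta-maverick"]

ROUTE_PROMPT = (
    "Task: {task}\nWhich model should handle this? "
    "Answer with exactly one of: " + ", ".join(MODELS) + "."
)

def route_with_llm(llm, task: str) -> str:
    out = llm(ROUTE_PROMPT.format(task=task), max_tokens=8)
    return out["choices"][0]["text"].strip().lower()

gold = [
    ("summarize five hundred support tickets", "meta-maverick"),
    ("write a Swedish product description", "claude-opus"),
]  # ... 18 more pairs in the real run
accuracy = sum(route_with_llm(llm, task) == model for task, model in gold) / len(gold)
```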

For everything else, the model flailed. And then something remarkable happened. When asked to decide which model should write a Swedish product description, the tiny model did not choose a model. It tried to write the Swedish description itself. It confused the meta-task with the task. "Choose who does this job" was too abstract. The model heard "Swedish description" and started describing.

This is the moment in the heist where someone trips the laser grid because they were looking at the diamonds instead of the floor.

Test Four: The Filter

The fourth test was a pre-filter. Before routing anything, just answer one question. Should this item be processed at all? Some items are spam, duplicates, or irrelevant. A useful router needs to skip the junk and pass through the real work.

The model said skip. Skip everything. Every single item. Including four real articles that absolutely needed processing. It was not filtering. It was refusing.

Zero percent. Again. The second perfect failure. At this point in the heist movie, the alarm is blaring, the vault door is stuck, and the getaway driver has left to get coffee.

Test Five: Just Give Me JSON

One test remained. Forget routing. Forget classification. Forget nuance. Just produce structured output. Take this input, return it as a JSON object. Keys and values. Curly braces. The format that every API on earth speaks.

The model never produced valid JSON. Not once. Not a single parseable object. The output was a stream of reasoning about what JSON should look like, philosophical musings on data structure, and occasionally something that resembled a key-value pair embedded in a paragraph of natural language.
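
Scoring this test is mechanical, which is what makes the result so damning. A counter along these lines, with an illustrative prompt and `llm` once more standing in for the local model handle, tallies how many outputs survive json.loads.

```python
import json

def json_score(llm, inputs: list[str]) -> float:
    """Fraction of model outputs that parse as valid JSON."""
    valid = 0
    for text in inputs:
        out = llm(f"Return this as a JSON object: {text}", max_tokens=256)
        try:
            json.loads(out["choices"][0]["text"])
            valid += 1
        except json.JSONDecodeError:
            pass
    return valid / len(inputs)
```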

Zero percent. For the third time. The heist is over. The crew is sitting in the parking lot, staring at each other. The vault was not just locked. It was painted on the wall.

The Keyword Baseline

Here is where the twist lands. While the AI model was scoring fifteen percent on its best day and zero on its worst, someone wrote a keyword matcher. About twenty lines of Python. If the text contains Swedish words, route to Opus. If it mentions images, route to the vision model. If it is a bulk task, route to Maverick. Simple rules. Pattern matching. The kind of code a junior developer writes on a Monday morning before the coffee kicks in.
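
The article does not publish the matcher, but from the rules it describes, it plausibly looked something like this. The keyword lists and model names are guesses; only the three rules themselves come from the text.

```python
def route(task: str) -> str:
    """Keyword router in the spirit of the baseline described above."""
    t = task.lower()
    if any(word in t for word in ("swedish", "svenska", "på svenska")):
        return "claude-opus"       # Swedish copy goes to the heavy hitter
    if any(word in t for word in ("image", "photo", "screenshot", "diagram")):
        return "vision-model"      # anything visual goes to the vision model
    if any(word in t for word in ("bulk", "batch", "all records")):
        return "meta-maverick"     # high-volume grunt work goes to the cheapest
    return "mid-range"             # everything else gets a sensible default
```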

Ninety percent accuracy. It missed two out of twenty tasks. The twenty lines of if-statements outperformed the one point two billion parameter thinking model by a factor of six. Zero cost. Zero latency. Fits in a tweet.

The experiment started as a question about whether AI could replace human routing. The answer was not just "no." The answer was that the best router is not even AI. It is a switch statement. The heist was unnecessary. The vault was unlocked the whole time. Someone just needed to try the handle.

The Pattern

This is not an isolated incident. Three independent data points from the same research lab confirm the same pattern.

Weather broadcasts. Fixed format, predictable structure, the same template every time. Rule-based generation is perfect. One hundred percent accuracy. Every time. An AI model would introduce variation where none is wanted, creativity where precision is required.
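
A minimal version of that idea, with invented field names: fill the same slots in the same template, every time.

```python
# Same slots, same wording, every time. Field names are illustrative.
WEATHER_TEMPLATE = (
    "{city}: {condition}, high of {high} degrees, low of {low}. "
    "Chance of rain: {rain_pct} percent."
)

print(WEATHER_TEMPLATE.format(
    city="Göteborg", condition="overcast", high=9, low=4, rain_pct=70,
))
```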

A dashboard aggregating thirty seven data sources. Pure data transformation. Numbers in, formatted numbers out. No interpretation needed. The rule-based system handles fifty API calls per minute at zero cost. An AI model would cost money to do worse work.

And this routing experiment. The most dramatic failure of the three, because it started with the highest ambitions.

The pattern is this. If you can enumerate the output space, use rules. If you can write down every possible answer before you see the input, you do not need a model. You need a lookup table. AI is for the gaps. For the tasks where you genuinely do not know what the output should look like until you see the input. Translation. Summarization. Creative generation. Tasks where the output space is vast and unpredictable.

But routing? Routing is a solved problem. It has been a solved problem since the first telephone switchboard. You have a finite number of destinations and a set of signals that tell you where each call should go. A switch statement handles this elegantly. A neural network handles it like a philosophy professor handles a light switch, with great deliberation and occasional success.
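
In Python, the switch statement is a match statement, and once you can enumerate the categories, the whole router collapses into a handful of cases. The categories and model names here are illustrative.

```python
def route_by_category(category: str) -> str:
    """Routing as a switch statement (Python 3.10 structural pattern matching)."""
    match category:
        case "translation" | "copywriting":
            return "claude-opus"
        case "image" | "ocr":
            return "vision-model"
        case "bulk" | "batch":
            return "meta-maverick"
        case _:
            return "mid-range"
```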

The Question You Should Be Asking

Here is what makes this story useful beyond a cautionary tale about tiny models. The one point two billion parameter model failed because the task did not need intelligence. It needed obedience. Follow a rule. Produce a word. Do not think. The model could not stop thinking. That was the fundamental mismatch.

But this mismatch is not limited to tiny models. It scales up. Right now, somewhere, someone is paying four cents per API call to have a large language model do something that a regular expression could do. Someone is running GPT-4 to extract dates from emails. Someone is using Claude to validate JSON schemas. Someone is spending thousands of dollars a month on AI-powered string matching.
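
For the email example, the regular expression in question fits in a line. This pattern is a rough sketch that covers ISO-style and US-style dates only; a real inbox would need a few more alternations.

```python
import re

# Matches dates like 2026-03-20 and 3/27/26. Deliberately simple.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})\b")

email = "Let's ship on 2026-03-20, or 3/27/26 at the latest."
print(DATE_RE.findall(email))  # ['2026-03-20', '3/27/26']
```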

The zero percent score is funny. It is genuinely, absurdly funny. A thinking model that thinks so hard it forgets to answer. But the real joke is broader. The real joke is every production system where someone reached for AI because it felt modern, because it felt smart, because it was the tool they had just learned to use, when the actual problem was a lookup table wearing a trench coat.

Ask yourself one question about your own AI integrations. Can I write a switch statement instead? If the answer is yes, the switch statement wins. It always wins. Zero cost. Zero latency. Zero percent chance of getting confused and writing a Swedish product description when you asked it to pick a number between one and six.

The tiny model is not the villain of this story. It did exactly what a one point two billion parameter model should be expected to do, which is very little. The villain is the assumption that intelligence is required. Sometimes the answer is not smarter AI. Sometimes the answer is no AI at all.