Actually, AI
Tokens: What This Means for You
15m · Apr 03, 2026
When you rephrase "pls summarize this txt" to "please summarize this text about machine learning," you're not just being more formal—you're sending an entirely different sequence of numbers to the AI that determines its response.


The Invisible Tax on Every Keystroke

This is the practical companion to episode one of Actually, AI: tokens.

You have heard the story. A compression algorithm from nineteen ninety-four, repurposed by a linguist in Edinburgh, determines how every AI on Earth reads your text. You know about the strawberry problem, about glitch tokens, about the language tax. Now the question is: what do you do with that knowledge?

Because here is the thing. Every time you type a prompt, you are not just writing a sentence. You are constructing a sequence of tokens. And the way that sequence gets built, the way your words get chopped into fragments, directly affects what comes back. Not in a vague "garbage in, garbage out" way. In a specific, mechanical, predictable way that you can learn to work with.

Why Rephrasing Changes the Answer

You have probably noticed this. You ask an AI a question, get a mediocre response, rephrase the same question slightly differently, and suddenly the answer is dramatically better. This is not random. It is not the AI having a good day. It is tokenization.

When you write "pls summarize this txt abt ML," every abbreviation and shorthand creates a different token sequence than "please summarize this text about machine learning." The abbreviated version might split "pls" into unexpected fragments. "Txt" is a different token than "text." "Abt" is a different token than "about." And "ML" likely tokenizes differently than the spelled-out "machine learning," which the model has seen far more often in training. The model processes an entirely different input sequence, activates different internal pathways, and produces a different output.

This is not a metaphor. It is literal. Two prompts that mean the same thing to you mean different things to the model, because they arrive as different sequences of numbers. The model that responds to your carefully written question is, in a mechanical sense, running a different computation than the model that responds to your hasty abbreviation.
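You can see this mechanically with a toy tokenizer. The sketch below uses greedy longest-match over a small, invented vocabulary; real models learn vocabularies of roughly 100,000 entries via byte pair encoding, so the specific splits here are illustrative only. The effect, though, is the same: the abbreviated prompt fragments into more, rarer pieces.

```python
import string

# Toy vocabulary: a few common words plus single characters as fallback.
# Invented for illustration; real BPE vocabularies are learned from data.
VOCAB = {"please", "summarize", "this", "text", "about",
         "machine", "learning", " "} | set(string.ascii_lowercase)

def tokenize(s, vocab=VOCAB):
    """Greedy longest-match: at each position, take the longest
    vocabulary entry that fits, falling back to single characters."""
    tokens, i = [], 0
    while i < len(s):
        for j in range(len(s), i, -1):   # try the longest span first
            if s[i:j] in vocab:
                tokens.append(s[i:j])
                i = j
                break
        else:
            tokens.append(s[i])          # unknown character: its own token
            i += 1
    return tokens

full = tokenize("please summarize this text")
abbr = tokenize("pls summarize this txt")
print(full)   # ['please', ' ', 'summarize', ' ', 'this', ' ', 'text']
print(abbr)   # ['p', 'l', 's', ' ', 'summarize', ' ', 'this', ' ', 't', 'x', 't']
print(len(full), "vs", len(abbr))   # 7 vs 11
```

Same meaning to you; seven tokens one way, eleven the other, and the abbreviated version arrives as mostly single-letter fragments the model has rarely seen in this context.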

The practical lesson is simple. Write clearly. Not because the AI judges your grammar (it does not have opinions), but because clear, standard English produces the most predictable, most well-trained token sequences. The model has seen "please explain how photosynthesis works" millions of times in its training data. It has seen "pls xplain photosynthsis" far fewer times. The common phrasing activates well-worn pathways. The unusual phrasing sends the model into less familiar territory.

This extends to structure. A prompt that says "You are an expert botanist. Explain photosynthesis to a college student. Focus on the light reactions. Use simple language." produces four clear, common token sequences. The model can attend to each instruction distinctly. A prompt that crams everything into one breathless sentence, "explain photosynthesis like a botanist would to a college student focusing on light reactions simply," creates a long chain where the relationships between instructions are harder for the attention mechanism to parse. Not because the AI is confused in a human sense. Because the token-level structure is less clean.

The Language Tax at the Keyboard

If you work in a language other than English, the token episode had bad news for you, and the practical implications are worse than you might think.

Remember that Hindi "namaste" costs four tokens where English "hello" costs one. That ratio is not just about your bill at the end of the month. It shapes your entire experience with the AI. You get fewer words in your context window. The model spends more of its attention capacity just reassembling your fragmented words before it can think about what you said. And because the model saw less Hindi training data, the pathways activated by Hindi tokens are less thoroughly trained.

If you are bilingual and using AI for work, here is a concrete strategy. Write your prompts in English when you can, even if you want the response in another language. A prompt like "Explain quantum entanglement in Hindi, at a high school level" gives the model the instruction in its most efficient language and asks it to produce output in yours. The instruction tokens are compact and well-trained. The generation happens in Hindi, which still costs more tokens, but the model understood your request through its strongest pathways.

This is not a permanent state of affairs. GPT-4o's expanded vocabulary cut Hindi token counts by seventy-one percent. Each generation of models gets better. But for right now, in twenty twenty-six, the gap is real, and working with it rather than against it produces better results.

The Strawberry Problem and Your Expectations

You cannot fix the strawberry problem. No amount of clever prompting will give the AI the ability to count letters in a word, because the letters do not exist at the level the model operates on. The model sees "str," "aw," "berry." It will never see S, T, R, A, W, B, E, R, R, Y. Not with current architecture.

This matters because it tells you something fundamental about what to expect from AI and what not to expect. Any task that requires the model to look inside a token is going to be unreliable: spelling, character counting, rhyme detection, anagram solving, anything that depends on individual letters rather than words as units. The model can sometimes get these right through memorization (it has seen "strawberry has three R's" enough times now that newer models answer correctly for that specific word), but the underlying limitation has not changed. It will still fail on unusual words, because it is pattern-matching from training data, not actually inspecting characters.

The same logic applies to arithmetic with specific numbers. As we covered in the main episode, some numbers are single tokens and others are split across multiple tokens. The model handles single-token numbers more reliably than multi-token numbers. This is why AI can seem brilliant at complex reasoning and then fail at adding two four-digit numbers. The reasoning happens at the token level, where the model is strong. The arithmetic requires reconstructing numbers from fragments, where the model is weak.

The practical takeaway: use AI for what it is good at, which is token-level pattern recognition, language generation, reasoning across concepts, and synthesis of information. Do not trust it for tasks that require sub-token precision. If you need exact arithmetic, ask the model to write code that does the calculation rather than calculating directly. If you need exact spelling, verify it yourself. Know where the floor is.
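In practice, that delegation looks like this. Character operations and multi-digit arithmetic are trivial for code, precisely because code works below the token level. A minimal sketch (the numbers are arbitrary examples):

```python
# Sub-token precision the model lacks is trivial in code:
# strings are sequences of characters, numbers are exact.

word = "strawberry"
print(word.count("r"))   # 3 -- counted character by character

a, b = 4821, 7396        # arbitrary multi-digit numbers
print(a + b)             # 12217 -- exact, no token fragments involved
```

Asking the model to produce and run a snippet like this routes the task through its strength (generating code) instead of its weakness (reconstructing characters and digits from fragments).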

Try This Right Now

Here is something you can do in the next sixty seconds that will make tokenization real for you. Open any AI chatbot. Ask it this exact question: "How many times does the letter E appear in the word 'nevertheless'?"

The answer is four. Most models will get it wrong, or get it right for the wrong reasons. Now try "How many times does the letter E appear in the word 'beekeeper'?" The answer is five. Now try a word the model has almost certainly never been specifically asked about: "How many times does the letter A appear in the word 'abracadabra'?" Five. Watch the model struggle, or watch it confidently give you the wrong number.
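If you want the ground truth in hand before you run the experiment, a few lines of Python count the characters directly, which is exactly what the model cannot do:

```python
# Character-level ground truth for the three test words.
for word, letter in [("nevertheless", "e"),
                     ("beekeeper", "e"),
                     ("abracadabra", "a")]:
    print(f"{letter!r} appears {word.count(letter)} times in {word!r}")
# 'e' appears 4 times in 'nevertheless'
# 'e' appears 5 times in 'beekeeper'
# 'a' appears 5 times in 'abracadabra'
```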

You are not testing the model's intelligence. You are seeing the boundary between the token world and the character world. The model is smart. It is often astonishingly smart. But it is smart about tokens, not about the letters inside them. Once you see this boundary, you start to notice it everywhere. You start to understand why certain prompts work and others do not. You start to write prompts that work with the grain of the system rather than against it.

The Shape of Your Prompt

Think of prompt writing as a form of token engineering. Not in a jargon-heavy, prompt-template way, but in the sense that the shape of your input determines the shape of your output, and that shape operates at the token level.

Short, clear sentences produce short, well-defined token sequences. The model can attend to each one distinctly. Long, nested sentences with multiple clauses produce token sequences where the relationships between ideas are harder for the attention mechanism to track across long distances.

Putting your most important instruction first matters, because the model's attention to early tokens is generally stronger than its attention to tokens buried deep in a long prompt. If you write three paragraphs of context followed by your actual question, the question arrives in a region of the token sequence where attention has already been partially spent on the context. Leading with the question and following with context often produces better results.

Even formatting matters. When you use bullet points or numbered lists in a prompt, each item tends to tokenize into a clean, separable chunk. The model can process them independently. When you embed the same information in flowing prose, the token boundaries between items are less clear, and the model has to work harder to separate the instructions.
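If you assemble prompts programmatically, this suggests keeping each instruction on its own line rather than concatenating into prose. A sketch, reusing the episode's running botanist example (the instruction strings are illustrative, not a required template):

```python
# Assemble a prompt from separable instructions: each bullet tokenizes
# into its own clean chunk, with the newline as an explicit boundary.
instructions = [
    "You are an expert botanist.",
    "Explain photosynthesis to a college student.",
    "Focus on the light reactions.",
    "Use simple language.",
]
prompt = "\n".join(f"- {item}" for item in instructions)
print(prompt)
```

The joined version keeps one instruction per line, so the attention mechanism gets clear boundaries between them instead of one tangled sentence.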

None of this is magic. None of it requires memorizing token tables or counting subwords. It is just the practical consequence of the fact that the AI does not read your words. It reads fragments. And fragments that are clean, common, and well-structured produce better results than fragments that are messy, unusual, and tangled.

What Comes Next

That was the practical companion for episode one. The main story told you what tokens are. The deep dive showed you how the algorithm works. And now you know what it means for the way you use AI every day.

Episode two is about neural networks. The actual machinery that takes those tokens and does something with them. Not a brain, not a flowchart, a machine made of millions of tiny adjustable knobs, each one turned slightly by failure. If tokens are the pieces, neural networks are the thing that makes the pieces mean something.

That was the practical companion for episode one of Actually, AI.