Actually, AI
Neural Networks: What the Numbers Do Not Tell You
14m · Apr 03, 2026
Seventy billion parameters sounds impressive until you realize it tells you nothing about whether a model can actually handle your specific task—here's what really matters when choosing an AI tool.

This is the practical companion to episode two of Actually, AI, which covered neural networks.

The Number Everyone Quotes

You have heard it. Seventy billion parameters. Four hundred five billion parameters. A trillion parameters. The number gets dropped into press releases and product announcements like a speedometer reading, and the implication is always the same. Bigger number, smarter machine. A seventy billion parameter model must be better than a seven billion parameter model, the way a seven hundred horsepower engine is faster than a seventy horsepower engine. And if you heard the main episode and the deep dive, you already know why that framing is wrong. But knowing it is wrong in theory and knowing what to do about it in practice are two very different things.

So here is the practical version. What does the parameter count actually tell you, what does it hide, and how should you think about choosing the right AI tool for what you are trying to do?

What Seventy Billion Actually Means

Remember the mixing board from episode two. Every parameter is one slider on that board. A seventy billion parameter model has seventy billion sliders. Each one holds a number, typically a decimal with many digits, and that number was found during training by the process we cover in episode three. More sliders means the board can potentially represent finer distinctions. A model with more parameters can hold more patterns, capture subtler relationships between words, remember more obscure facts from its training data.
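The slider picture can be made concrete with a little arithmetic. Here is a sketch, in Python, of how parameter counts arise in the simplest kind of network, a stack of fully connected layers. The layer sizes are made up for illustration; real language models use a different architecture, but the counting principle is the same: every connection weight and every bias is one slider.

```python
# A "mixing board" in miniature: count the sliders (parameters) in a tiny
# fully connected network. Each weight and bias is one stored decimal
# number that training adjusts.

def count_parameters(layer_sizes):
    """Count weights and biases for a dense network with the given layer widths."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out   # one weight per input-to-output connection
        total += n_out          # one bias per output unit
    return total

# A toy network: 512 inputs -> two hidden layers of 1024 -> 512 outputs.
toy = count_parameters([512, 1024, 1024, 512])
print(f"Toy network: {toy:,} parameters")  # about two million sliders
```

Scale the same arithmetic up to transformer-sized layers stacked dozens deep and you arrive at the headline numbers: billions of sliders, each one just a stored decimal.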

But here is what the number does not tell you. It does not tell you how well those sliders were set. It does not tell you what data was used to set them. It does not tell you whether the training process ran long enough, or whether it was optimized for the kind of task you care about. A seventy billion parameter model trained on low quality data with a rushed schedule will produce worse results than a well trained seven billion parameter model that spent months on carefully curated text. The sliders are there, but they are set to noise instead of music.

Think of it this way. You could build a mixing board with ten thousand channels. But if the person setting the levels has never heard the song, has never been in the studio, and was given fifteen minutes instead of fifteen hours, that giant board will sound worse than a sixteen channel board mixed by someone who knows exactly what they are doing.

When Small Beats Big

This is not hypothetical. It happens constantly, and it is one of the most practically useful things to understand about AI right now.

A seven billion parameter model that has been fine tuned for medical question answering will routinely outperform a four hundred billion parameter general model on medical questions. Not by a little. Often by a lot. The fine tuned model has fewer sliders, but every slider has been precisely adjusted for that specific kind of music. The general model has vastly more sliders, but they are set for everything and therefore optimized for nothing in particular.

This pattern repeats across every domain. Code generation, legal analysis, translation between specific language pairs, customer support for a particular product. The specialist model wins on the specialist task. The generalist model wins on breadth and flexibility, on the ability to handle a question it has never seen in a domain it was not specifically tuned for. Neither is better. They are different tools.

The analogy from the main episode extends naturally here. A recording studio has a main mixing board with hundreds of channels for the full mix. But each musician's monitor feed goes through a smaller, separate board, because the drummer needs a completely different mix than the vocalist. Nobody would claim the monitor board is inferior because it has fewer channels. It is doing a different job, and for that job, fewer channels set precisely is better than more channels set loosely.

The Practical Mixing Board

So how do you actually choose? Here is a framework that works right now, as of early twenty twenty six, and will likely hold for a while because it is based on the architecture, not the marketing cycle.

Ask three questions about your task.

First, how much does the task benefit from broad knowledge? If you are writing an essay that draws on history, philosophy, science, and pop culture in the same paragraph, you need a model that has seen a wide range of text during training. That usually means a larger model. The breadth of knowledge roughly scales with both parameter count and training data size.

Second, how much does the task benefit from precision in a specific domain? If you are generating code in one programming language, summarizing legal contracts, or translating between two specific languages, a smaller model fine tuned for that domain will usually be faster, cheaper, and more accurate. The precision of domain specific knowledge scales with the quality and focus of training, not with raw size.

Third, how much do you care about speed and cost? This is where the parameter count has a direct, unavoidable effect. A seventy billion parameter model requires roughly ten times the memory and computation of a seven billion parameter model. If you are running a hundred thousand queries a day, that difference is the difference between a reasonable server bill and an unreasonable one. If you are asking one question a week, it does not matter at all.
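The memory side of that third question is simple enough to estimate on the back of an envelope: parameters times bytes per parameter. The sketch below assumes 16-bit (2-byte) weights, a common serving precision; real deployments vary with quantization and runtime overhead, so treat these as rough floor figures for the weights alone.

```python
# Back-of-envelope memory cost for model weights: parameters x bytes each.
# Assumes 2 bytes per parameter (16-bit precision); quantized models use
# less, and serving adds overhead on top.

def model_memory_gb(params_billions, bytes_per_param=2):
    """Rough weight-storage footprint in gigabytes."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for size in (7, 70):
    print(f"{size}B parameters -> about {model_memory_gb(size):.0f} GB of weights")
```

The ten-to-one ratio between a 70B and a 7B model shows up directly: roughly 140 GB of weights versus roughly 14 GB, which is the difference between a rack of accelerators and a single consumer GPU.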

The Confidence Trap

Here is something the parameter count actively hides, and it connects directly to the "why understanding vanishes" section from episode two.

Larger models are not just more knowledgeable. They are more confident. More parameters means more capacity to generate fluent, convincing, detailed responses, even when those responses are wrong. A small model asked about something outside its training data will often produce visibly awkward output: halting sentences, obvious gaps, hedging language. A large model asked the same question will produce a smooth, authoritative paragraph that reads like it came from an expert. The wrongness is harder to detect precisely because the model is better at sounding right.

This is not a flaw in larger models. It is a direct consequence of the mechanism. More knobs means a richer space of possible outputs, and "confidently wrong" lives in that space alongside "confidently right." The model does not know which one it has produced. It has no internal experience of certainty or doubt. It produced the output that its billions of trained sliders generated for your specific input. Whether that output is brilliant or nonsense depends on whether your question falls in a region of the training data that was well represented.

The practical takeaway: never trust a model more because it is bigger. Verify claims that matter, regardless of which model produced them. The mixing board metaphor helps here too. A bigger board can produce a richer sound, but it can also produce richer distortion. The distortion just sounds more professional.

Try This

Here is something you can do today that will teach you more about model selection than any article.

Pick a task you do regularly with AI. Summarizing a document, drafting an email, generating a function in a programming language you know well, explaining a concept to a colleague. Now run that exact same task through three different models. Use a large frontier model, a mid size model, and a small focused model. Keep the prompt identical across all three.

Read the three outputs side by side. Not which one sounds best, that is the confidence trap. Read for which one is most correct, which one finishes fastest, which one catches the details your task actually requires. You will almost certainly discover that the biggest model is not the best for your specific task. You might discover that the smallest one is. You will definitely discover that the relationship between size and quality is not a straight line. It is a curve with bumps and valleys, and where your task sits on that curve is something only testing can reveal.

Do this for three or four of your regular tasks and you will have a personal map of which model to reach for when. That map is worth more than any benchmark, because benchmarks measure average performance across thousands of tasks, and you do not have thousands of tasks. You have yours.
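If you prefer to run the experiment as a script, here is a minimal harness for the side-by-side test. Everything in it is a placeholder: the model names are invented labels, and query_model is a stub you would replace with calls to whatever API or local runtime you actually use. The point is the shape of the loop, one identical prompt, several models, timing recorded for each.

```python
# Minimal side-by-side harness. query_model and the model names are
# placeholders (assumptions) -- swap in your real provider API or
# local runtime.

import time

def query_model(model_name, prompt):
    # Placeholder: call your chosen model here and return its text.
    return f"[{model_name} response to: {prompt[:40]}]"

def compare(prompt, models):
    """Run the same prompt through each model, recording wall-clock time."""
    results = []
    for name in models:
        start = time.perf_counter()
        output = query_model(name, prompt)
        elapsed = time.perf_counter() - start
        results.append((name, elapsed, output))
    return results

prompt = "Summarize the attached meeting notes in five bullet points."
for name, secs, text in compare(prompt, ["large-frontier", "mid-size", "small-focused"]):
    print(f"{name}: {secs:.2f}s\n{text}\n")
```

Keep the prompt byte-for-byte identical across models, and judge the saved outputs on correctness and usefulness, not on polish.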

The Real Skill

Understanding neural networks at the level we covered in episode two and the deep dive gives you one critical advantage as a user: you stop being impressed by the wrong things. Parameter counts, benchmark scores, press release superlatives, these are the metrics companies want you to use because they are easy to market. The metrics that actually matter for your work are harder to measure and impossible to put in a headline. How well does this model handle my specific kind of question? How fast does it respond at the volume I need? How often does it produce something I have to fix versus something I can use?

The mixing board is not the music. The knobs are not the song. Seventy billion parameters is a description of the instrument. What matters is whether the instrument is tuned for what you need to play.

That was the practical companion for episode two.