Turning a Face Into Five Hundred Numbers

It Finds You Across Twenty Years

In your photo archive you can point at one face and the system pulls every other picture of that same person, across years, through changes of hair and weight and light, in profile, half in shadow, looking away from the camera. It does this without ever being told who the person is. It just knows that these faces belong together and those do not. That ability feels almost spooky, and it rests on one idea that turns out to govern a huge swath of modern artificial intelligence. The system does not really store faces at all. It turns each face into a list of numbers, a position, and then the whole problem of recognition becomes a problem of measuring distance. Once you see that move, you see it everywhere.

A Place Instead of a Picture

Picture a vast space, not the three dimensions we live in but hundreds of dimensions, far more than anyone can imagine. A neural network trained on millions of faces learns to take any face you hand it and place it as a single point somewhere in that space. The list of numbers it produces, maybe five hundred of them, is just the coordinates of that point. That list is called an embedding, because the face has been embedded, set into, this enormous space. The picture itself is thrown away. What remains is a location.

And the entire skill of the network, the thing it was trained for, is to arrange that space so it means something. Two photos of the same person, even taken decades apart, should land in almost the same spot. Two photos of different people should land far apart. That is the whole goal. Get the space arranged that way and recognition becomes laughably simple. To ask whether two faces are the same person, you do not compare the pictures. You measure how far apart their two points sit. Close together, same person. Far apart, different people. The hard, fuzzy, human act of recognizing a face has been turned into the cold act of measuring a distance between two points.

Taught by Comparison

How do you teach a network to place faces this cleverly. You show it faces in groups and you nudge it with a simple rule, over and over, millions of times. Here are two photos of the same person and one photo of someone else. Move the two matching faces closer together in the space, and shove the stranger further away. Closer, further. Closer, further. Across a mountain of examples, the network slowly bends the space into a shape where sameness means nearness. It is never told the rule for what makes a face that face. It discovers, on its own, which subtle features stay constant across lighting and age and angle, because those are the only features that let it satisfy the closer-and-further demand reliably.

That is why it holds up across such hard variation. It learned, the slow way, that hairstyle and lighting and expression are noise that moves a face around the space a little, while the deep structure of someone's features is the signal that pins them to their spot. But it also means the system is only as fair as the faces it was shown. If the training crowd was lopsided, heavy on some kinds of faces and thin on others, the space is carved more finely where the data was rich and more crudely where it was sparse. The network recognizes the well-represented faces with ease and stumbles on the underrepresented ones, not from any malice, but because nobody ever taught it to spread those faces apart properly. The bias lives in the geometry, baked in by who was in the photos.

The Same Choice You Already Faced

Now here is the thread back to something you already know. Recognition comes down to a single decision. How close is close enough to call it the same person. You draw a line, a threshold distance. Inside the line, declare a match. Outside it, declare a stranger. And that line is the exact same choice you met when deciding whether two company records were one company. Set the line too loose, too generous about calling things the same, and you start matching different people to each other, a false identification. Set it too strict and you miss real matches, failing to connect two genuine photos of the same person. You cannot have zero of both. Every threshold is a trade between mistaking strangers for each other and failing to reunite the same soul across two pictures. Which error you fear more depends entirely on what the archive is for.

The Keeper

So the picture to keep is this, and it reaches far past faces. The system does not remember faces. It converts each one into a position in a huge invisible space, a list of a few hundred numbers, and it was trained by being told, endlessly, to pull the same person together and push different people apart, until nearness in that space simply means sameness in life. Recognition is then just measuring distance, and the only knob is how near counts as a match, which is the same which-mistake-can-I-live-with choice you made merging companies in the mining work. Identity, whether of a person or a corporation, keeps turning into geometry, a point and a distance and a line you have to draw yourself. The face was never stored. Only the place it landed.