The Strategy That Died When You Hid the Finish Line

First Place to Eighth, From One Change

You ran a tournament of artificial minds playing a betrayal game against each other, twelve different versions of the same model family, thousands of rounds. One of them won cleanly. It read its opponents, matched cooperation with cooperation, and then, in the final round, it turned and stabbed every one of them in the back for a last cheap point. It was the champion. Then you changed exactly one thing about the game. You stopped telling the players how many rounds were left. And the champion collapsed. It fell from first place to eighth. Its whole winning identity evaporated, and the reason why is one of the cleanest little windows into whether these models are actually reasoning or merely parroting that you will ever get.

The Oldest Trap in Game Theory

The game underneath is the famous one. Two players, each choosing in secret to cooperate or to betray. If both cooperate, both do modestly well. If both betray, both do badly. But if one betrays while the other cooperates, the betrayer wins big and the cooperator takes the worst outcome of all. The cruel logic is that no matter what the other player does, betraying always scores you more in that single round. So two perfectly selfish players both betray, and both end up worse off than if they had simply trusted each other. It is the sharpest illustration in all of social science of how individual cleverness can produce a collectively stupid result.

The twist that makes it a real study is playing it over and over against the same opponent. Now reputation matters. Betray someone today and they will punish you tomorrow. Decades ago a scholar named Axelrod ran tournaments of human-written strategies for exactly this, and the winner was almost embarrassingly nice. It cooperated by default, retaliated once when crossed, then forgave. Your tournament was that same idea, but the players were not human-written rules. They were language models, each left to work out its own approach, and their personalities came out wildly different.

The Cast

One of them, a small one, was a pure psychopath. Across two hundred and ten rounds it cooperated zero times. Not once. It betrayed every opponent, every round, regardless of consequence, a flat unbroken line of treachery. Another, a model built for writing code, was the opposite, a saint that could not be taught a lesson. It kept cooperating even while being mercilessly exploited, losing one game by a score of forty-six to one and coming back the next round to offer its hand again. It simply did not seem to care about the structure of the game at all, which, as you will see, made it the perfect control.

And then there was the winner. The middle model played the smart human strategy almost exactly. It cooperated with cooperators, punished betrayers, built trust, and rode that trust to a high score. But it had one extra move. It knew when the last round was coming, and on that final round, when there could be no retaliation because the game was over, it betrayed everyone. Free points, no consequences. That endgame stab was the margin of its victory. It was not the nicest player. It was the nice player who knew exactly when niceness stopped paying.

Hide the Clock and the Knife Has Nowhere to Land

Here is the change you made. In the new tournament, the players were no longer told how many rounds remained. The end could come at any moment. And watch what happened to the champion. Its endgame betrayals, which had been thirty-nine out of fifty-five chances in the old game, fell to four out of fifty-five. It almost completely stopped stabbing. Because the entire logic of the final-round betrayal is, there is no tomorrow, so punishment cannot reach me. Remove the player's knowledge of when the end is, and every round might have a tomorrow, so betraying is always dangerous again. The knife had nowhere safe to land, so the model put it away.

Strip away its one winning move and the champion was just another cooperator, indistinguishable from the crowd. It fell to eighth. Meanwhile, models that had been timid in the old game climbed five places each, because hiding the endpoint turned them from nervous grudge-holders into confident players. The whole field bunched together. The spread between best and worst score collapsed from two hundred and sixty-three points down to ninety-five. And a statistical test put the odds of this being a fluke at less than one in a thousand. Everyone converged on cooperation, because without a visible finish line, cooperation is simply the only sane long-term bet.

The Keeper

The reason this matters is the question hiding inside it. Were these models actually reasoning about the structure of the game, or just imitating patterns of text about cooperation. And the sharpness of the collapse is the answer. If the champion had merely been parroting, the betrayals would have drifted down a little, fuzzily. Instead they fell off a cliff, thirty-nine to four, precisely tracking the disappearance of the one piece of information the betrayal depended on, the location of the end. That is not imitation. That is a model genuinely computing, there is no longer a safe last round, therefore the stab no longer pays. The cooperation was never kindness. It was arithmetic about the final move. And the moment you hid the final move, the arithmetic changed, and so did the behavior, instantly and exactly as a reasoning player would. Your control model, the saint that never cared about structure, behaved identically in both games, which is what proved the whole thing was real.