Kronos and the Robot Traders

A Trading Firm Made of Chatbots

Somewhere on a benchmark server right now, a language model is losing money in a real financial market, in public, where everyone can watch. That is not a hypothetical. Over the past year a small wave of projects has stopped asking whether a chatbot can talk about investing and started asking something far more dangerous: can it actually trade? And the way they built it is wonderful, because they did not build a single genius. They built an office.

The first of these frameworks took a large language model and split it into a whole trading firm, with different roles arguing with each other. One agent plays the bullish analyst, hunting for reasons to buy. Another plays the bear, hunting for reasons it is all going to collapse. There are researchers who dig into the fundamentals, a trader who actually pulls the trigger, and a risk manager whose entire job is to be the nervous person in the room saying maybe not this much. It is a committee. The model is talking to itself in different hats and trading on the outcome of the argument.
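To make the shape of the committee concrete, here is a minimal sketch of the flow in Python. It is a toy built on assumptions, not the framework's code. Every role is a hypothetical rule-based stand-in for what would really be a separately prompted language model, and `momentum`, `volatility`, and the position limits are invented inputs. What it does show is the plumbing: two opposing analysts argue, a trader nets their arguments into a position, and the risk manager has the last word on size.

```python
from dataclasses import dataclass


@dataclass
class Opinion:
    role: str
    score: float  # > 0 argues for buying, < 0 argues for selling
    reason: str


def bull_analyst(momentum: float) -> Opinion:
    # Hypothetical stand-in for an LLM prompted to hunt for reasons to buy.
    return Opinion("bull", max(momentum, 0.0) + 0.1, "the trend looks strong")


def bear_analyst(momentum: float) -> Opinion:
    # Hypothetical stand-in for an LLM prompted to hunt for reasons to sell.
    return Opinion("bear", min(momentum, 0.0) - 0.1, "the downside is underpriced")


def trader(opinions: list[Opinion]) -> float:
    # Net the competing arguments into a proposed position from -1 (short) to 1 (long).
    net = sum(o.score for o in opinions)
    return max(-1.0, min(1.0, net))


def risk_manager(position: float, volatility: float,
                 max_position: float = 0.5, high_vol: float = 0.03) -> float:
    # The nervous voice: cap the position size, and halve the cap when volatility is high.
    limit = max_position if volatility < high_vol else max_position / 2
    return max(-limit, min(limit, position))


def run_committee(momentum: float, volatility: float) -> float:
    # Argue, decide, then let risk management trim the decision.
    opinions = [bull_analyst(momentum), bear_analyst(momentum)]
    proposed = trader(opinions)
    return risk_manager(proposed, volatility)
```

With strong momentum in a jumpy market, `run_committee(0.9, 0.05)` ends at 0.25 even though the trader proposed 0.9, because the risk manager halves the allowed size when volatility is high. That veto at the end of the chain is the design point.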

The reported results were good enough to be suspicious in the charming way: better cumulative returns and a better risk-adjusted ratio than simpler approaches. And the moment you hear that a backtest looks great, every experienced person in the room should narrow their eyes, because a backtest is a model proving it would have been brilliant in a past that already happened. Which is exactly why the more honest projects did the harder thing.
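Both headline numbers are easy to compute yourself. The sketch below assumes daily returns given as plain fractions, and it assumes the risk-adjusted ratio is the Sharpe ratio, the most common choice: mean excess return divided by its standard deviation, annualized here with a 252-trading-day convention. Neither assumption comes from the framework itself.

```python
import math
import statistics


def cumulative_return(daily_returns: list[float]) -> float:
    # Compound the daily returns: (1 + r1)(1 + r2)... - 1.
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(daily_returns: list[float], risk_free_daily: float = 0.0,
                 periods_per_year: int = 252) -> float:
    # Mean excess return over its sample standard deviation, annualized by sqrt(periods).
    excess = [r - risk_free_daily for r in daily_returns]
    spread = statistics.stdev(excess)  # needs at least two observations
    if spread == 0:
        raise ValueError("returns have zero variance; the Sharpe ratio is undefined")
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


steady = [0.002, 0.0] * 126   # small, regular gains
wild = [0.05, -0.04] * 126    # big swings in both directions
```

Here `wild` compounds to the larger cumulative return while `steady` earns the far better Sharpe ratio, which is why a serious report shows both. Note that neither function knows whether its inputs came from a backtest or a live run: the arithmetic is identical, and only the honesty of the inputs differs.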

The Live Benchmark Problem

The harder thing is to let the machine trade forward, into a future nobody has seen yet. One of the newer projects built what it calls a fully automated live benchmark. Not a replay of old data. A real, running scoreboard where language models process incoming information on their own and make decisions across several markets, and you watch how they do going forward, in conditions that have not been cleaned up by hindsight.

This distinction is the whole game, and it is worth feeling in your gut. In a backtest, the model is taking an exam where the answer key has already been written, and it is very easy to accidentally let the answers leak into the questions. Tiny mistakes in how you set it up, and the model looks like a prophet. A live benchmark removes the answer key entirely. The market has not decided yet. The model is genuinely guessing, the same as the rest of us, and now its confidence costs real points. Most strategies that look like magic on historical data turn into very expensive coin flips the moment the future is actually unknown.
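Here is what a leaked answer key can look like in code. This is a hedged illustration, not anything taken from the projects above. The rule is a toy momentum strategy run on simulated random returns, and both the Gaussian "market" and the `leak` flag are assumptions made for the demo. The only difference between the two runs is whether today's position is chosen from yesterday's return or, through an easy off-by-one slip, from today's.

```python
import random


def strategy_returns(returns: list[float], leak: bool) -> list[float]:
    # Toy momentum rule: go long after an up move, short after a down move.
    pnl = []
    for t in range(1, len(returns)):
        # The honest signal is yesterday's return; the leaky one peeks at today's.
        signal = returns[t] if leak else returns[t - 1]
        position = 1.0 if signal > 0 else -1.0
        pnl.append(position * returns[t])
    return pnl


random.seed(0)
market = [random.gauss(0.0, 0.01) for _ in range(1000)]  # simulated noise, no real edge
leaky = sum(strategy_returns(market, leak=True))
honest = sum(strategy_returns(market, leak=False))
```

With the leak, each day's profit is the absolute size of that day's move, so the strategy can never lose. On a market that is pure noise, a single wrong index has manufactured a prophet.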

Teaching a Model to Read Candlesticks

Then there is the one I find quietly beautiful, a model called Kronos, named for the Greek titan of time. Most of these trading projects take a general language model and teach it about markets. Kronos went the other direction. It is a foundation model built from the ground up on the native language of markets, the candlestick. If you have ever looked at a price chart, you have seen them: those little vertical bars with wicks that show where a price opened and closed over some slice of time, and how high and low it swung in between.

Kronos treats those bars the way a language model treats words. It learned a way to chop continuous price movement into discrete tokens, little units it can predict one after another, and was then trained to guess the next piece of the chart the way a text model guesses the next word in a sentence. A grammar of greed and fear, learned from the shape of the candles. The claim is that this specialized foundation model beats the general-purpose approaches at forecasting and even at generating realistic fake market data, which is its own slightly unsettling trick: a machine that can dream up price history convincing enough to fool the tools that analyze it.
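Kronos's real tokenizer is learned and far more elaborate, so treat the following as a deliberately crude stand-in that only shows the idea. A hand-made quantizer turns each candle into a word from a hypothetical vocabulary (body direction plus a small or large move, with an assumed 0.5% threshold), and a bigram counter plays the role of the next-token model by predicting whichever token most often followed the current one.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float


def tokenize(c: Candle, small: float = 0.005) -> str:
    # Hypothetical vocabulary: body direction plus whether the open-to-close move
    # is under the `small` threshold. This toy version ignores the wicks (high/low).
    change = (c.close - c.open) / c.open
    direction = "up" if change > 0 else "down" if change < 0 else "flat"
    size = "small" if abs(change) < small else "large"
    return f"{direction}-{size}"


def train_bigram(tokens: list[str]) -> dict[str, Counter]:
    # Count which token follows which: the crudest possible next-token model.
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], token: str) -> Optional[str]:
    # Most frequent successor, or None for a token never seen with a successor.
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]


candles = [
    Candle(100.0, 101.0, 99.5, 100.8),
    Candle(100.8, 101.0, 100.2, 100.5),
    Candle(100.5, 102.0, 100.4, 101.9),
    Candle(101.9, 102.1, 101.5, 101.7),
]
tokens = [tokenize(c) for c in candles]
model = train_bigram(tokens)
```

A real foundation model replaces the counting table with learned context over long histories, but the contract is the same: candles in, a sequence of discrete tokens out, then predict the next token.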

When They Play For Money

Here is where it connects to something you may have thought about in a different setting entirely: the behavior of these models under pressure. There is a long line of research where you put language models into the prisoner's dilemma, the classic game where two players each choose to cooperate or betray, and betrayal pays off in the short term but destroys you both if everyone does it. Different models reveal genuinely different personalities in that game. Some are reliable cooperators. Some are eager defectors. Some are sophisticated, cooperating right up until they sense the end is near and then turning on you at the last possible move.
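Those three personalities are easy to write down as strategies in an iterated prisoner's dilemma. This sketch assumes the conventional textbook payoffs (3 each for mutual cooperation, 1 each for mutual betrayal, and 5 for betraying a cooperator, who gets 0) and a known ten-round horizon. The strategy names are hypothetical labels, and the "sophisticated" player is modeled, as an assumption, as tit-for-tat that defects on the final move.

```python
from typing import Callable

# Conventional textbook payoffs (an assumption): (row player, column player).
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # the cooperator gets betrayed
    ("D", "C"): (5, 0),  # the betrayer profits
    ("D", "D"): (1, 1),  # mutual betrayal
}

Strategy = Callable[[int, int, list[str]], str]


def always_cooperate(round_no: int, total: int, opp_history: list[str]) -> str:
    return "C"


def always_defect(round_no: int, total: int, opp_history: list[str]) -> str:
    return "D"


def end_game_defector(round_no: int, total: int, opp_history: list[str]) -> str:
    # Tit-for-tat (copy the opponent's last move) until the final round, then betray.
    if round_no == total - 1:
        return "D"
    return opp_history[-1] if opp_history else "C"


def play(a: Strategy, b: Strategy, rounds: int = 10) -> tuple[int, int]:
    hist_a: list[str] = []
    hist_b: list[str] = []
    score_a = score_b = 0
    for r in range(rounds):
        move_a = a(r, rounds, hist_b)  # each player sees only the other's past moves
        move_b = b(r, rounds, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

Against a pure cooperator the end-game defector wins 32 to 27, yet two eager defectors finish with 10 each against 30 each for two cooperators: betrayal pays in the short term, and universal betrayal ruins both players.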

Now move that same question into a market, where the betrayal is not a game token but money, and the other players are sometimes humans and sometimes other models. The trading committee that argues with itself, the live benchmark grinding forward into the unknown, the candlestick model dreaming up futures. Underneath all of it is the same unresolved question. When a language model is given something real to lose, does it behave wisely, or does it just behave confidently? Those are not the same thing, and the market is the one judge that does not care which one you are.

The reason this is worth watching, even if you would never hand one of these a single krona, is that markets are the most ruthless feedback machine humans have ever built. You can argue forever about whether a model is reasoning or pattern-matching, whether it understands or imitates. A market does not argue. It just pays you or takes your money, every day, in public, with no partial credit. And so this strange little corner of research, chatbots in fake hedge funds, accidentally became one of the most honest tests we have. Not because trading is important. Because the market is the one exam these models cannot talk their way out of.