PärPod Tech
One Point Five Seconds: A Disaster in Slow Motion
28m · Mar 16, 2026
A server nicknamed Popcorn spent five days adding 1.5 seconds to every single request before anyone noticed—here's how five reasonable engineering decisions created a silent catastrophe.

The Distress Call

The call came in on a Sunday evening in March. The message was nine words long.

"The capture UI is insanely slow right now. Investigate."

That is the kind of message that lands differently depending on what you know about the system behind it. If you know that PärKit is a personal life OS, a household tool used by three people in northern Sweden to manage tasks, calendars, and a shared collaboration space, then "insanely slow" sounds like a mild annoyance. Maybe restart the server. Maybe clear a cache. Maybe wait fifteen minutes and it will fix itself.

If you know what was actually happening inside that server, the message reads more like a cockpit voice recorder. Because by the time anyone noticed, the system had been in a state of catastrophic performance failure for five days. Every single API request, every health check, every dashboard load, every task creation, every calendar query, had been dragging a one thousand five hundred millisecond anchor behind it. Not sometimes. Not under load. Every request. Every time. For five days.

And nobody noticed.

The Infrastructure

To understand how a household tool can have a meltdown worth talking about, you need to understand the infrastructure. And to understand the infrastructure, you need to understand that the word "infrastructure" is doing a lot of heavy lifting here.

PärKit runs on two machines. The first is a virtual private server hosted by Scaleway in Paris, a French cloud provider. The server has a nickname. Its nickname is Popcorn. Popcorn runs five tools that together form a personal life operating system. Capture handles the inbox, the place where tasks and ideas land before they get sorted. Focus handles execution, the system that tracks what is actually getting done. Time handles the calendar. Hubben is a collaboration tool, built so that Pär and his sister SJ can work together on shared projects. And Apilog tracks how much money all the AI experiments are costing, because when you run thirty one experiments on various cloud AI platforms, someone needs to watch the bill.

The second machine is a Raspberry Pi 5. A single-board computer the size of a deck of cards, sitting on a shelf somewhere in a house in northern Sweden. Its nickname is Pinkserver. It handles SMS alerts and home automation. It also runs its own copy of the PärKit authentication system.

All five tools share one PostgreSQL database. They share a common authentication module. They share a deployment pipeline. And as of March tenth, twenty twenty six, they share a problem.

The Security Principle

The story begins on February twenty eighth with a reasonable idea. Pär wants to build Hubben, the collaboration tool, so he and his sister can work on projects together. The existing PärKit authentication is simple. One shared token for everyone. Fast, cheap, works fine when you do not care which user is making the request. But a collaboration tool needs to know who is who. You cannot have a shared workspace where everyone looks like the same person.

So a new authentication system is designed. Claude Opus, the AI that architects most of PärKit, makes the design decisions. And the first decision is a security principle that sounds unimpeachable.

"Tokens from day one."

This means that authentication tokens, the secret strings that prove you are who you say you are, will be hashed before they are stored in the database. Not stored as plain text. Hashed. If someone breaks into the database, they find a pile of cryptographic noise, not usable tokens. This is good practice. This is what security-conscious systems do. Nobody would argue with this.

The question is not whether to hash the tokens. The question is how. And here, in the gap between "what" and "how," the disaster begins to assemble itself.

The Algorithm

The algorithm chosen for token hashing is called bcrypt. If you have ever created an account on a website, bcrypt has probably protected your password. It is a venerable, well-respected, deliberately slow hashing algorithm. And that last part, the deliberately slow part, is the entire point of its existence.

Here is why bcrypt is slow on purpose. When a human creates a password, they create something short, memorable, and often predictable. "Fluffy two thousand twenty six." "Password one two three." The kind of thing an attacker can guess by trying millions of combinations per second. Bcrypt defends against this by making each guess expensive. Each bcrypt check takes approximately three hundred milliseconds. That is three tenths of a second. For a single check, you do not notice. For an attacker trying a million guesses, that is three hundred thousand seconds. Nearly three and a half days to try a million passwords. That is the point. Bcrypt buys time.
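That arithmetic fits in a few lines. A quick sketch, using the article's approximate figure of three hundred milliseconds per check (not a benchmark):

```python
# Why a deliberately slow hash deters guessing: cost per check times guesses.
# 0.3 s per bcrypt check is the approximate figure cited above.
CHECK_SECONDS = 0.3
guesses = 1_000_000

total_seconds = guesses * CHECK_SECONDS
total_days = total_seconds / 86_400  # seconds in a day

print(total_seconds)          # 300000.0
print(round(total_days, 2))   # 3.47 -- nearly three and a half days
```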

But PärKit does not use passwords. PärKit uses tokens. Computer-generated random strings, three hundred and eighty four bits of pure entropy. The kind of string that would take longer than the heat death of the universe to brute force regardless of how fast your hashing algorithm is. Using bcrypt to protect a high-entropy token is like hiring a team of armed guards to protect a vault that is already orbiting Neptune. The guards are not wrong. They are just solving a problem that does not exist.

Nobody raises this distinction. Not the architect. Not the peer reviewer. The spec for Hubben, written on March tenth, documents the choice with admirable clarity. Line one hundred and seventy two reads: "Each user gets their own token. Tokens stored in the hubben_users table, hashed with bcrypt."

And then, a few lines later, the sentence that will haunt the entire system for five days. The spec says, and this is a direct quote from the design document: "bcrypt is a salted one-way hash. You cannot match tokens via SQL WHERE. The auth code must load all active users and compare each."

Read that sentence again. Let it sink in. The design document explicitly describes what will become the catastrophic failure mode. It says, in writing, that the system cannot do a fast database lookup. It says the code must load every user and check each one individually. It documents the bomb and files it under "architecture decisions."
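The "cannot match tokens via SQL WHERE" claim follows directly from salting. A minimal stdlib sketch, using salted SHA-256 as a stand-in for bcrypt (which salts internally), shows why an equality lookup can never work:

```python
import hashlib
import os

def salted_hash(token: str) -> str:
    # Each call picks a fresh random salt, so the same token produces a
    # different stored value every time -- just like bcrypt's output.
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + token.encode()).hexdigest()
    return salt.hex() + ":" + digest

a = salted_hash("same-token")
b = salted_hash("same-token")
print(a == b)  # False -- so `WHERE token_hash = ?` can never find a match,
               # and the only option is to load every row and verify each one
```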

The Peer Review

A peer review is conducted. Same day, March tenth. The reviewer, another Claude session, reads the spec and flags issues. Issue number two is titled "Will cause bugs at runtime." The review correctly identifies the bcrypt lookup pattern as a concern.

The reviewer's assessment: "With three users this is negligible."

This is technically correct. Three users times three hundred milliseconds per bcrypt check equals nine hundred milliseconds. Under a second. Not great, but for a household tool with three users, it is the kind of latency that gets attributed to network conditions. A shrug. A "probably fine." The reviewer marks it as acknowledged, recommends documenting the pattern, and moves on.

What the reviewer does not do is ask one question. The question that, in every disaster movie, is the question nobody asks until the third act.

"What happens if this pattern spreads?"

The Generalization

The same day. March tenth. Phase seven of an internal project called Operation ShapeUp. The goal of Phase seven is to take Hubben's authentication system and lift it into a shared layer that all PärKit tools can use. A single authentication module for Capture, Focus, Time, Hubben, and Apilog. One codebase, one pattern, one set of rules for everyone.

This is good engineering. Shared code reduces duplication. A fix in one place fixes all five tools. A security improvement propagates everywhere. There is no world in which consolidating authentication into a shared module is a bad idea.

Unless the pattern you are consolidating is a time bomb.

The bcrypt loop, designed for Hubben's three users, is now the authentication path for every request to every tool. But it is not just the loop that gets generalized. It is the ordering. The shared authentication module checks tokens in this order.

Step one. Load all active users from the parkit_users table. For each user, run a bcrypt comparison against the incoming token. Three hundred milliseconds per user. Five users in the table. One thousand five hundred milliseconds. One and a half seconds.

Step two. If no bcrypt match was found, try the legacy token. A simple hash comparison. Microseconds.

Step three. If that fails too, try the guest token. Also microseconds.

The ordering is deliberate. The new parkit_users system is "the future." The legacy token is "the past." You check the future first because eventually, all tokens will be in the new system. This is designing for tomorrow at the expense of today. And today, right now, in March twenty twenty six, ninety five percent of all requests use the legacy token.

Ninety five percent of requests. The legacy token. The one that is checked last. The one that costs microseconds. But to reach those microseconds, every single request must first endure one thousand five hundred milliseconds of futile bcrypt comparisons. Five users, loaded from the database, each checked against a token that will never match any of them, because the legacy token is not a bcrypt hash. It is a plain text string. The bcrypt loop will fail, every single time, on every single user, and it will take one and a half seconds to do it.
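The three-step ordering can be sketched in a few lines. This is an illustrative reconstruction, not the actual PärKit code: the token values are hypothetical, and a counter stands in for the roughly one-and-a-half-second bcrypt loop.

```python
import hmac

LEGACY_TOKEN = "legacy-plaintext-token"  # hypothetical stand-in values
GUEST_TOKEN = "guest-token"
steps_run = []

def bcrypt_loop_lookup(token: str):
    # Stand-in for the O(n) loop: ~1500 ms in the real system, and it can
    # never match a legacy token, which is not a bcrypt hash at all.
    steps_run.append("bcrypt loop")
    return None

def authenticate(token: str):
    user = bcrypt_loop_lookup(token)              # step one: "the future"
    if user is not None:
        return user
    if hmac.compare_digest(token, LEGACY_TOKEN):  # step two: microseconds
        steps_run.append("legacy")
        return "legacy-user"
    if hmac.compare_digest(token, GUEST_TOKEN):   # step three: microseconds
        steps_run.append("guest")
        return "guest"
    return None

authenticate(LEGACY_TOKEN)
print(steps_run)  # ['bcrypt loop', 'legacy'] -- the slow, futile check always runs first
```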

The Cache That Could Not

There is a cache. Of course there is a cache. Someone thought about performance. The cache stores the result of token lookups, keyed on the first sixteen characters of the token. If you have checked a token before and it matched a user, the next time you see that token, you skip the bcrypt loop entirely. Instant.

This sounds like it should save the system. It does not. And the reason it does not is a masterclass in how the obvious solution can be perfectly designed and perfectly useless at the same time.

The cache only stores positive results. When a token matches a user, the match is cached. When a token does not match any user, nothing is cached. The miss is forgotten. The next request with the same unmatched token pays the full penalty again.

The legacy token, the token used by ninety five percent of all requests, never matches any bcrypt hash. It cannot. It is the wrong type of credential entirely. So it is never cached. The most common token in the system, the token that hits the authentication module dozens of times per hour, falls through the cache every single time and lands in the one thousand five hundred millisecond bcrypt loop. Every time. Without exception. The cache is a net installed six feet to the left of where people are falling.
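The cache's blind spot is easy to demonstrate. In this sketch a counter stands in for the expensive loop; the cache layout (positive results only, keyed on the first sixteen characters of the token) follows the description above.

```python
expensive_calls = 0
cache: dict = {}  # stores positive lookups only

def bcrypt_loop_lookup(token: str):
    global expensive_calls
    expensive_calls += 1  # stand-in for the ~1500 ms five-user loop
    return None           # a legacy token never matches a bcrypt hash

def lookup_user(token: str):
    key = token[:16]
    if key in cache:                  # only ever true for tokens that matched
        return cache[key]
    user = bcrypt_loop_lookup(token)
    if user is not None:              # misses are forgotten...
        cache[key] = user
    return user

lookup_user("legacy-plaintext-token")
lookup_user("legacy-plaintext-token")
print(expensive_calls)  # 2 -- ...so the same miss pays the full penalty every time
```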

The Multiplication

It gets worse. There is a middleware component called ReadOnlyGuardMiddleware. Its job is to check whether a token belongs to a read-only user before allowing write operations. To do this, it calls the user lookup function.

The same user lookup function.

The one with the bcrypt loop.

So for any write request, creating a task, adding a note, updating a calendar entry, the authentication penalty is paid twice. The request hits the main auth check. One thousand five hundred milliseconds. Then it hits the ReadOnlyGuardMiddleware. One thousand five hundred milliseconds again. Three full seconds of bcrypt computation to add a single item to a to-do list.

A to-do list. For a family of three.

SJ's Special Circle of Latency

And then there is SJ. Pär's sister. The person Hubben was built for. The reason multi-user authentication exists in the first place.

SJ's token does not live in the parkit_users table. It lives in the hubben_users table, because she was created as a Hubben user, not a PärKit user. The system has not yet consolidated these tables. So when SJ makes a request, here is what happens.

Step one. The system loads all five users from the parkit_users table and runs bcrypt against each one. None of them match SJ's token, because SJ is not in this table. One thousand five hundred milliseconds. Wasted.

Step two. The system falls through to the Hubben-specific authentication. It loads all three users from the hubben_users table and runs bcrypt against each one. SJ's token matches on the second or third check. Up to nine hundred milliseconds.

Total authentication time for SJ: two thousand four hundred milliseconds. Two point four seconds. Per request. Just to prove she is who she says she is.

The person the system was designed to serve is the person it punishes the most.

Five Days of Silence

March tenth through March fifteenth. Five days. The system is live. Every request is dragging a one and a half second anchor. The dashboard, which makes multiple API calls to render, takes eight point four seconds to load. A health check endpoint, an endpoint whose entire job is to run SELECT 1 against the database and return the word "ok," takes one thousand four hundred and ninety milliseconds.

Nobody notices.

This is not because nobody uses the system. People use it every day. This is a household tool for managing daily life. Tasks are created. Calendar entries are checked. The collaboration space is visited. But the slowness is constant, not spiking. It does not crash. It does not error. It does not send alerts. It simply takes one and a half seconds longer than it should, every single time, like a heartbeat that has slowed but not stopped.

People blame the network. People blame the VPS. People blame the browser. The system is running on a server in Paris, accessed from northern Sweden. A little latency is expected, right? The internet is not instant. Things take time. The dashboard has always been kind of slow, has it not?

It has not. But nobody measured before, so nobody can prove it now.

The Moment of Discovery

March fifteenth, evening. Pär is running stability testing on the Capture UI after completing the first phase of Operation ShapeUp. This is the first time anyone has looked at the system with a stopwatch instead of a vague sense that things feel sluggish.

The numbers come back. They are not sluggish. They are catastrophic.

The dashboard takes eight point four seconds to load, because it makes multiple authenticated API calls, each one paying the bcrypt tax.

The first measurement is /api/me. This is the simplest possible endpoint. It receives a request. It checks authentication. It runs SELECT 1 against the database, a query so trivial it exists solely to prove the database is alive. It returns the result. Nothing else. No computation. No data processing. No business logic. Just "are you there?" and "yes."

One thousand four hundred and ninety milliseconds.

One and a half seconds to answer the question "are you there?" The server is right there. It is sitting in a data center in Paris, connected to a gigabit backbone, running a modern PostgreSQL instance on solid state storage. The database query takes less than a millisecond. The network round trip takes maybe fifty milliseconds. And somehow, the response takes one thousand four hundred and ninety.

Where are the other one thousand four hundred and thirty nine milliseconds going?

The Diagnosis

The initial theories are generous. Maybe it is the database. Maybe the new ShapeUp code introduced a slow query. Maybe the Capture queries need optimization. Maybe there is a connection pool issue. These are the things you check first because they are the things that usually cause slowness.

None of them explain a one thousand four hundred and ninety millisecond health check. The health check does not use the new code. It does not run complex queries. It does not touch the connection pool in any meaningful way. The health check should take single-digit milliseconds. It is taking one and a half seconds. Something is happening before the endpoint logic even runs.

The investigation moves to the authentication layer. And here, the timeline of the disaster becomes visible for the first time. Like pulling up the flight recorder data after a crash, the sequence of events is obvious in retrospect. The function called _lookup_user, living in parkit_common.auth, is the first thing that runs on every request. Before any endpoint logic. Before any database query. Before anything useful happens at all.

The function loads all active users from the parkit_users table. Currently five users. For each user, it runs bcrypt.checkpw, comparing the incoming token against the stored hash. Each comparison takes approximately three hundred milliseconds. Five users times three hundred milliseconds equals one thousand five hundred milliseconds.

There it is. The missing one thousand four hundred and thirty nine milliseconds. It was never the database. It was never the network. It was never the code. It was the lock on the door.

The Math

The arithmetic of this disaster fits on a napkin.

Five users in the database. Each bcrypt check costs three hundred milliseconds. The system checks all five before giving up. Five times three hundred equals one thousand five hundred. That is the floor. No request can complete faster than one thousand five hundred milliseconds, because no request is allowed to do anything until the authentication module has spent one and a half seconds checking tokens that will not match.

At ten users, it would be three seconds. At twenty users, six seconds. At fifty users, fifteen seconds. The system scales linearly into unusability. But it does not need to scale. Five users is already a disaster. This is a system where five entries in a database table, serving a family of three, create enough cryptographic overhead to make a health check take longer than a transcontinental video call.
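The napkin math, as code. The three-hundred-millisecond per-check figure is the article's approximation:

```python
BCRYPT_MS = 300  # approximate cost of one bcrypt check, per the article

def auth_floor_ms(n_users: int) -> int:
    # No request can finish faster than one failed check per active user.
    return n_users * BCRYPT_MS

for n in (3, 5, 10, 20, 50):
    print(n, "users ->", auth_floor_ms(n), "ms minimum per request")
# 5 users -> 1500 ms minimum; 50 users -> 15000 ms minimum
```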

And the truly elegant part, the part that makes this a genuine architectural comedy, is that the expensive check always fails. The legacy token is not a bcrypt hash. It will never match a bcrypt hash. The system knows this. The code knows this. The design documents know this. But the code checks anyway. All five users. Every time. Three hundred milliseconds each. One and a half seconds of pure, crystallized futility.

The Fix

The fix is three lines of code.

Line one. Move the legacy token check before the bcrypt loop. Check the cheap thing first. If the legacy token matches, return immediately. Do not enter the bcrypt loop. Do not load the users. Do not spend three hundred milliseconds per user checking something that will never match. Just check the fast thing first.

Line two. Cache negative results. When the bcrypt loop checks all five users and finds no match, cache that result. "This token does not match any user." The next time the same token arrives, skip the loop. Return the cached negative immediately.

Line three is the semicolon at the end of line two.
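Put together, the fix looks something like this sketch. It is again an illustrative reconstruction, with a hypothetical token value and a counter standing in for the bcrypt loop: check the cheap legacy token first, and cache misses as well as hits.

```python
import hmac

LEGACY_TOKEN = "legacy-plaintext-token"  # hypothetical stand-in value
expensive_calls = 0
cache: dict = {}  # now stores negative results too

def bcrypt_loop_lookup(token: str):
    global expensive_calls
    expensive_calls += 1  # stand-in for the ~1500 ms loop
    return None

def lookup_user(token: str):
    # Fix, line one: the microsecond check runs before the bcrypt loop.
    if hmac.compare_digest(token, LEGACY_TOKEN):
        return "legacy-user"
    key = token[:16]
    # Fix, line two: a cached miss (None) also skips the loop.
    if key in cache:
        return cache[key]
    user = bcrypt_loop_lookup(token)
    cache[key] = user  # cache the result even when no user matched
    return user

lookup_user(LEGACY_TOKEN)           # never touches the loop
lookup_user("unknown-token-12345")  # pays the loop once...
lookup_user("unknown-token-12345")  # ...and never again
print(expensive_calls)  # 1
```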

Before the fix, /api/me: one thousand four hundred and ninety milliseconds. After the fix, /api/me: three milliseconds. The dashboard drops from eight point four seconds to two hundred milliseconds.

The stats endpoint. Before: one thousand four hundred and sixty milliseconds. After: twenty four milliseconds. The ideas endpoint. Before: one thousand four hundred and seventy milliseconds. After: forty two milliseconds. The momentum endpoint. Before: one thousand four hundred and seventy milliseconds. After: thirty seven milliseconds.

Four hundred and ninety seven times faster. On the simplest endpoint. Not by rewriting the system. Not by changing the database. Not by upgrading the server. By moving one check above another check. By putting the fast thing before the slow thing. By doing what should have been done on March tenth, before five days of silence, before eight point four second dashboards, before SJ's two point four second authentication odyssey.

The Five Reasonable Decisions

This is where a disaster movie would roll credits and a postmortem would assign blame. But this disaster does not have a villain. It has five reasonable decisions that each made perfect sense in isolation and collapsed catastrophically in combination.

Decision one. Hubben needs multi-user authentication. Reasonable. You cannot build a collaboration tool where everyone is the same person.

Decision two. Hash tokens with bcrypt. Reasonable. "Tokens from day one" is a security principle that sounds like something you would frame and hang on the wall of a security operations center.

Decision three. Use an O(n) lookup pattern, loading all users and checking each one. Reasonable, given the peer review's assessment. "With three users this is negligible." True. Three users, nine hundred milliseconds, barely noticeable.

Decision four. Generalize this pattern to all PärKit tools. Reasonable. Shared authentication code reduces duplication, improves maintainability, and ensures consistent security across the entire platform.

Decision five. Check the new bcrypt system before the legacy system. Reasonable. The new system is the future. Design for the future, not the past.

Five decisions. Each one defensible. Each one made by a competent architect operating with incomplete information. And together, a perfect storm. A cascade where "reasonable for three users" became "catastrophic for all users," where "design for the future" became "ignore the present," and where "security best practice" became "three hundred milliseconds of wasted computation per user per request per endpoint, forever."

The investigation report calls it "five cascading architectural assumptions that compounded." That is the polite, professional phrasing. The disaster movie phrasing is simpler. Five people passed a package to each other, and nobody opened it to check if it was ticking.

The Principle Underneath

There is a lesson underneath this wreckage that extends far beyond a household tool on a French cloud server.

The lesson is not "do not use bcrypt." Bcrypt is excellent at what it does. It protects passwords against brute force attacks, and it has done so reliably for decades. The lesson is not even "check the fast thing first," although that is a good rule and you should follow it and you should write it on a sticky note and put it on your monitor.

The lesson is about what happens when a pattern that works at one scale gets promoted to a different scale without re-examination. The bcrypt loop was fine for Hubben. Three users, nine hundred milliseconds, household tool, nobody cares. The moment it became the authentication path for every request to every tool, someone needed to sit down with a calculator and do the multiplication. Five users times three hundred milliseconds. Write it on the whiteboard. Stare at it. Ask, "Is one and a half seconds per request acceptable?"

Nobody did the multiplication. Not the architect. Not the peer reviewer. Not the deployment process. The number was never written down. The cost model was implicit, buried in the assumption that "bcrypt is slow but necessary," never made explicit as "bcrypt will cost us exactly this many milliseconds on every request."

Cost models are load-bearing. That is the phrase from the postmortem, and it deserves to survive beyond it. When you know the cost of an operation and you know how many times it runs, you can predict the total. When you do not, you cannot. And when the operation runs on every request, the cost is not a line item. It is the floor. Nothing can be faster than the slowest thing that runs every time.

The Absurd Scale

Step back for a moment. Look at what actually happened here. A Raspberry Pi 5 and a French cloud server running a household tool for three family members had the authentication latency of a government mainframe. Five users, not five million. A PostgreSQL database with fewer rows than a grocery list. An API that serves maybe a few hundred requests per day, not per second. And every single one of those requests was slower than a video call to the other side of the planet.

The system was, by any measure, the most overprotected household to-do list in Scandinavia. Bcrypt was guarding tokens that had more entropy than the entire Swedish language. An O(n) loop was iterating over a user table that you could fit on a Post-it note. A cache was carefully storing positive results for a scenario that never occurred while ignoring the scenario that occurred ninety five percent of the time.

And the most absurd part is that the security was genuine. The tokens were properly hashed. The authentication was working correctly. Every request was authenticated accurately and completely. The system was secure. It was just also running at approximately one five hundredth of its potential speed, which is the kind of performance characteristic that makes you wonder whether the server is running a cryptographic authentication system or mining Bitcoin on the side.

The Postscript

The interim fix, the three lines, deployed on March sixteenth. All five PärKit tools restarted. The dashboard loads in two hundred milliseconds. SJ's authentication drops from two point four seconds to single-digit milliseconds. The health check responds in three milliseconds. The system, for the first time in five days, runs at the speed it was always supposed to run at.

A permanent fix is planned. Migration 049 will add a SHA-256 column to the user table. SHA-256 is a fast hash. Microseconds, not three hundred milliseconds. It can be indexed, which means the database can look up a token directly instead of loading every user and comparing each one. The O(n) bcrypt loop will become an O(1) indexed lookup. The authentication system will go from one thousand five hundred milliseconds to approximately one millisecond. From five hundred times too slow to faster than anyone will ever be able to measure with a stopwatch.
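The reason SHA-256 enables an indexed lookup where bcrypt could not is determinism: no salt, so the same token always produces the same digest, and the digest column can be matched with a plain SQL WHERE. A sketch (the column and table names are illustrative, not the actual migration):

```python
import hashlib

def token_sha256(token: str) -> str:
    # Unsalted and deterministic. Safe here because the tokens themselves
    # carry 384 bits of entropy, so guessing is infeasible no matter how
    # fast the hash is -- the entropy, not the hash cost, does the work.
    return hashlib.sha256(token.encode()).hexdigest()

# Stored at token-issue time, then matched in one indexed query, e.g.:
#   SELECT id FROM parkit_users WHERE token_sha256 = %s
print(token_sha256("same-token") == token_sha256("same-token"))  # True
print(len(token_sha256("same-token")))  # 64 hex characters
```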

But that is a future project. Today, the three-line fix holds. The legacy check runs first. The cache catches negative results. The system is fast enough. And somewhere in a data center in Paris, a server nicknamed Popcorn is authenticating requests in three milliseconds instead of one thousand five hundred, blissfully unaware that it spent five days of its life performing one and a half seconds of pointless cryptography on every single thing anyone asked it to do.

Five reasonable decisions. Five days of silence. One and a half seconds on every request. Three lines to fix it. Five hundred times faster.

The math always fit on a napkin. Nobody did the math.