ACID Trip: The Guarantees Under Your Data

The Last Voyage

On the morning of January twenty-eighth, two thousand seven, a sixty-three-year-old man motored his sailboat out of San Francisco Bay. The boat was a forty-foot red-hulled sloop called Tenacious. The man was alone. His destination was the Farallon Islands, a cluster of jagged rocks twenty-seven miles past the Golden Gate Bridge. He was going to scatter his mother's ashes at sea.

He called his wife and daughter from his cell phone that morning. The weather was clear. The seas were calm. [serious] He never came back.

His name was Jim Gray. And if you have ever written a row to a database and trusted that it would still be there tomorrow, you are relying on ideas that he spent his life building. [sad] The man who defined what it means for a database to keep its promises sailed into the Pacific Ocean and vanished without a trace. No distress call. No signal from the emergency beacon. No wreckage. <break time="1s"/> Nothing.

This is episode seven of The Vibecoder's Guide to Postgres. Last time we talked about migrations, the terrifying art of changing a table while people are using it. Today we are going deeper. Past the schema, past the queries, into the guarantees that make a database a database. This is the ACID trip. And it starts with a question that sounds simple but is not. What is a transaction?

What Is a Transaction

Imagine you are transferring money. A hundred dollars from your checking account to your savings account. This requires two operations. Subtract a hundred from checking. Add a hundred to savings. Now imagine the power goes out between those two operations. After the subtract, before the add. A hundred dollars just disappeared from the universe. Your checking account is a hundred dollars lighter. Your savings account never got the deposit. The bank's books are wrong, and nobody can figure out where the money went.

This is the problem that Jim Gray formalized in nineteen eighty-one. He was working at Tandem Computers by then, after a decade at IBM, and he wrote a paper called "The Transaction Concept: Virtues and Limitations." In it, he gave the world a definition.

A transaction is a transformation of state which has the properties of atomicity, all or nothing, durability, effects survive failures, and consistency, a correct transformation.

All or nothing. That is the core of it. The two operations, subtract and add, are not two separate actions. They are one indivisible unit. Either both happen, or neither happens. If the power goes out, if the disk fails, if the software crashes, the database rolls everything back to where it was before the transaction started. Your money is either in checking or in both accounts. It is never in neither.

In PostgreSQL, you spell this out with three words. BEGIN starts a transaction. COMMIT makes it permanent. ROLLBACK undoes everything since BEGIN. Any SQL you run between BEGIN and COMMIT is treated as a single atomic unit. If anything fails, if any constraint is violated, if you explicitly roll back, every change disappears as if it never happened. The database lies to the rest of the world while you work. Other connections do not see your uncommitted changes. And if you crash before committing, those changes evaporate.

Here is the thing that most vibecoders do not realize. If you just run an INSERT or UPDATE without explicitly typing BEGIN, PostgreSQL wraps it in an implicit transaction automatically. Every single statement is atomic by default. The danger comes when you need multiple statements to be atomic together, and you forget to wrap them in a transaction. Two separate UPDATE statements, one after the other, are two separate transactions. If you crash between them, you get one without the other. The money disappears.

ACID Unpacked

Gray defined three of the four properties. Atomicity, consistency, and durability. Two years later, in nineteen eighty-three, two German computer scientists named Theo Harder and Andreas Reuter published a paper called "Principles of Transaction-Oriented Database Recovery." They added a fourth property, isolation, and coined the acronym that stuck. ACID. Atomicity, Consistency, Isolation, Durability. Four letters that define what separates a database from a spreadsheet saved to a thumb drive.

Let us walk through each one. Not as textbook definitions, but as stories about what goes wrong without them.

Atomicity. We already covered this with the bank transfer. All or nothing. But the real-world examples are more subtle than moving money. Think about creating a user account. You insert a row into the users table, then insert a row into the user_preferences table with their default settings, then insert a row into the audit_log table recording the signup. Three inserts. If the third one fails because the audit table has a constraint violation, you do not want a user with no audit trail. You do not want preferences floating around for a user that was never fully created. All three succeed, or none of them do. Atomicity.

Consistency. This one is the most misunderstood of the four. It does not mean what you probably think it means. Consistency in ACID means that a transaction moves the database from one valid state to another valid state. "Valid" here means all the rules you have defined are satisfied. Foreign keys point to rows that exist. Unique constraints are not violated. Check constraints pass. If your transaction would leave the database in a state where a foreign key points to nothing, PostgreSQL will reject the entire transaction. It is the database enforcing its own rules.

This is different from the C in CAP theorem, which is about distributed systems and whether all replicas agree on the same value. Different concept, same letter, endless confusion.

Isolation. This is where things get genuinely weird. Imagine two people editing the same document at the same time. Without isolation, they would see each other's half-finished work, overwrite each other's changes, and end up with a mess. Isolation means that concurrent transactions appear to execute as if they were running one after another, even though they are actually running simultaneously. Each transaction sees a consistent view of the database, as if it had the whole thing to itself.

The reality is more nuanced than that. True isolation, where every transaction is completely invisible to every other transaction until it commits, is expensive. It means a lot of waiting, a lot of locking, a lot of transactions getting rolled back because they conflicted. So databases offer different levels of isolation, letting you trade strictness for performance. We will get into those levels in a few minutes.

Durability. This is the promise that once the database says "committed," the data survives. Power failure. Disk crash. Kernel panic. It does not matter. If the database told you the transaction committed, the data is on disk. In PostgreSQL, this is implemented through the write-ahead log. Before any change is made to the actual data files, a record of the change is written to the WAL, a sequential log on disk. If the system crashes, PostgreSQL replays the WAL on startup and recovers every committed transaction. The WAL is the journal. The data files are the book. You write in the journal first, then update the book. If you lose the book, the journal can reconstruct it.

Gray was obsessed with durability because he spent years at Tandem Computers building systems that could not go down. Tandem's whole business was fault-tolerant transaction processing for banks, stock exchanges, and telephone switches. Systems where "we lost your transaction" was not an acceptable answer. The NonStop SQL database that Gray helped architect at Tandem was one of the first to guarantee that a committed transaction would survive any single hardware failure. Not because it was clever. Because the math of durability demanded it.

The Man Who Tamed Transactions

Jim Gray did not just define transaction properties. He built the systems that proved they could work.

He was born in San Francisco in nineteen forty-four. His family moved to Rome when he was young, and he learned Italian before English. After his parents divorced, he returned to San Francisco with his mother. He entered UC Berkeley in nineteen sixty-one, earned a degree in engineering mathematics, and then became the first person to receive a PhD from Berkeley's Computer Science Department in nineteen sixty-nine. His dissertation was on parsing theory. Nothing to do with databases. He fell into transaction processing almost by accident.

At IBM in the early nineteen seventies, Gray joined the System R project, the research effort that became the foundation for every SQL database in existence. This is where he did the work that would eventually earn him a Turing Award. He figured out how to let multiple users access the same database simultaneously without corrupting it. That sounds obvious now, but in nineteen seventy-three it was an unsolved problem. Computers were expensive. You could not give everyone their own copy of the database. Multiple users had to share, and sharing meant conflicts, and conflicts meant corrupted data. Gray's insight was granular locking, a system where the database locks only the specific rows or pages being modified, not the entire table. This let dozens of users work concurrently on different parts of the same table without interfering with each other.

He moved to Tandem Computers in nineteen eighty, where he spent a decade building fault-tolerant database systems. Then Digital Equipment Corporation. Then in nineteen ninety-five, Microsoft recruited him to help establish a research lab in San Francisco. He became a Microsoft Technical Fellow, one of the company's most senior technical positions, and spent his last decade working on what he called eScience, the idea that all the world's scientific data should be online and accessible. He built TerraServer, one of the first platforms to serve satellite imagery over the web. He built SkyServer, which let astronomers query the Sloan Digital Sky Survey.

[excited] In nineteen ninety-eight, he received the Turing Award, the Nobel Prize of computing, for seminal contributions to database and transaction processing research and technical leadership in system implementation. The citation specifically mentioned granular locking, two-phase commit, and his work proving that these techniques were mathematically correct. He did not just build systems. He proved they were right.

The goal is to put all the world's scientific data online, along with tools to analyze the data.

That was Jim Gray at Microsoft Research, describing what he wanted to do next. He was sixty-three and still pushing. Then he sailed to the Farallon Islands. But we will come back to that.

Rabbit Hole: How MVCC Lets Everyone Read at Once

This next section goes deep into how PostgreSQL actually implements isolation without making everyone wait in line. If you just want to know the practical isolation levels and when they matter, skip ahead to the next chapter. But if you have ever wondered how a database lets a hundred users read and write the same table at the same time without locking anyone out, this is how.

[excited] The technique is called MVCC. Multi-Version Concurrency Control. And the core idea is almost childishly simple. Instead of locking a row while someone is modifying it, the database keeps multiple versions of the row. Readers see the old version. The writer creates a new version. Nobody waits. Nobody blocks. Readers never block writers. Writers never block readers.

Here is how PostgreSQL does it specifically. Every row in a PostgreSQL table has hidden system columns. Two of the important ones are xmin and xmax. xmin is the transaction ID that created this row version. xmax is the transaction ID that deleted or replaced this row version. When you INSERT a row, PostgreSQL writes a new tuple with your transaction ID as the xmin and an empty xmax. When you UPDATE a row, PostgreSQL does not modify the existing tuple. It marks the old tuple's xmax with your transaction ID and creates a brand new tuple with the updated values and your transaction ID as the xmin. The old version and the new version coexist in the same table. When you DELETE a row, PostgreSQL just sets the xmax on the existing tuple. The row is not physically removed. It is just marked as dead.

This is radically different from how Oracle and MySQL handle it. In Oracle, when you update a row, the database modifies the row in place and writes an undo record to a separate tablespace. If another transaction needs to see the old version, Oracle reconstructs it from the undo log on the fly. MySQL's InnoDB engine works similarly. They modify in place and keep receipts. PostgreSQL keeps all the versions and cleans up later. Neither approach is objectively better. They are different engineering tradeoffs.

The cleanup part is where PostgreSQL gets interesting. All those old row versions, the ones that no active transaction needs to see anymore, are called dead tuples. They take up space. On a table with heavy write activity, dead tuples accumulate and the table grows larger than it needs to be. This is called bloat. Left unchecked, bloat makes your table slower because sequential scans have to read through dead rows, and your disk fills up with data that nobody can see.

The solution is VACUUM. It is PostgreSQL's garbage collector. VACUUM scans the table, identifies dead tuples that no active transaction can see, and marks that space as available for reuse. It does not return the space to the operating system, not unless you run VACUUM FULL, which rewrites the entire table and requires an exclusive lock. Regular VACUUM just frees space internally for future inserts.

PostgreSQL runs autovacuum in the background by default. It monitors every table and kicks in when enough dead tuples accumulate, roughly when twenty percent of the rows are dead or fifty rows have changed, whichever is larger. For most workloads, autovacuum handles everything invisibly. You never think about it. For high-write workloads with millions of updates per hour, you might need to tune the autovacuum settings to be more aggressive. But that is a topic for a later episode.

[calm] The beauty of this system is what it gives you as a user. You can run a complex analytical query that takes five minutes to execute, and it will see a perfectly consistent snapshot of the database as it existed when the query started. Meanwhile, a hundred other users can be inserting and updating rows in the same tables. Your query does not block them. They do not corrupt your results. You each see your own consistent version of reality. That is MVCC. That is isolation in practice.

The Four Levels of Trust

The SQL standard defines four isolation levels. Think of them as four different answers to the question: how much can other transactions interfere with mine?

The loosest level is READ UNCOMMITTED. At this level, a transaction can see changes made by other transactions that have not committed yet. This is called a dirty read. You see data that might not be real, because the other transaction might roll back. It is like reading someone's rough draft before they decide whether to publish it. PostgreSQL does not even bother implementing this. If you ask for READ UNCOMMITTED, PostgreSQL silently gives you READ COMMITTED instead. The PostgreSQL developers decided that dirty reads are never worth the risk.

READ COMMITTED is the PostgreSQL default. And it is what most applications run on. At this level, each statement within your transaction sees only data that was committed before that statement began. Not before the transaction began. Before the statement began. This distinction matters. If you run two SELECT queries in the same transaction, and another transaction commits a change between them, the second SELECT will see the new data. Your two queries see different versions of reality, even though they are in the same transaction. This is called a non-repeatable read, and at the READ COMMITTED level, it is considered acceptable.

For most web applications, this is fine. Each request runs a few queries, gets its data, sends a response. The odds of a meaningful conflict in that window are low. But for applications that need to make decisions based on multiple reads, such as checking a balance and then withdrawing based on that balance, READ COMMITTED has a gap. Between the read and the write, someone else might change the data.

REPEATABLE READ is the next step up. At this level, the entire transaction sees a snapshot of the database as it existed when the transaction started. Not when each statement started. When the transaction started. All your SELECTs within that transaction see the same data, no matter how long the transaction runs or what other transactions do in the meantime. PostgreSQL's implementation is actually stronger than what the SQL standard requires. The standard allows phantom reads at this level, where new rows that match your query criteria appear because another transaction inserted them. PostgreSQL prevents phantom reads too.

The tradeoff is that if your transaction tries to modify a row that another transaction has already modified and committed since your snapshot was taken, PostgreSQL will abort your transaction with a serialization error. You have to catch that error in your application code and retry the transaction.

SERIALIZABLE is the strictest level. It guarantees that the result of running transactions concurrently is identical to running them in some serial order, one after another. PostgreSQL uses a technique called Serializable Snapshot Isolation, which monitors transactions for patterns that could lead to anomalies and aborts one of them if it detects a conflict. This is the safest level, but it comes with the most retries. Your application must be prepared for transactions to be aborted and must have retry logic built in.

Here is the practical reality. Most applications run at READ COMMITTED and never have a problem. If you are building something where correctness is worth more than throughput, like a financial system, REPEATABLE READ or SERIALIZABLE is worth the overhead. The key insight is that stricter isolation does not mean "better." It means more transactions get rolled back and need to be retried. It is a tradeoff between safety and throughput, and the right answer depends on what your data is worth.

Deadlock at the Dinner Table

In nineteen sixty-five, Edsger Dijkstra, one of the great figures in computer science, posed a thought experiment. Five philosophers sit around a circular table. Between each pair of philosophers is a single fork. To eat, a philosopher needs two forks, the one to their left and the one to their right. Each philosopher picks up one fork and then reaches for the second. If all five simultaneously pick up the fork to their left, they are all holding one fork and waiting for the fork to their right. Nobody can eat. Nobody will put down their fork. They sit there forever.

[serious] This is a deadlock. And it happens in databases all the time.

Here is how it works with PostgreSQL. Transaction A updates row one. Transaction B updates row two. Now Transaction A tries to update row two. It cannot, because Transaction B holds a lock on it. So it waits. Meanwhile, Transaction B tries to update row one. It cannot, because Transaction A holds a lock on it. So it waits. They are both waiting for each other. Neither can proceed. The philosophers are hungry and nobody is eating.

PostgreSQL handles this automatically. It runs a deadlock detection algorithm after a configurable timeout, one second by default. When it detects the circular dependency in the wait-for graph, it picks a victim. One of the transactions gets terminated with an error message: deadlock detected. The victim's changes are rolled back, which releases its locks, which allows the other transaction to proceed. It is not elegant, but it works. The alternative, letting both transactions wait forever, is worse.

The vibecoder angle on deadlocks is this. AI-generated code almost never thinks about lock ordering. If your application updates rows in tables A and B, and different code paths update them in different orders, you will eventually get a deadlock in production. The fix is simple in theory: always acquire locks in the same order. Update table A first, then table B. Every code path, every time. In practice, this is hard to enforce across a large codebase, especially when the AI writes each function independently without awareness of the global lock ordering.

The Vibecoder's Transaction Blind Spot

Here is what AI gets wrong about transactions, and it is the same thing most junior developers get wrong. It does not think about transaction boundaries.

You ask your AI assistant to write an endpoint that creates an order. The order has line items. Each line item needs to update the inventory count. The AI generates code that inserts the order, then loops through the line items, inserting each one and updating the inventory count. Clean, correct, readable code. But there is no explicit transaction. Each INSERT and UPDATE is its own implicit transaction. If the server crashes after inserting three of five line items, you have a partial order in the database. Three line items exist. Two do not. The inventory is decremented for three items but the order total reflects five.

This is not a theoretical problem. It happens in production when you have concurrent users and background processes and network hiccups and out-of-memory kills and all the other things that make real systems different from toy apps. The fix is one word at the beginning and one word at the end. BEGIN and COMMIT. Wrap the entire operation in a transaction. If anything fails, everything rolls back to a consistent state.

[sigh] Works fine on my machine. Works fine in testing. First week in production with real traffic, we have orphaned line items everywhere. Customer support is drowning. Turns out two separate UPDATE statements are not the same as two UPDATE statements inside a transaction.

The other blind spot is race conditions. Two users add the last item to their carts at the same time. Both read the inventory count, both see one remaining, both decrement it. Now the inventory is negative one. The item has been sold twice. This is a lost update, one of the classic concurrency bugs that transactions are designed to prevent. But only if you use the right isolation level or explicit locking. At READ COMMITTED, both transactions can read the same value and both can write, and the second write overwrites the first without anyone noticing.

The defense is SELECT FOR UPDATE. This tells PostgreSQL to lock the row you are reading, preventing other transactions from modifying it until you commit. Your AI will rarely generate this. It generates SELECT. It generates UPDATE. It does not generate the lock-acquiring read that prevents the race condition between them.

The parkit database on the production VPS is a perfect example of why this matters. Nine services share one PostgreSQL database. When Capture creates a task that is linked to a Focus session and a Time calendar event, three tables need to be updated atomically. If any one of those writes fails, you want all three to roll back. Without a transaction wrapper, you could end up with a task that references a Focus session that does not exist, or a calendar event that is not connected to anything. The transaction makes it all or nothing. Nine services, one database, zero tolerance for partial writes.

The Last Search

[sad] Let us go back to January two thousand seven. Jim Gray did not come home from the Farallon Islands.

The Coast Guard searched for four days. Planes, helicopters, boats. They found nothing. No wreckage. No debris. No life raft. The forty-foot Tenacious had vanished from one of the most heavily monitored stretches of ocean on the west coast.

What happened next was unprecedented. Gray's colleagues in the tech world, people from Microsoft, Google, Amazon, and a dozen universities, launched a civilian search-and-rescue operation unlike anything that had been attempted before. On February first, the DigitalGlobe satellite made a special pass over the area and captured thousands of high-resolution images. Those images were uploaded to Amazon's Mechanical Turk platform, and volunteers around the world, students, fellow scientists, complete strangers, sat at their computers and scanned them for any sign of a red-hulled sailboat.

[serious] Computer scientists around the world formed a Jim Gray Group to study these images for clues. It was the largest crowdsourced search-and-rescue effort anyone had ever seen. The man who helped build the systems that organize the world's data was now the subject of the world's most advanced search.

By February sixteenth, the image search was suspended. No trace found. Gray's wife, Donna Carnes, hired a team with side-scanning sonar to map three hundred square miles of ocean floor along the twenty-seven-mile path between the Golden Gate Bridge and the Farallon Islands. The sonar found several sunken vessels of the right size. An underwater robot with a camera inspected each one. None of them were Tenacious.

On January twenty-eighth, two thousand twelve, exactly five years after he sailed out of San Francisco Bay, a court declared Jim Gray missing and presumed dead. <break time="1s"/>

His legacy is everywhere. The SIGMOD conference gives an annual Jim Gray Doctoral Dissertation Award. Microsoft Research gives an annual Jim Gray eScience Award. Every database you have ever used implements the transaction model he defined. Every time PostgreSQL wraps your INSERT in an implicit transaction, every time it writes to the WAL before touching the data files, every time it rolls back a failed operation as if it never happened, it is executing ideas that Jim Gray spent forty years refining.

He wanted to put all the world's scientific data online. He spent his career ensuring that data, once committed, would never be lost. And then he sailed into the Pacific and became lost himself. [slow] There is no clean ending to this story. The transaction was never committed. The Tenacious was never found. The man who gave databases their guarantees received none from the sea.

The Bridge

Your data has guarantees. Atomicity. Consistency. Isolation. Durability. Four properties that were just academic concepts before Jim Gray proved they could be implemented, before Harder and Reuter gave them a name, before PostgreSQL baked them into every single statement you run. You are standing on decades of work every time you type INSERT INTO and trust that it will be there tomorrow.

But most of what we have talked about so far has been structured data. Rows and columns. Tables with schemas. Everything in its place. What happens when your data does not fit neatly into rows? When you need to store a JSON blob, a configuration object, a nested structure that resists the relational model? Next time, we confront the schemaless temptation.