PärPod Tech
Databases: Tables All the Way Down
Episode 1 · 1h 1m · Apr 08, 2026
Edgar Codd invented the relational database in 1970 to solve a problem nobody knew they had, and the eighteen different database engines running in your production stack right now are all still arguing about whether he was right.

Databases: Tables All the Way Down

Introduction: Your Databases Are Embarrassed By You

Par Boman has eighteen databases running in production right now. Eighteen. He has PostgreSQL instances on a VPS in Paris. He has SQLite files scattered across his projects like digital breadcrumbs, a trail of tiny databases he leaves behind every time he builds something new. He has a DuckDB lurking inside one of his investigation tools. He has a Redis somewhere that is doing something, and if you asked him what exactly that Redis is doing, he would give you a confident answer that would be approximately forty percent correct.

He uses these databases every day. He writes queries against them. He joins tables across them. He has opinions about which ones are fast and which ones are annoying. He knows his connection strings by heart, the way a person knows their phone number, through repetition rather than understanding. And if you held him at gunpoint and asked him to explain what a primary key fundamentally is, the mathematical reason it exists, the decades of academic warfare that produced the rules governing what it should and should not do, he would give you a look that said "I have always gotten by without knowing this and I resent that you are making it a problem now."

This is not a criticism of Par. This is a description of almost every working developer who has ever shipped something real. We use databases the way we use electricity. Confidently. Constantly. With absolutely no mental model of what is happening on the other side of the wall socket. We flip the switch and the lights come on. We type SELECT and the data comes back. The transformer on the street corner doing the actual physics is someone else's problem. The query planner doing the actual mathematics is someone else's problem. We are fine with this arrangement.

But here is the thing about electricity. The people who invented it argued ferociously about how it should work. They made spectacular blunders. They stumbled into compromises that haunt us to this day. They occasionally tried to ruin each other professionally and personally. And the story of that arguing, those blunders, and those compromises is one of the most entertaining stories in the history of technology.

Databases have exactly the same story. Maybe better. Because the database story has Larry Ellison in it, and Larry Ellison makes every story worse in the most entertaining possible way.

This is part one of a series about the history of databases. We are going to start with the filing cabinet and end somewhere in the late nineteen nineties with Oracle running the world, an open source ecosystem quietly getting ready to eat its lunch, and a Finland-Swedish programmer naming databases after his children. We will meet a British mathematician who changed computing with thirteen pages. We will meet a man from New York who got spectacularly rich by reading that mathematician's published research before the mathematician's own employer did anything useful with it. We will watch IBM, the largest technology company in the world, create the future and then fumble it so thoroughly that a company of four people with two thousand dollars beat them to market.

And somewhere in all of that is a direct explanation of why Par has eighteen databases, and why almost none of them store data the way anyone originally planned.

The Filing Cabinet Problem

To understand why databases were invented, you need to feel what the world was like before them. Not understand it intellectually. Feel it. Because the frustration is the point.

It is nineteen sixty-two. You work for a company that uses a computer. This is already remarkable, because computers at this point are large, expensive, temperamental machines that live in glass-walled rooms with raised floors and dedicated air conditioning. Your company has one. Maybe two. You are one of a dozen people who know how to use them, and you are aware that this makes you simultaneously indispensable and deeply irritating to the people who sign your paychecks.

Data on this computer is stored on magnetic tapes and disk packs. Each program your company runs, the inventory system, the payroll system, the customer records system, each one stores its data in its own files, in its own format, designed by its own programmer who made decisions that made complete sense at the time and have now been forgotten because that programmer left two years ago and his documentation was, charitably, incomplete.

The inventory system stores customer names as last-name-comma-first-name. The accounting system stores them as first-name-space-last-name. The customer records system stores them in all capitals because the programmer who wrote it had a strong opinion about character encoding that nobody else shared. These three systems cannot talk to each other. If the sales team wants a report that crosses inventory data with customer data, someone has to write a program to read both files, understand both formats, reconcile the differences, and produce the output. That program is a one-off. Nobody will reuse it. It will be thrown away when the report is done and the next time someone wants a similar report they will start from scratch.

Now your boss decides to add a new field to the customer records. Every customer record now needs to store a preferred contact time, because the salespeople have been complaining. Simple change. Except. Every program that reads that file now breaks. The field positions have shifted. The record length has changed. The pointer that used to point to the zip code now points to the middle of someone's phone number. You are going to spend the next three months fixing programs that worked perfectly yesterday, and when you are done, someone else will want to add another field, and you will start over.

This is what data dependence means in practice. Your software and your data structure are not separate things. They are the same thing. The program knows exactly where every piece of data lives, the way a secretary knows the filing cabinet. Second drawer, third folder, fourth tab. Change the filing system and everything the secretary knows becomes wrong. Refile everything by hand and retrain every person who used the old system. There is no shortcut. There is no workaround. Every change to how data is stored cascades through every piece of software that reads it.
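
To feel it in code rather than prose, here is a minimal sketch, with invented field names and made-up widths, of what every one of those programs was doing. The program hard-codes the byte position of each field, so the day a field is inserted, the old offsets silently point at the wrong bytes.

# A toy fixed-width record: name (20 bytes), zip (5), phone (8).
name, zipc, phone = "ACME CORP".ljust(20), "90210", "555-1234"
OLD_RECORD = (name + zipc + phone).encode()

def read_zip(record: bytes) -> bytes:
    # This program "knows" the zip code lives at bytes 20 through 24.
    return record[20:25]

print(read_zip(OLD_RECORD))    # b'90210' -- works today

# Tomorrow the boss adds an eleven-byte "preferred contact time" field
# right after the name. Same program, same offsets, new layout.
contact = "09:00-11:00".ljust(11)
NEW_RECORD = (name + contact + zipc + phone).encode()

print(read_zip(NEW_RECORD))    # b'09:00' -- reads the wrong field, no error raised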

This was the normal condition of enterprise computing in the nineteen sixties. Not a failure. Not a crisis. Just the way things were. The programmer was the navigator. Every good programmer had a mental map of the data structures they worked with, learned over months of reading code and asking questions and making mistakes. New programmer joins the team? Hand them the documentation, hope it is current, and wait six months for them to stop breaking things.

Some systems tried to help. And the most ambitious of them was born from one of the most spectacular engineering problems anyone had ever tried to solve.

Houston, We Have a Database Problem

NASA wanted to go to the moon. The Saturn V rocket that would take them there had somewhere around three million parts. Those parts came from hundreds of different suppliers, each with their own manufacturing processes, quality standards, and delivery schedules. You needed to know, at any moment, where a specific bolt was, who made it, what lot it came from, whether it had been inspected, and whether the batch it belonged to had passed quality control. Because if one bolt failed at the wrong moment, astronauts died.

In nineteen sixty-five, IBM partnered with North American Aviation and Caterpillar Tractor to build a system that could track all of this. They called it the Information Management System. Everyone called it IMS. On August fourteenth, nineteen sixty-eight, at Rockwell's Space Division facility in Downey, California, the first prompt appeared on an IBM 2740 typewriter terminal. The word that appeared on the paper printout was READY. The engineers in that room knew they had built something new. They had a system that could organize millions of records and let programs navigate through them by following defined paths. This was a genuine breakthrough.

IMS organized data as a hierarchy. A tree. A parent record could have many child records, and a child record had exactly one parent. Think of an organization chart. A department has employees. An employee has pay records. A pay record has deductions. To get information out of this tree, you navigated it. You told the system to go to this segment, find the child records, follow this pointer, move to the next sibling. The programmer held a complete mental map of the entire structure and moved through it step by step, like steering a submarine through a known channel. You knew the water. You knew the rocks. You knew the turns.
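
Here is the flavor of that navigation as a minimal sketch. These are invented structures and names, not real IMS calls, but the shape of the code is the point: the program is a map of the storage.

# A toy hierarchy: department -> employees -> pay records.
company = {
    "departments": [
        {"name": "Assembly", "employees": [
            {"name": "Shaw",  "pay_records": [{"period": "1968-07", "gross": 820}]},
            {"name": "Ortiz", "pay_records": [{"period": "1968-07", "gross": 910}]},
        ]},
    ],
}

# "What did Ortiz earn in July?" spelled out as navigation: go to the
# department, walk its children, follow the pointer down one level at a time.
for dept in company["departments"]:
    if dept["name"] == "Assembly":
        for emp in dept["employees"]:
            if emp["name"] == "Ortiz":
                for pay in emp["pay_records"]:
                    if pay["period"] == "1968-07":
                        print(pay["gross"])    # 910

# Restructure the tree, say an employee can now sit under two
# departments, and every traversal written like this must be rewritten.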

Charles Bachman at General Electric had built an even earlier system called IDS in nineteen sixty-three, and its approach generalized the tree. Instead of a strict hierarchy, IDS allowed a record to participate in multiple relationships. An employee could belong to several projects at once. This was called the network model, and it was developed further by a standards body called CODASYL, the same committee that had created the COBOL programming language. More flexible than a hierarchy. Also more complicated to navigate. More pointers to follow. More structure to hold in your head.

In nineteen seventy-three, Bachman won the Turing Award, computing's closest equivalent to a Nobel Prize. His acceptance lecture was titled "The Programmer as Navigator." He was proud of this metaphor. The programmer steering through the data, mastering the structure, commanding the machine at a detailed level. To Bachman, this was expertise. This was craft. The programmer as a ship's officer, not a passenger.

And then the problem arrived. Not dramatically. Not in a single crisis. Gradually. As business requirements changed, as they always do.

You need to add a new relationship. An employee can now belong to multiple departments, not just one. To add this to an IMS hierarchy, you have to restructure the tree. Restructuring the tree means every program that navigates that tree has to be rewritten. Every pointer. Every traversal. Every segment that relied on the old structure. You cannot add a relationship without breaking everything that knew the old structure. And the bigger and more successful the system becomes, the more programs depend on it, and the more expensive every change gets.

This was the wall. You could build a system with IMS that worked beautifully for the problem you had on the day you designed it. But the moment the world changed, and it always changed, the cost of adapting was enormous. Not a bug. Not a failure of implementation. A fundamental property of the model. The navigator metaphor was the problem. If you have to steer through data, you have to know exactly where everything is. And knowledge of exactly where everything is breaks the instant the data moves.

Someone was about to notice this. And he was going to fix it in thirteen pages.

One Paper to Rule Them All

Edgar Frank Codd was born on August nineteenth, nineteen twenty-three, on the Isle of Portland, a small limestone peninsula jutting into the English Channel off the Dorset coast. His father was a leather manufacturer. His mother was a schoolteacher. He went to Poole Grammar School, then to Exeter College at Oxford, where he studied mathematics and chemistry. When the Second World War broke out, he joined the Royal Air Force Coastal Command and flew Sunderland flying boats, big four-engine maritime patrol aircraft that hunted German U-boats over the Atlantic. Flight lieutenant. The kind of job where mathematics is not abstract.

After the war, he finished his degree, moved to the United States, and went to work for IBM. He completed a doctorate in computer and communication sciences at the University of Michigan. By nineteen sixty-eight he was at IBM's San Jose Research Laboratory in California, and he had been watching the problems with hierarchical and network databases accumulate with the quiet frustration of a mathematician watching people do arithmetic by counting on their fingers.

His insight was this. If you treated data as sets of tuples, which is mathematician-speak for rows of values arranged in tables, you could apply the tools of set theory and predicate logic to them. A table of customers is a set. A table of orders is a set. The relationship between them is a join, a mathematical operation on sets. And the beautiful property of mathematical sets is that operations on sets do not care how the sets are stored. You can ask for all orders from customers in Stockholm without knowing whether orders are stored on the left side of the disk or the right side, in tree form or flat form, sorted or unsorted. The storage is irrelevant. The mathematics works regardless.

You describe what you want. The machine figures out where it lives.
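
Here is that idea in miniature, as a sketch against SQLite with invented tables and rows, since every relational engine Par runs is a descendant of it. The query names the result it wants and says nothing about how or where the rows live.

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Lindqvist', 'Stockholm'), (2, 'Berg', 'Malmo');
    INSERT INTO orders    VALUES (10, 1, 499.0), (11, 2, 120.0), (12, 1, 80.0);
""")

# "All orders from customers in Stockholm" is a join: an operation on
# sets. No pointers, no traversal order, no knowledge of the file layout.
rows = db.execute("""
    SELECT c.name, o.id, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.city = 'Stockholm'
""").fetchall()

print(rows)    # e.g. [('Lindqvist', 10, 499.0), ('Lindqvist', 12, 80.0)]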

He published this idea in June of nineteen seventy. The paper was called "A Relational Model of Data for Large Shared Data Banks," published in Communications of the ACM. Thirteen pages. Technically dense, full of formal notation, the kind of paper that is easy to skim and hard to understand. The kind of paper that changes everything.

The key concept was data independence. Your application should not have to know how data is physically stored. You should be able to reorganize the storage, add columns to a table, change indexes, move things around for performance reasons, and your application should continue to work unchanged. The database engine's job was translation, not navigation. It would take your logical description of what you wanted and figure out the physical operations needed to retrieve it. The programmer was no longer the navigator. The programmer was the passenger describing the destination, and the engine drove.

The relational model provides a means of describing data with its natural structure only, that is, without superimposing any additional structure for machine representation purposes.

This was not just a technical improvement. It was a philosophical revolution. Codd was separating the logical model of data from the physical model. Separating what the data means from how it is stored. This is one of the deepest and most productive ideas in all of computer science, the idea that you can build clean abstractions that hide messy implementation details. Every layer of modern software, from operating systems to web frameworks to the ORM that Par uses without thinking about it, rests on some version of this principle. And Codd stated it with mathematical precision in nineteen seventy.

He won the Turing Award in nineteen eighty-one for this work. The paper that won him the award was already eleven years old. It had been changing the world the entire time.

IBM's Most Impressive Fumble

Now here is where the story becomes both instructive and deeply entertaining.

IBM saw Codd's paper. IBM employed Codd. IBM was also selling IMS very profitably to very large customers who had invested millions in building systems on it. A relational database, if it worked, would make IMS obsolete. Nobody at corporate IBM was in a particular hurry to make that happen.

What followed was not a villain ordering Codd's ideas suppressed. Corporate sabotage in real life is rarely that theatrical. What happened was subtler and more typical of large organizations everywhere. The priorities were slightly wrong. The urgency was slightly insufficient. The team that eventually got put in charge of building something was not the one most deeply immersed in Codd's actual thinking. The funding was always adequate but never generous. The timeline was always reasonable but never aggressive.

IBM launched a research project called System R in nineteen seventy-four. Its job was to prove the relational model could work in practice. The team was talented. But Codd, who had invented the thing, was not central to the effort. The research team built their own query language called SEQUEL, for Structured English Query Language, developed by two researchers named Don Chamberlin and Ray Boyce.

Codd had proposed his own query language called Alpha, built directly on the relational calculus he had laid out alongside the model. The System R team looked at Alpha, decided it was too mathematically abstract for practical use, and built SEQUEL instead. SEQUEL was designed to look like English, to be readable by non-specialists, to feel intuitive rather than mathematical. This was a deliberate choice, and it was probably the right one for adoption. But it came with compromises that Codd would spend the rest of his career fighting against.

SEQUEL allowed duplicate rows in a table. This violated the mathematical model entirely. A set, in mathematics, cannot have duplicates. That is part of the definition of a set. SEQUEL handled missing data with a concept called NULL, which introduced a three-valued logic where every comparison could be true, false, or unknown. This created confusing edge cases that still produce subtle bugs in production software today, more than fifty years later. These were not minor details. They were decisions that went against the core theory and created problems that have never been fully resolved.
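
Both compromises are easy to see for yourself. A minimal sketch, again against SQLite with made-up rows, of the behavior that has been quietly surprising developers ever since:

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE contacts (name TEXT, phone TEXT);
    INSERT INTO contacts VALUES ('Shaw', '555-1234');
    INSERT INTO contacts VALUES ('Shaw', '555-1234');   -- a true set could not hold this row twice
    INSERT INTO contacts VALUES ('Ortiz', NULL);
""")

# NULL comparisons are neither true nor false, so Ortiz simply vanishes
# from both of these counts. The second one is the classic production bug.
print(db.execute("SELECT count(*) FROM contacts WHERE phone = NULL").fetchone())         # (0,)
print(db.execute("SELECT count(*) FROM contacts WHERE phone <> '555-0000'").fetchone())  # (2,), not (3,)

# The only way to ask the question you actually meant:
print(db.execute("SELECT count(*) FROM contacts WHERE phone IS NULL").fetchone())        # (1,)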

There was also a trademark problem. It turned out that SEQUEL was already a registered trademark of Hawker Siddeley Dynamics Engineering, a British aerospace company that built missiles and had nothing to do with databases. The vowels were dropped. SEQUEL became SQL, which officially stands for Structured Query Language and which everyone still pronounces "sequel" anyway, because language does not care about trademarks.

Ray Boyce, who co-designed the language, died of a cerebral aneurysm at twenty-six years old in nineteen seventy-four, the same year System R began. He did not live to see what he had helped create become the most widely used query language in the world. He did not live to hear it pronounced either way.

IBM's commercial product based on this research, SQL/DS, did not ship until nineteen eighty-one. DB2 came in nineteen eighty-three. Thirteen years after Codd's paper. And here is the part that still makes database historians shake their heads. The System R research papers were published publicly. In academic journals. Anyone could read them. Anyone could see exactly what IBM had built and how it worked.

Anyone.

Two Guys Read IBM's Homework

The year is nineteen seventy-seven. Larry Ellison is thirty-three years old, between jobs again, and has been spending a remarkable amount of time reading the IBM Journal of Research and Development.

Ellison was born in New York City in nineteen forty-four to an unwed mother. He was adopted at nine months by his great-aunt and her husband and raised in a middle-class Jewish neighborhood on the South Side of Chicago. He enrolled at the University of Illinois at Urbana-Champaign but dropped out after his adoptive mother died. He enrolled at the University of Chicago but dropped out after one semester. He moved to California and drifted through a series of programming jobs, becoming technically skilled but organizationally restless. Two college dropouts. Not a record that suggests the second-richest person in America, but then again, the record rarely does.

At Ampex Corporation in the early nineteen seventies, he met two programmers named Bob Miner and Ed Oates. Miner was a gentle, meticulous engineer who actually liked building things to work correctly. Oates was a capable developer with a head for business. Ellison was the salesman and the dreamer, the one who read industry journals and saw possibilities where other people saw academic papers. In nineteen seventy-seven, the three of them pooled two thousand dollars, twelve hundred from Ellison, and founded a company called Software Development Laboratories.

Their first customer was the Central Intelligence Agency, for a project code-named Oracle. When the product was ready to sell more broadly, Ellison asked the CIA if he could borrow the project name. The CIA said yes. The company would go through several name changes over the years, from Software Development Laboratories to Relational Software Incorporated to Oracle Systems Corporation, but the product was always Oracle.

The founding inspiration was explicit and almost comically brazen. Ellison had read Codd's nineteen seventy paper. He had read IBM's published System R research. He had read, in IBM's own journals, a detailed description of a relational database that IBM had built and tested and proven could work. And he had noticed that IBM did not actually sell one. The largest technology company in the world had invented the future, published the blueprints, and then gone back to selling the present because the present was profitable.

Ellison believed he could build a commercial relational database before IBM shipped theirs. He was right. Oracle Version Two shipped in nineteen seventy-nine. There was no Version One. This was a deliberate marketing lie.

We thought you had to be half nuts to buy database software from four guys in California anyway. When it's version one, that's just impossible. So the very first version came out as Version Two.

This tells you almost everything you need to know about Larry Ellison. He looked at the problem, he looked at what IBM had published, he built the product, and then he lied on the box because the truth was bad for sales. Oracle Version Two was the first commercial relational database ever sold. It ran on DEC minicomputers that were becoming the dominant machines in research labs and mid-sized companies. IBM's SQL/DS would not appear for another two years. DB2 would not appear for another four.

A company of four people, funded with two thousand dollars, had used a competitor's publicly available research to beat that competitor to market by years. Larry Ellison has never been given adequate credit for how audacious this was. Possibly because giving him credit for things is something the universe has learned to avoid.

The Empire Builds Itself

Through the nineteen eighties, Oracle grew faster than almost any software company in history. The relational model proved to work in practice. SQL turned out to be something that non-specialists could learn, or at least learn enough of to be dangerous. Every business that had been managing data with IMS hierarchies or custom file formats wanted to move to relational databases, and Oracle was ready with a product while IBM was still wrestling with its installed base of IMS customers who did not want to hear that everything they had built was obsolete.

In nineteen eighty-six, the American National Standards Institute published the first SQL standard. The International Organization for Standardization followed in nineteen eighty-seven. Oracle had pushed hard for standardization, because they were already SQL-compatible. Having an international standard meant customers knew SQL was not a proprietary gamble, not a bet on a single vendor. The standard benefited Oracle directly because Oracle already met it. IBM, which had invented the underlying technology, found itself in the position of implementing a standard partly shaped by the company that had read its homework.

Larry Ellison's personality during this era fills multiple biographies, several of them with titles that seem designed to be quoted at his expense. One is called "The Difference Between God and Larry Ellison: God Doesn't Think He's Larry Ellison." Another is called "Everyone Else Must Fail: The Unvarnished Truth About Oracle and Larry Ellison." Ellison adopted that second title as something close to a corporate philosophy. He did not merely want Oracle to succeed. He wanted everyone else to fail. He said this openly, and the people around him believed he meant it.

At one point in the nineteen nineties, during the browser wars between Microsoft and Netscape, Ellison hired private investigators to go through the trash of an industry advocacy group called the Independent Institute, which had been running advertisements questioning the antitrust prosecution of Microsoft. Ellison suspected they were secretly funded by Microsoft. He wanted proof. So he sent people to dig through their garbage, looking for documents, receipts, financial records, anything incriminating. When a reporter confronted him about this:

A public service.

He described it as a public service. He was not joking. Or rather, he was joking, but he also meant it, which is the specific Larry Ellison quality that makes him so difficult to write about without sounding like you are making things up.

And then, in nineteen ninety, Oracle nearly died.

The company had been growing so fast that it had outrun its own accounting. Revenue recognition practices that had been aggressive became, under scrutiny, fraudulent. Oracle had been booking revenue for deals that had not actually closed. When the truth came out, the company was forced to restate its financial results. The stock price dropped eighty percent. Roughly a tenth of the workforce was laid off. Class action lawsuits piled up. Oracle, the company that had seemed destined to dominate enterprise software forever, was suddenly fighting for survival.

This period is almost never discussed in the Oracle mythology. Larry Ellison does not like to talk about it. The biographies cover it, but the company's own narrative jumps from scrappy startup to global domination with the nineties crisis edited out. It nearly killed them. They survived partly through a revolving credit line that kept the lights on long enough for the sales pipeline to recover, and partly because the customers who depended on Oracle's database genuinely could not switch to anything else fast enough to matter. The lock-in that would later become Oracle's most criticized business practice was, in nineteen ninety, the thing that saved them from bankruptcy.

The nineteen nineties saw Oracle become the database for serious business applications. Banks ran on Oracle. Airlines ran on Oracle. Governments ran on Oracle. Hospitals ran on Oracle. If your enterprise needed to store data and retrieve it reliably and you could not afford to get it wrong, you bought Oracle. The alternative was to explain to your board why you went with something cheaper when the system that runs the business fails. Nobody wanted to have that conversation.

The Oracle Tax

Oracle's licensing model evolved alongside this dominance into something that a charitable person might call baroque and an uncharitable person might call a protection racket.

The structure works like this. You pay an initial license fee for the right to run the software. You then pay an annual support and maintenance fee of twenty-two percent of that initial license cost. Every year. Forever. The support fee covers patches, security updates, and the right to call Oracle when something breaks. These fees are not negotiable downward in any practical sense. They compound at around eight percent per year through various escalation clauses in the contract.

A large enterprise database license might cost five million dollars. Twenty-two percent of that is one point one million per year. After five years you have paid the original license plus five and a half million in support, for a total of ten and a half million, and you are still paying. The only alternatives are to stop paying support, which means you stop getting security patches and become a compliance liability, or to migrate off Oracle entirely, which for a complex enterprise database with decades of stored procedures and custom integrations is a project that can cost more than the license fees themselves.
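
If you want to see what the escalation clause does to that arithmetic, here is the back-of-the-envelope version with assumed numbers only: the same hypothetical five million dollar license, the standard twenty-two percent, and an assumed eight percent annual escalation. The flat ten and a half million becomes closer to eleven and a half.

# Illustrative arithmetic only, not the terms of any actual contract.
license_fee = 5_000_000
support = 0.22 * license_fee      # year-one support: 1.1 million
total = license_fee

for year in range(5):
    total += support
    support *= 1.08               # the escalation clause compounds

print(round(total))               # roughly 11.45 million after five years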

Our customers understand the different components of value that they get. And customers have really no choice but to keep different technology current, and to pay us for that ongoing support.

That was Safra Catz, Oracle's president, explaining to financial analysts why support revenue was so durable. She used those words. Customers have no choice but to pay. This was not a slip. It was an accurate description of the business model, delivered to investors as a feature, not a bug.

If you have worked in enterprise software and you have never heard the words Oracle License Management Services, you have been very lucky or you have worked somewhere very small. Oracle reserves the right to audit your use of their software at any time with relatively short notice. Their auditors examine how you are running Oracle and determine whether you are in compliance with your license agreement.

The license agreement is complicated. The rules around virtual machines are particularly complicated. Oracle's official position is that if Oracle software is capable of running on a physical server, you must license all of the processor cores on that physical server, even if Oracle is actually using only a small fraction of them inside a virtual machine. This position, applied aggressively, produces spectacular claimed compliance gaps.

Documented cases exist of companies receiving claimed compliance deficits of twenty-seven million dollars that, after negotiation and technical review by independent Oracle licensing experts, settled for fifty thousand dollars. That is a reduction of ninety-nine point eight percent. The gap between Oracle's opening claim and the amount that actually got paid was manufactured almost entirely by aggressive interpretation of ambiguous license terms. An entire consulting industry exists for the sole purpose of defending companies from Oracle audits. These consultants know the license terms the way tax lawyers know the tax code. They are expensive. They are considered worth every penny.

The PeopleSoft Wars

The year is two thousand and three. Oracle is at the top of the enterprise software world. Database king. Application server provider. ERP vendor. The company that once ran on the creative energy of three people and two thousand dollars now employs fifty thousand. Larry Ellison is one of the richest people on earth, and he is about to do something that even by his standards is breathtaking.

PeopleSoft makes human resources and financial software. Large companies use it to manage their employees, their payrolls, their benefits administration. PeopleSoft has a large installed base and a corporate culture famously opposite to Oracle's. Where Oracle was aggressive and transactional, PeopleSoft was known for treating customers as long-term partners, for pleasant interactions, for the kind of enterprise relationship where the vendor actually seemed to care whether you were happy. In the enterprise software world, this was unusual enough to be noteworthy.

In June of two thousand and three, PeopleSoft announced it was acquiring JD Edwards, another enterprise software company. Days later, before the JD Edwards deal was even complete, Oracle launched an unsolicited hostile takeover bid for PeopleSoft itself. The timing suggested that Ellison had been watching PeopleSoft grow for a while and had decided that instead of competing with them, he would simply buy them and shut them down.

PeopleSoft's chief executive, Craig Conway, did not take this quietly.

Oracle is a sociopathic company. They want PeopleSoft's maintenance revenue stream. They have no interest in PeopleSoft's products. They have no interest in PeopleSoft's customers. They have no interest in PeopleSoft's employees. They want to destroy us and take the revenue.

Conway was right about every single point, as subsequent events would prove. But being right does not always help.

The fight lasted eighteen months. Oracle raised its bid repeatedly. The United States Department of Justice sued to block the acquisition on antitrust grounds, arguing it would harm competition in the enterprise application market. A federal judge ruled in Oracle's favor in September of two thousand and four. Craig Conway was fired by his own board weeks later, in a move that many observers interpreted as PeopleSoft clearing the way for a deal it could no longer resist.

In January of two thousand and five, Oracle completed the acquisition for ten point three billion dollars, more than twice its original offer. Within sixty days, Oracle laid off approximately five thousand PeopleSoft employees.

Dave Duffield was sixty-five years old. He had founded PeopleSoft in nineteen eighty-seven. He had built it from nothing into a company that employed thousands of people, many of whom had worked there for years and considered it an unusually good place to be employed. Oracle had just fired them.

Duffield called every laid-off employee he could reach. He told them he was starting something new. He asked if they wanted to be part of it. They called him back. In two thousand and five, Duffield co-founded Workday, a cloud-based human resources software company. Workday eventually went public and became worth over thirty billion dollars. Many of the people Oracle had fired in the PeopleSoft acquisition ended up working there. The revenge was patient. It was thorough. And it was funded by the skills of the people Larry Ellison had discarded.

Codd's Last Stand

While Oracle was building its empire and buying its competitors, the man who had made all of it possible was watching from the sidelines with increasing frustration.

By the early nineteen eighties, the word relational had become marketing. Every database vendor wanted to be relational. Some of them actually were. Some were older systems with SQL slapped on top of hierarchical or network foundations, like putting a Ferrari body on a tractor chassis. Some were new systems that implemented SQL but ignored the underlying mathematical theory that SQL was supposed to express. The word had been emptied of meaning through overuse, the way "natural" on food packaging means nothing.

Edgar Codd was furious.

In nineteen eighty-five, he published what became known as Codd's twelve rules in Computerworld magazine. There were actually thirteen rules, numbered zero through twelve, which was very mathematician of him and probably deliberate. The rules were a litmus test. Did your database actually implement the relational model? Or did it just claim to?

A system is said to be fully relational if and only if it manages databases entirely through its relational capabilities and supports at a minimum all of the features described herein.

Almost nothing passed. Including, to nobody's particular surprise, Oracle.

Three years later, in nineteen eighty-eight, Codd published a paper called "Fatal Flaws in SQL" and made his critique explicit. SQL allowed duplicate rows, which violated set theory. SQL's NULL handling created three-valued logic that violated predicate logic. The query language he had proposed, Alpha, built directly on his relational calculus, had been passed over by the System R team and never implemented by anyone.

The field he had created had moved on without him. They were using a language he considered theoretically compromised. He had invented the underlying mathematics. The industry had taken his mathematics, bent it slightly out of shape for practical reasons, standardized the bent version, and was now calling the bent version the standard. It was as if someone had taken Euclidean geometry, decided that parallel lines could intersect sometimes when it was convenient, and then named the result Euclidean geometry anyway.

Codd resigned from IBM in nineteen eighty-four. He founded the Relational Institute and a consulting group with his colleague Chris Date. He spent his remaining years arguing, with diminishing audiences, that the databases bearing the word "relational" were not truly relational at all.

He died of heart failure on April eighteenth, two thousand and three, in Williams Island, Florida. He was seventy-nine years old. Every database Par has ever used, including the eighteen he is running in production right now, traces its lineage directly to the thirteen pages he published in nineteen seventy. The compromises that make Par's queries occasionally return unexpected results when NULL values are involved, those trace to him too, or rather, to the people who did not listen to him.

The Open Source Rebellion

While all of this corporate maneuvering was happening, something quieter and more consequential was growing at Berkeley.

Michael Stonebraker had been a professor at the University of California, Berkeley since the early nineteen seventies. He was, and remains, one of the most opinionated people in computer science. This is not a criticism. In a field where many researchers hedge their statements carefully and qualify their claims with layers of academic caution, Stonebraker has built an entire career on saying exactly what he thinks, being right a remarkable percentage of the time, and visibly enjoying the experience of arguing about it with people who were wrong.

Beginning in nineteen seventy-four, Stonebraker and his colleague Eugene Wong built INGRES, the Interactive Graphics and Retrieval System. INGRES was one of the first practical relational database implementations, developed roughly in parallel with IBM's System R. It eventually won the ACM Software System Award, which it shared with System R, a recognition that both teams had independently proved that Codd's ideas could work in the real world.

Then, in nineteen eighty-six, Stonebraker decided INGRES had fundamental limitations. Instead of improving it incrementally, he started over from scratch. The new system was called POSTGRES, which stood for Post-INGRES, the kind of naming convention that only a deeply confident person deploys. You name your new project after your old project when you believe the old project was important enough that the new one's identity can rest entirely on being its successor. POSTGRES introduced object-relational features, extensible types, and new approaches to transaction management. It was funded by the Defense Advanced Research Projects Agency and the National Science Foundation, because in the nineteen eighties the United States government still funded computer science research generously and the results changed the world.

Stonebraker's opinions on the technologies that came after him are worth hearing because they demonstrate a particular kind of earned bluntness that only decades of being right can produce. On MapReduce, the big data processing approach that Google published in two thousand and four and that generated enormous excitement in the software industry:

A giant step backward in the programming paradigm for large-scale data intensive applications, written by people who don't understand databases at all.

That is Stonebraker. He later described the entire NoSQL movement as "No ACID Equals No Interest," which is the kind of dismissal that turns out to be approximately correct five years later. He won the Turing Award in two thousand and fourteen, which in his case reads less like a lifetime achievement award and more like an official certification that he had been right about things for four decades and the field was finally ready to admit it.

In nineteen ninety-four, two graduate students at Berkeley named Andrew Yu and Jolly Chen made a decision that would echo forward through decades of software development. They took POSTGRES and replaced its original query language, called POSTQUEL, with SQL. They called the result Postgres95. In nineteen ninety-six the name was changed to PostgreSQL, which is officially pronounced "post-gres-cue-ell" and which nobody pronounces that way because everyone just says Postgres.

No single company owns PostgreSQL. It is maintained by the PostgreSQL Global Development Group, an international volunteer organization with no controlling shareholder, no venture capital investors, no board of directors that can make business decisions overriding the technical community. When a company needs something changed, they can contribute the change. They cannot buy it. They cannot acquire the project and fire the developers. This structure has proven remarkably stable over three decades, and PostgreSQL has grown, slowly and reliably, into one of the most feature-complete and standards-compliant databases ever built.

Its mascot is a small blue elephant named Slonik, proposed in nineteen ninety-seven by a group of Russian internet pioneers. The elephant because elephants never forget. Which is an excellent quality in a database.

My, Max, and Maria

Meanwhile, in Finland, a Swedish-speaking programmer named Michael Widenius was working on his own database problem.

Widenius, who went by the nickname Monty, had been involved with database software since the early nineteen eighties. In nineteen ninety-four, working with colleagues David Axmark and Allan Larsson, he began developing a new relational database system. The first release came on May twenty-third, nineteen ninety-five. He named it MySQL.

The "My" in MySQL is not a generic possessive pronoun. It is the name of Monty's eldest daughter. He has three children. My. Max. Maria. He named databases after all of them. MySQL. MaxDB. MariaDB. This is either an extremely touching way to honor your family or a significant misallocation of naming creativity. Possibly both. It is certainly a commitment to a theme that most parents would abandon after the first child.

MySQL's design philosophy was different from PostgreSQL's. PostgreSQL aimed for standards compliance and feature completeness, the database equivalent of building a cathedral. MySQL aimed for speed and simplicity, the database equivalent of building a very fast car. MySQL was faster on reads. It was simpler to install. It handled the workloads that most web applications actually needed, which turned out to be mostly "read a lot of rows quickly and sometimes write one." It did these things well, and it was free.

MySQL became the M in LAMP. Linux, Apache, MySQL, PHP. This stack, assembled from free components by developers who were building the early web without enterprise budgets, powered an astonishing fraction of the entire internet. Forums, blogs, small business sites, the early wave of social platforms that grew into the behemoths we know now. The hosting companies that served millions of small websites defaulted to MySQL. If you have ever built a website and it needed a database, there is a very good chance your first database was MySQL. If you are Par Boman, your first database was definitely MySQL, and he probably remembers the phpMyAdmin interface with a mixture of nostalgia and mild horror.

In two thousand and eight, Sun Microsystems acquired MySQL AB, the company Monty had built around his database, for approximately one billion dollars. Sun was trying to build an open source empire. Monty took the money and kept working there.

Then Oracle acquired Sun.

Oracle, the company whose licensing audits were an industry cautionary tale, whose president had told analysts that customers had no choice but to pay, the company that had bought PeopleSoft and fired five thousand people in sixty days, now owned the most popular open source database in the world. Monty Widenius, who had sold his company to Sun partly because Sun was an open-source-friendly environment, found himself effectively working for Larry Ellison.

He did not stay. Widenius left Sun on February fifth, two thousand and nine, before the Oracle acquisition was even complete. He forked MySQL. He named the fork after his youngest daughter.

MariaDB.

He also drafted a petition to the European Commission asking them to block Oracle's acquisition of Sun unless MySQL was separated out and given to an independent foundation. The petition gathered fourteen thousand one hundred and seventy-four signatures. The European Commission reviewed the situation, accepted Oracle's promises to continue developing MySQL as open source, and approved the acquisition anyway.

MariaDB is now the default MySQL-compatible database in most Linux distributions. Debian ships it. Fedora ships it. Red Hat Enterprise Linux ships it. When you install a LAMP stack on a fresh Linux server today, the database you get is likely MariaDB. Oracle acquired the most popular open source database in the world and the open source world immediately made a copy they controlled entirely.

The father named his databases after his daughters. When the biggest one was taken from him, the youngest one took its place. There is something in that story that feels like it belongs in a folk tale more than in enterprise software. But enterprise software has always been stranger than people give it credit for.

The Stage Is Set

We are now in approximately the year two thousand. Thirty years since Codd's paper. Twenty years of the commercial relational database era. SQL is everywhere. It is in Oracle, running the banks and the airlines. It is in DB2, running what remains of IBM's enterprise customer base. It is in PostgreSQL, which serious developers who care about correctness have been quietly choosing for years. It is in MySQL, powering tens of millions of websites. It is in SQLite, which has just been released and will shortly be embedded invisibly in every phone and every browser on earth.

Every developer alive has typed SELECT star FROM something at least once and gotten back exactly what they expected. The relational model has won so completely that most people building software simply assume databases are relational, the way they assume cars have four wheels and documents have pages. It is not a choice. It is the default state of reality.

And then, quietly, two companies published papers about what they had been building internally. Google described something called Bigtable. Amazon described something called Dynamo. These papers described database-like systems that stored data without the relational model, without SQL, without schemas, without joins. They were designed for one thing above all else. Performance at planetary scale. They made explicit tradeoffs that relational databases had always refused to make. They gave up consistency guarantees. They gave up the mathematical elegance that Codd had fought for and that the industry had already compromised. They gained raw speed and horizontal scaling, the ability to spread data across thousands of machines.

The papers were public. Anyone could read them.

Does that sound familiar? It should. It is the same pattern. Just as Ellison read IBM's System R papers in the nineteen seventies and built Oracle before IBM could ship a product, now a new generation of engineers read Google's and Amazon's papers and built open source systems inspired by them. They called these systems Cassandra, CouchDB, MongoDB, Redis, and a dozen others. They called the whole category NoSQL. Not because they had replaced SQL with something better. Because they were doing something different enough that the SQL name did not apply.

The NoSQL movement came with enormous excitement. Sweeping declarations that relational databases were dead. Conference talks full of diagrams showing how everything would be faster and simpler if you just stopped trying to make data consistent. A period of several years during which a generation of developers built things in MongoDB, discovered that the relational constraints they had thrown away were actually doing useful work, and quietly, sheepishly, came back to Postgres.

And then the specialized databases appeared. Time-series databases for metrics. Graph databases for relationships. Document databases that found their niches. Vector databases that are powering the AI applications being built right now. Each one solving a problem that relational databases handle awkwardly, and each one eventually adding just enough relational features to be useful, which is either ironic or inevitable depending on your temperament.

That is part two. The NoSQL revolution, its genuine insights, its genuine failures, the CAP theorem that explains why you cannot have everything, the rise of NewSQL, the document databases that found their niches, the graph databases, the time-series databases, the vector databases, and yes, a direct accounting of what exactly Par's eighteen databases are doing and why each one exists and whether he actually needs all of them.

You now know where databases came from. You know what a primary key actually is, even if explaining it precisely would still require a moment of thought and possibly a whiteboard. You know why SQL allows NULLs even though Codd hated them. You know why Oracle charges what it charges and why you pay it anyway. You know why Monty named his database after his daughter, and why his youngest daughter's name now ships as the default on more Linux distributions than the database Oracle took from him.

The eighteen databases Par is running right now, most of them Postgres if we are being honest, every single one carries this history inside it. The mathematical precision of a British RAF pilot who went to work at IBM and saw the future in set theory. The commercial instinct of a dropout from New York who read IBM's research journals and saw money. The principled stubbornness of a Berkeley professor who built the same thing twice and improved it both times. The quiet defiance of a Finnish programmer who forked his own creation rather than let it fall into the wrong hands.

Every time Par types a query and the data comes back, all of those people are in the room. He just does not know it. Until now.

That is part one. Part two is waiting.