Your First Table: The Shape of Everything

The Spreadsheet Under the Hood

This is episode two of The Vibecoder's Guide to Postgres.

Every application you have ever used, every social media feed you have scrolled, every bank account you have checked, every flight you have booked, is secretly a spreadsheet. Not literally, of course. The reality under the hood is more sophisticated and more interesting. But the core idea, that your data lives in rows and columns, that each row is a thing and each column is a fact about that thing, is as simple as it sounds. The hard part is not understanding what a table is. The hard part is understanding why the shape of your table is a decision that will haunt you for years.

In the last episode, we talked about where Postgres came from. Berkeley, Michael Stonebraker, decades of stubborn engineering. But Postgres did not invent the idea of storing data in tables. That idea is older, and the man who had it was a British mathematician who flew patrol planes over the Atlantic during World War Two, quit IBM over McCarthyism, and spent years fighting his own employer to prove that the future of data was not a hierarchy but a flat, boring, beautiful table.

His name was Edgar Frank Codd. Everyone called him Ted. And before you write a single line of schema, you should know his story, because the decisions he made in nineteen seventy are the reason your database works the way it does today.

The Pilot Who Changed Data

Ted Codd was born in nineteen twenty-three on the Isle of Portland, a limestone peninsula jutting off the south coast of England. He was the youngest of seven children. His father manufactured leather. His mother was a schoolteacher. He won a scholarship to Exeter College at Oxford, where he started studying chemistry, but in nineteen forty-two, with the war on, he volunteered for active duty despite being eligible for a deferment.

[calm] He became a flight lieutenant in the Royal Air Force Coastal Command, flying Sunderlands. If you do not know what a Sunderland is, picture a four-engine flying boat the size of a bus, designed to hunt submarines across the open Atlantic. It was the kind of aircraft where you spent hours staring at grey water, waiting for something to happen. The experience gave Codd a lifelong love of recreational flying, but it also gave him something else. A comfort with abstraction. When you are scanning featureless ocean for hours, your mind learns to think in patterns rather than specifics.

After the war, he went back to Oxford, switched from chemistry to mathematics, finished his degree in nineteen forty-eight, and moved to the United States. He joined IBM as a mathematical programmer, working on some of the earliest machines the company built. [serious] But in nineteen fifty-three, something unexpected happened. Senator Joseph McCarthy's anti-communist witch hunts were tearing through American institutions, and Codd, the principled Englishman, was personally offended. He left IBM and moved his family to Ottawa, Canada.

He spent four years there, running the data processing department for a Canadian defense contractor. Then a chance encounter with his old IBM manager pulled him back. He rejoined in nineteen fifty-seven, earned a doctorate from the University of Michigan in nineteen sixty-five, and then moved to California to settle into IBM's San Jose Research Laboratory. He was in his mid-forties when the idea hit.

The Paper That Changed Everything

In nineteen seventy, Codd published a paper in Communications of the ACM with a title that sounds almost comically dry. "A Relational Model of Data for Large Shared Data Banks." Eleven pages. No product diagrams, no marketing pitches. Just math, set theory, and a quietly revolutionary argument.

The opening line is worth hearing, because it captures the whole philosophy in a single sentence.

[serious] Future users of large data banks must be protected from having to know how the data is organized in the machine.

Think about what that means. In nineteen seventy, if you wanted to get data out of a database, you needed to know exactly how it was stored. You navigated through pointers and tree structures, following links from one record to the next like walking through a maze. The dominant model was hierarchical, IBM's own Information Management System, called IMS. Data was organized in parent-child trees, and if you wanted to find something, you had to know the path. Change the structure, and every program that touched the data broke.

Codd said no. He said the physical storage should be invisible to the user. He proposed that data be organized into relations, which is the mathematical word for tables. Each relation is a set of tuples, which is the mathematical word for rows. Each tuple has attributes, which is the mathematical word for columns. The ordering of rows is immaterial. The ordering of columns is immaterial. What matters is the logical structure, not the physical one.

This sounds obvious now because Codd won. In nineteen seventy, it was heresy.

The Company That Said No

Here is where the story gets political. IBM had billions of dollars invested in IMS. Every major corporation in America was paying IBM for hierarchical databases. Codd's paper said, essentially, that the entire approach was wrong. Not broken. Not inadequate. Conceptually wrong. You can imagine how that went over with the sales division.

IBM did what large corporations do when confronted with an inconvenient idea from their own research lab. They acknowledged it politely, and then made sure it could not threaten the existing product line. They started a project called System R to explore Codd's ideas, but they put developers in charge who were not deeply familiar with his work and they isolated Codd from the team. The System R group rejected Codd's own query language, called Alpha, and invented their own, a thing they called SEQUEL, later renamed SQL for legal reasons.

[excited] The irony is spectacular. IBM's attempt to contain Codd's revolution accidentally created the query language that would conquer the world. And because IBM moved so slowly, a programmer named Larry Ellison read the published System R papers, built his own implementation, and got Oracle to market first. IBM did not ship a relational database product until nineteen eighty-one, more than a decade after Codd's paper. By then, Oracle was already eating their lunch.

Codd was named an IBM Fellow in nineteen seventy-six. In nineteen eighty-one he won the Turing Award, often called the Nobel Prize of computer science, for his relational model. But his relationship with the company that had both enabled and obstructed his life's work was never simple.

What a Table Actually Is

So what did Codd actually propose? Let us get concrete, because this is the part that matters for your work.

A table has a name. It has columns, and each column has a name and a type. It has rows, and each row represents one instance of whatever the table describes. A table called users might have columns for an identifier, a name, an email address, and the date the account was created. Each row is one user.

That is the surface description. Here is the deeper one. A table is a contract. It is a declaration that says: this is the shape of this particular kind of data, these are the facts I will track about it, and these are the rules those facts must follow. When you write a table definition in Postgres, you are not just creating storage. You are making promises about your data's structure that the database will enforce on your behalf, forever, even at three in the morning when your application code has a bug and is trying to write garbage.

This is the part that vibecoders tend to miss. When you ask an AI to generate a schema, it gives you the surface. Columns with plausible names and reasonable types. What it often leaves out are the promises. The constraints that turn a loose collection of columns into a contract.

The Types That Matter

Postgres has a type system that ranges from the practical to the exotic. Let us start with the ones you will use every day.

INTEGER is a whole number. Four bytes, which means it holds values up to about two point one billion. For most things, that is enough. If you are counting users, tracking orders, or assigning identifiers, integer is your default. BIGINT doubles the storage to eight bytes and holds numbers up to about nine point two quintillion, which is useful when you are doing things at genuine internet scale.
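If you want to see where those ceilings come from, the arithmetic is short. Here is a minimal Python sketch, assuming signed storage where one bit is spent on the sign. The names INTEGER_MAX, BIGINT_MAX, and fits_in_integer are made up for this illustration and are not part of Postgres.

```python
# Illustrative arithmetic only; these names are hypothetical, not Postgres APIs.
INTEGER_MAX = 2**31 - 1   # signed 4-byte integer: 31 value bits plus a sign bit
BIGINT_MAX = 2**63 - 1    # signed 8-byte integer: 63 value bits plus a sign bit


def fits_in_integer(n):
    """Return True if n would fit in a signed 4-byte integer column."""
    return -(2**31) <= n <= INTEGER_MAX


print(f"{INTEGER_MAX:,}")               # 2,147,483,647
print(f"{BIGINT_MAX:,}")                # 9,223,372,036,854,775,807
print(fits_in_integer(3_000_000_000))   # False: three billion needs a bigint
```

Notice that doubling the bytes does not double the range. It squares it, roughly, which is why bigint feels bottomless.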

TEXT is a string of any length. Not varchar with a limit, not char with padding, just text. In Postgres, text and varchar perform identically. The Postgres documentation is explicit about this. There is no performance difference. If you need to limit length, use a check constraint. Do not use varchar with a number in parentheses thinking it will be faster. It will not. It just adds a place where your application can fail for no good reason.

BOOLEAN is true or false. Simple. Use it for flags, toggles, states. Is this user active? Has this order shipped? Is this email verified? If you find yourself storing the words "yes" and "no" as text, stop. That is what booleans are for.

TIMESTAMPTZ, timestamp with time zone, is the one you want for dates and times. Not timestamp without time zone. Not date. Timestamptz stores the moment in UTC and converts to the client's time zone on display. If you use plain timestamp, you are storing a wall clock time with no information about which wall that clock was on. This is a mistake you will not notice until you have users in two different time zones and their timestamps disagree.
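The wall clock problem is easier to feel than to describe, so here is a rough sketch of it in Python rather than in Postgres itself. Python's zone-aware datetime plays the role of timestamptz and a naive datetime plays the role of plain timestamp. The fixed offsets for New York and Tokyo are assumptions for illustration and ignore daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# One real moment, pinned to UTC (the timestamptz idea).
moment = datetime(2024, 3, 1, 17, 0, tzinfo=timezone.utc)

# Hypothetical fixed offsets, chosen for illustration only.
new_york = timezone(timedelta(hours=-5))
tokyo = timezone(timedelta(hours=9))

ny_view = moment.astimezone(new_york)
tokyo_view = moment.astimezone(tokyo)

# Different wall clocks, same moment: zone-aware values still compare equal.
same_moment = ny_view == tokyo_view

# Strip the zone (the plain timestamp idea) and the two clocks now disagree.
naive_ny = ny_view.replace(tzinfo=None)
naive_tokyo = tokyo_view.replace(tzinfo=None)
clocks_disagree = naive_ny != naive_tokyo

print(ny_view.isoformat(), tokyo_view.isoformat(), same_moment)
print(naive_ny.isoformat(), naive_tokyo.isoformat(), clocks_disagree)
```

Once the zone is gone, nothing in the value tells you that noon and two in the morning were the same instant.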

UUID is a one hundred twenty-eight bit identifier, usually represented as thirty-two hexadecimal digits with hyphens. We will come back to this one because it starts arguments.

The Exotic Shelf

Postgres also has types that few other databases offer. INET stores IP addresses. INTERVAL stores durations, like three hours and forty-five minutes. Range types let you store a start and end value as a single column, which is perfect for scheduling. Instead of a start time column and an end time column, you have one column that represents the whole span, and Postgres can check whether two spans overlap with a single operator.
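To make the overlap idea concrete, here is a small Python sketch of the test that a range overlap check performs. In Postgres the overlap operator for range types is written as two ampersands. The Span type and the overlaps function below are hypothetical stand-ins, and the sketch assumes each span includes its start and excludes its end.

```python
from datetime import datetime
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for a range value: start included, end excluded."""
    start: datetime
    end: datetime


def overlaps(a, b):
    """Two half-open spans overlap when each one starts before the other ends."""
    return a.start < b.end and b.start < a.end


standup = Span(datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 10, 0))
review = Span(datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 11, 0))
overrun = Span(datetime(2024, 3, 1, 9, 30), datetime(2024, 3, 1, 10, 30))

print(overlaps(standup, review))   # False: one ends exactly as the other begins
print(overlaps(standup, overrun))  # True
print(overlaps(overrun, review))   # True
```

With two separate columns, every query has to repeat that comparison by hand. With a range column, the database carries it for you.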

There are array types, so a single column can hold a list of values. There is JSONB for semi-structured data, which we will cover in a later episode. There is even a MONEY type, which you should never use because it ties currency formatting to your database's locale settings, meaning your application will display different amounts depending on which server it is running on. Use numeric with two decimal places instead.

The exotic types are one of the reasons people love Postgres. In most databases, if you need to store an IP address, you store it as text and hope nobody puts the word "banana" in that column. In Postgres, you store it as inet, and the database knows what an IP address is. It can tell you if one address is inside a subnet. It can sort them numerically instead of alphabetically. The type system is not just about storage. It is about meaning.
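You can get a feel for what a real address type buys you without touching a database. This sketch uses Python's standard ipaddress module as a rough analogy for inet. It is not how Postgres implements the type, and the sample addresses and the is_valid_address helper are made up for illustration.

```python
import ipaddress

raw = ["10.0.0.9", "10.0.0.10", "192.168.1.5"]

# Stored as text, addresses sort character by character.
as_text = sorted(raw)

# Stored as addresses, they sort by numeric value.
as_addresses = [str(a) for a in sorted(ipaddress.ip_address(r) for r in raw)]

# A typed value can answer questions plain text cannot, like subnet membership.
office = ipaddress.ip_network("10.0.0.0/24")
inside_office = [r for r in raw if ipaddress.ip_address(r) in office]


def is_valid_address(value):
    """Hypothetical helper: True only if value parses as an IP address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


print(as_text)        # ['10.0.0.10', '10.0.0.9', '192.168.1.5']
print(as_addresses)   # ['10.0.0.9', '10.0.0.10', '192.168.1.5']
print(inside_office)  # ['10.0.0.9', '10.0.0.10']
print(is_valid_address("banana"))  # False
```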

The Constraints That Save You

Types tell Postgres what kind of data a column holds. Constraints tell it what the data is allowed to be. This is where the contract gets teeth.

NOT NULL means the column must have a value. It cannot be blank. This sounds trivial until you realize how many real-world bugs come from missing data. A user with no email address. An order with no total. A timestamp that never got set. Every column that should always have a value needs NOT NULL, and the default in Postgres is to allow nulls. You have to opt in to requiring data. If you do nothing, every column is optional.

UNIQUE means no two rows can have the same value in this column. Email addresses, usernames, API keys. Anything that identifies a thing should be unique. Without this constraint, you will eventually have two users with the same email, and your login system will pick one of them at random.

CHECK lets you write arbitrary rules. A price must be greater than zero. A rating must be between one and five. A status must be one of a specific set of values. Check constraints are the place where business logic meets the database, and they are the thing AI-generated schemas almost never include.

DEFAULT gives a column a value when one is not provided. The current timestamp for a created at column. The boolean false for an is active flag. Zero for a counter. Defaults reduce the burden on your application code and guarantee consistency.
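Here is a minimal sketch of those four constraints saying no. One loud assumption: it runs against SQLite, through Python's built-in sqlite3 module, purely so you can try it without a server. The constraint clauses themselves are standard SQL, but error messages and type behavior differ in Postgres. The products table and the attempt helper are invented for this example.

```python
import sqlite3

# SQLite in memory stands in for Postgres here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id     INTEGER PRIMARY KEY,
        sku    TEXT    NOT NULL UNIQUE,
        price  NUMERIC NOT NULL CHECK (price > 0),
        rating INTEGER CHECK (rating BETWEEN 1 AND 5),
        views  INTEGER NOT NULL DEFAULT 0
    )
""")


def attempt(sku, price, rating=None):
    """Hypothetical helper: try one insert and report what the database decided."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO products (sku, price, rating) VALUES (?, ?, ?)",
                (sku, price, rating),
            )
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


print(attempt("A-1", 9.99))         # accepted
print(attempt(None, 9.99))          # rejected by NOT NULL
print(attempt("A-1", 4.50))         # rejected by UNIQUE
print(attempt("B-2", -1))           # rejected by CHECK on price
print(attempt("B-2", 5, rating=9))  # rejected by CHECK on rating

# DEFAULT filled in the counter that the insert never mentioned.
views = conn.execute("SELECT views FROM products WHERE sku = 'A-1'").fetchone()[0]
print(views)  # 0
```

The application code never validated anything. Every rejection came from the table definition.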

And then there are foreign keys, which connect tables to each other, but that is episode four's story.

The Primary Key Debate

Every table needs a primary key, a column or combination of columns that uniquely identifies each row. This is the single most important decision in your table design, and it is also the one where people argue the most.

The old way, and the way AI tools still generate by default most of the time, is SERIAL. Serial is not actually a type. It is syntactic sugar. When you write serial, Postgres creates a separate sequence object, attaches it to the column, and auto-increments it for each new row. One, two, three, four, five. Simple. But serial has problems. The sequence it creates is a separate object with separate permissions. If you dump and restore your database, the sequence might get out of sync. If you use an ORM, the ORM might not understand the relationship between the column and its sequence. Serial was fine in the nineties. It is a legacy choice now.

The modern replacement is GENERATED ALWAYS AS IDENTITY, which arrived in Postgres version ten back in twenty seventeen. It does the same thing, auto-incrementing integers, but the sequence is properly tied to the column as an internal dependency. Permissions are simpler. Dumps are cleaner. The SQL standard blesses it. If you are starting a new project today, there is no reason to use serial. Use identity.

But wait. Should you be using auto-incrementing integers at all? This is where the debate gets heated.

[serious] One camp sounds like this. I have seen so many projects switch to UUIDs and then wonder why their database is twice the size and their inserts are thirty percent slower. Sequential integers are fine. They are more than fine. They are optimal for ninety-five percent of use cases.

The other camp sounds like this. We use UUIDs everywhere and we would never go back. When you have microservices that each generate their own identifiers, when you need to merge data from multiple sources, when you want to expose identifiers in an API without leaking how many records you have, integers just do not work.

Both sides have a point. UUIDs are one hundred twenty-eight bits versus thirty-two for a regular integer. That means bigger indexes, more memory usage, and because version four UUIDs are random, they scatter inserts across the B-tree index instead of appending to the end, which causes more page splits and more disk writes. For a table with a million rows, the difference is negligible. For a table with a billion rows, it is very real.

The compromise that is gaining traction is UUID version seven, which is time-ordered. The first bits encode the timestamp, so new values sort after old ones. Inserts land at the end of the index instead of scattering across it, which means fewer page splits and a more compact index. Postgres added a native UUID version seven generator in version eighteen, and it is becoming the recommended choice for projects that need distributed identifiers without most of the performance penalty.
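If the phrase time-ordered feels abstract, this sketch may help. It builds version-seven-style identifiers by hand, following the bit layout in RFC 9562: forty-eight bits of millisecond timestamp, then the version, then random bits. The uuid7_sketch function is a hypothetical illustration, not the generator Postgres uses, and the timestamps are made-up values.

```python
import os
import uuid


def uuid7_sketch(unix_ms):
    """Hypothetical helper: assemble a version-7-style UUID from a millisecond timestamp."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (
        ((unix_ms & ((1 << 48) - 1)) << 80)  # 48-bit timestamp leads the value
        | (0x7 << 76)                        # version nibble
        | (rand_a << 64)
        | (0b10 << 62)                       # variant bits
        | rand_b
    )
    return uuid.UUID(int=value)


base_ms = 1_700_000_000_000  # arbitrary example timestamp
ordered_ids = [uuid7_sketch(base_ms + i) for i in range(5)]
random_ids = [uuid.uuid4() for _ in range(5)]

# Creation order and sort order match when the timestamp leads.
print(ordered_ids == sorted(ordered_ids))  # True
# Version 4 values are random, so their sort order is unrelated to creation order.
print([str(u)[:8] for u in random_ids])
# Either kind occupies 16 bytes, four times a 4-byte integer.
print(len(ordered_ids[0].bytes))  # 16
```

Because the timestamp sits in the leading bits, each new value belongs at the right-hand edge of a B-tree. That is the same place a sequential integer would go.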

For your first table, as a vibecoder? Start with identity columns. They are simple, fast, and correct. Switch to UUIDs when you have a concrete reason to, not because an AI told you to.

Rabbit Hole: Codd's Twelve Rules That Are Actually Thirteen

This next section goes deep into database theory history. If you just want practical schema advice, skip ahead to the next chapter. But if you are curious about the most famous act of spite in database history, stay with me.

By the mid-nineteen eighties, the relational model was winning. Vendors saw the market shifting and wanted in. But many of them did not actually build relational databases. They took their old hierarchical or network systems, bolted on a SQL query parser, and called the result "relational." It was the database equivalent of putting a racing stripe on a minivan. [laugh]

[angry] Codd was furious. He had spent fifteen years fighting for the relational model, and now companies were diluting the term until it meant nothing. So in October nineteen eighty-five, he published two articles in Computerworld magazine. "Is Your DBMS Really Relational?" and "Does Your DBMS Run By the Rules?" In them, he laid out twelve rules for what a truly relational database must do.

The catch, and this is the part that makes computer scientists smile, is that there are thirteen rules. Codd numbered them zero through twelve. Rule zero is the foundation rule, stating that any system claiming to be relational must manage data entirely through its relational capabilities. The other twelve get increasingly specific, covering everything from guaranteed access through primary keys, to how null values must be handled, to the requirement that the database's own metadata must be queryable as normal tables.

No database has ever fully satisfied all thirteen rules. Not Oracle. Not SQL Server. Not Postgres. Codd set the bar deliberately high, as a weapon against vendors claiming compliance they had not earned. The rules were not a checklist. They were a standard that exposed pretenders. And they worked. The vendors who were faking it quietly retreated, and the vendors who were serious started building real relational systems.

Codd parted ways with IBM in the mid-nineteen eighties. Years of campaigning had made his position there uncomfortable. He formed a consulting company with Chris Date and others, and spent the rest of his career as the conscience of the relational world. [sad] He died in two thousand three, at his home in Florida, at the age of seventy-nine.

What the AI Gives You

Let us bring this back to vibe coding. You sit down, open your AI assistant, and type something like "create a Postgres schema for a blog with users, posts, and comments." Within seconds, you get a complete schema. It has tables. It has columns. It has data types. It might even have foreign keys. And ninety percent of it is correct.

Here is what it typically gets right. The table names are reasonable. The column names follow conventions. The basic types are appropriate. Text for titles, timestamps for dates, integers for identifiers. The foreign keys usually point in the right direction.

Here is what it typically gets wrong, and this is the part that matters.

First, NOT NULL. The AI almost never adds NOT NULL constraints except on the primary key. It treats every other column as optional, because that is the path of least resistance when generating code that needs to compile. In practice, a blog post without a title, a user without an email, a comment without a body, these are all errors your application should never allow. Every column that must have a value should say NOT NULL, and the AI leaves it to your application code to enforce what the database should be handling.

Second, check constraints are almost entirely absent. An AI will give you an integer column called rating but will not add a check that the rating must be between one and five. It will give you a text column called status but will not constrain it to the three or four valid values. Without check constraints, your database is a bucket that accepts anything. With them, it is a contract that rejects garbage.

Third, the types are often lazily broad. A column that should be a boolean is sometimes generated as text. A column that should be an enum or a constrained set is generated as a bare varchar. The AI does not think about what the data means. It thinks about what compiles, and text always compiles.

Fourth, defaults are rare. A created at column without a default of the current timestamp means your application has to remember to set it every time. It will forget.

[calm] The schema an AI generates is like a rough draft from a junior developer. The structure is there, the intent is clear, but the discipline is missing. The constraints, the defaults, the check conditions. Those are what separate a schema that works from a schema that survives.
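One way to make that review a habit is to ask the database which columns are still optional. The sketch below is a rough illustration, not a real linting tool. It assumes SQLite as a stand-in so it runs without a Postgres server, and the AI_DRAFT and REVIEWED schemas, the status values, and the nullable_columns helper are all invented for this example.

```python
import sqlite3

# What an assistant might hand you: plausible columns, no promises.
AI_DRAFT = """
CREATE TABLE posts (
    id         INTEGER PRIMARY KEY,
    title      TEXT,
    body       TEXT,
    status     TEXT,
    created_at TEXT
)
"""

# The same table after a human added the contract.
REVIEWED = """
CREATE TABLE posts (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    body       TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'draft'
               CHECK (status IN ('draft', 'published', 'archived')),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def nullable_columns(ddl, table="posts"):
    """Hypothetical helper: list non-key columns that still allow NULL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    finally:
        conn.close()
    # Each row is (cid, name, type, notnull, default_value, pk).
    return [name for _cid, name, _type, notnull, _default, pk in rows
            if not notnull and not pk]


print(nullable_columns(AI_DRAFT))   # ['title', 'body', 'status', 'created_at']
print(nullable_columns(REVIEWED))   # []
```

An empty list does not prove the schema is good. It only proves you made a decision about every column instead of accepting the default.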

The Schema as a Contract

Here is the mental model that will serve you well. A schema is not a description of your data. It is a contract about your data. Every column type is a promise about what kind of value goes there. Every NOT NULL is a promise that the value will exist. Every check constraint is a promise about what values are valid. Every default is a promise about what happens when no value is provided. Every unique constraint is a promise that no duplicates will exist.

The database enforces these promises at the storage level, below your application code, below your API, below your ORM. It does not matter if your Python code has a bug. It does not matter if someone connects directly to the database with a SQL client. It does not matter if a future developer rewrites the backend in a different language. The contract holds.

This is Codd's original insight, translated into practice. The user of the data should not need to know how it is stored. But more than that, the data should protect itself. [slow] A well-designed schema is a database that says no to bad data even when the application says yes.

On the popcorn server, there are eight Postgres databases. The parkit database powers a suite of productivity tools. Capture, Focus, and Time. Multiple services, each with their own tables, sharing one database with real foreign keys connecting them. When Capture creates a task, it references a user that actually exists. When Focus marks something complete, it references a task that actually exists. The relational model, working as designed, across service boundaries, in production, for one developer who built it all with AI assistance.

That is the difference between a schema someone typed and a schema someone designed. The typed version stores data. The designed version protects it.

The Bridge

You have tables now. You understand that columns have types and that constraints are the rules your data must follow. You know that a primary key identifies each row and that the choice between integers and UUIDs is a real decision with real tradeoffs. You have met Ted Codd, the RAF pilot who saw the future in a set of mathematical relations and spent decades fighting to make it real.

But a table sitting alone is just a spreadsheet. The power of a relational database comes from the word relational, from connecting tables to each other, from foreign keys and joins and normalization. Before we get there, though, we need to learn the language. Not Codd's Alpha, not IBM's SEQUEL, but the thing it became. SQL. The weirdest programming language you will ever love.

That is next time, in episode three.