This is episode three of The Vibecoder's Guide to Postgres.
Every year, in the middle of summer, a man named Don Chamberlin goes camping. He has been doing this for fifty years. He goes with the same person every time, a woman named Kristin, and if you asked Chamberlin why, he would tell you about her father. Ray Boyce. His best friend. The man he carpooled with to work at IBM, the man he bought a house next to, the man who helped him invent the most widely used programming language in the history of computing. [sad] Ray Boyce died on Father's Day, nineteen seventy-four, from a brain aneurysm that struck during lunch. [slow] He was twenty-seven years old. His daughter Kristin was nine months old.
Chamberlin would later say it simply.
[sad] Ray was my best friend. We moved to California together. We bought houses together and were carpooling to work.
The language they built together is SQL. You use it every time you talk to Postgres, every time your AI generates a database query, every time anyone anywhere on earth asks a computer for data. It is spoken billions of times a day by machines, and it started with two friends playing a game.
In the last episode, we built our first table and met Ted Codd, the RAF pilot who proved that data belongs in rows and columns. Today we are going to learn how to talk to those tables. And the story of how that language came to be is inseparable from the story of Raymond Boyce, a man who died before he could see what he had helped create.
In nineteen seventy-two, Don Chamberlin and Ray Boyce were working at IBM's T.J. Watson Research Center in Yorktown Heights, New York. They attended a symposium where Ted Codd presented his relational model, the same paper we discussed in episode two. Chamberlin was impressed by some parts and frustrated by others.
[excited] Ray and I liked some parts of Codd's ideas better than others. We really liked this idea of a non procedural query with the slogan, tell me what you want, not how to find it. What we didn't like was the mathematical jargon in Ted's papers.
This is the key insight, and it is the one that makes SQL different from every other programming language you have ever encountered. Most languages are imperative. You tell the computer what to do, step by step. Open this file. Read line by line. If the line contains this word, add it to a list. Sort the list. Return the first ten items. You are writing instructions. A recipe.
Codd's relational algebra was the opposite. It was declarative. You described the result you wanted and the system figured out how to get it. But Codd expressed this in mathematical notation, set theory, predicate logic, subscripts and superscripts. It was elegant and precise and completely impractical for anyone who was not a mathematician.
Chamberlin and Boyce thought they could do better. They started playing what they called "the query game." They would take a question, something like "find all employees in the accounting department who earn more than fifty thousand," and then challenge each other to express it in a syntax that a normal person could read. Not a computer scientist. A normal person. An engineer. An architect. A city planner.
We wanted the language to look as closely as possible to natural language so they could read it and understand it, like it was an English sentence.
Their first attempt was called SQUARE, which stood for Specifying Queries As Relational Expressions. It used subscript and superscript notation, which looked fine on paper but was impossible to type on any keyboard that actually existed. So they threw it away and tried again.
The second attempt worked. They called it SEQUEL, which stood for Structured English Query Language. The idea was simple. Instead of mathematical symbols, use English words. Instead of subscripts, use clauses. Instead of asking you to understand set theory, ask you to write something that reads almost like a sentence. "Select the name and salary from the employees table where the department equals accounting." It is not perfect English, but it is close enough that you can read it out loud and understand what it means.
In nineteen seventy-three, both Chamberlin and Boyce moved to IBM's San Jose Research Laboratory to join the System R team, the project that was building a working implementation of Codd's relational ideas. They published their SEQUEL paper in nineteen seventy-four, presenting it at a conference in Ann Arbor, Michigan. Boyce also submitted a paper on data definition to a conference in Stockholm. It was rejected.
[serious] Shortly after the paper was presented, Ray Boyce was at lunch when he collapsed. A brain aneurysm. He was rushed to Valley Medical Center in Santa Clara. He did not survive the surgery.
[slow] He was twenty-seven. He had been at the San Jose lab for less than two years. His daughter Kristin was nine months old. It was Father's Day. <break time="1s"/>
The language Chamberlin and Boyce created was called SEQUEL. It was a good name. It sounded like a word. It was easy to say. It told you exactly what it was, a structured way to query data in English. There was just one problem.
SEQUEL was already a trademark. It belonged to Hawker Siddeley Dynamics Engineering Limited, a British aerospace and defense company. They made aircraft. They had absolutely nothing to do with databases. But trademark law does not care about relevance. It cares about names, and the name was taken.
So Chamberlin did what you do when a lawyer tells you to change your product's name.
[laugh] I took all the vowels out of it and turned it into SQL.
And that is why, fifty years later, there is an eternal pronunciation debate. People who know the history say "sequel," because that was the original name. People who follow the standard say "S Q L," because that is what the letters are. The official ANSI standard says "S Q L." Most developers say "sequel." Postgres's own documentation uses both. Nobody has ever won this argument, and nobody ever will.
Let us pause the history for a moment, because the most important idea in SQL is the one Chamberlin and Boyce borrowed from Codd. Declarative programming. And it is the reason that AI is better at writing SQL than almost any other kind of code.
Here is what declarative means, in plain terms. When you write Python or JavaScript or any conventional programming language, you write instructions. Step one, step two, step three. You are telling the computer how to do the work. This is called imperative programming, and it is how most code works. Open the file. Loop through the lines. Filter the results. Sort them. Return the first ten.
SQL flips this upside down. In SQL, you do not tell the database how to find your data. You tell it what you want. Give me all users who signed up last month, grouped by country, sorted by count, largest first. That is it. You describe the shape of the answer, and the database figures out the most efficient way to produce it. Which table to scan. Which index to use. Whether to sort in memory or on disk. You do not specify any of that. The database decides.
This is simultaneously the weirdest and most powerful thing about SQL. In every other language, you are the architect of the solution. In SQL, you are the architect of the question. The database is the architect of the solution. And the database, which has spent decades being optimized for exactly this job, is almost always better at it than you would be.
Think of it like ordering food at a restaurant. Imperative is walking into the kitchen and saying "take a pan, heat it to medium, add two tablespoons of oil, wait thirty seconds, crack two eggs into the pan, cook for three minutes." Declarative is sitting at the table and saying "I would like two eggs, sunny side up." You describe the result. The kitchen figures out the process. And if the kitchen gets a better stove next month, your eggs get cooked faster without you changing your order.
This is exactly why SQL has survived for fifty years while hundreds of other languages have come and gone. The questions stay the same even as the answers get faster. A query written in nineteen eighty-five asks for the same data in the same way today, but the database engine answering it is a thousand times more sophisticated.
SQL has hundreds of keywords, but the vast majority of what you will ever do comes down to four verbs. SELECT, INSERT, UPDATE, DELETE. Everything else is qualification, refinement, decoration on these four fundamental operations.
SELECT is the one you will use the most. It retrieves data. SELECT name, email FROM users. Give me the name and email from the users table. Add WHERE active equals true and you get only the active users. Add ORDER BY name and they come back alphabetically. Add LIMIT ten and you get the first ten.
INSERT adds new rows. INSERT INTO users, then the values. A new user appears in the table. The constraints you set up in episode two, the NOT NULL and the CHECK and the UNIQUE, they all fire at this moment. If your new row violates any of them, the database rejects it. This is your contract enforcing itself.
UPDATE changes existing rows. UPDATE users SET email equals a new value WHERE id equals some identifier. The WHERE clause is critical here, and this is something vibecoders need to understand viscerally. If you forget the WHERE clause on an UPDATE, you update every row in the table. Every single one. Every user's email just got changed to the same value. This is not a hypothetical. This happens in production, regularly, to experienced developers.
DELETE removes rows. DELETE FROM users WHERE last login is more than two years ago. Same warning as UPDATE. Forget the WHERE clause and you delete every row. The table still exists, but it is empty. Your data is gone. This is why database people develop a reflexive habit of writing the WHERE clause first, before the rest of the statement, even though that is not the order the language expects.
Those are the four verbs. Select to read. Insert to create. Update to modify. Delete to remove. If you understand these four operations and the WHERE clause that focuses them, you understand seventy percent of what SQL does.
The four verbs get you data in and out. The next layer shapes what comes back.
WHERE filters rows before anything else happens. Give me orders where the total is greater than one hundred. Give me users where the country equals Sweden. Give me products where the name contains the word "wireless." WHERE is the sieve that your data passes through.
ORDER BY sorts the result. By name alphabetically. By date with the newest first. By price lowest to highest. Without ORDER BY, SQL makes no guarantees about the order of your results. They might come back in the order they were inserted. They might not. The database is free to return rows in whatever order is most efficient, and relying on an accidental ordering is a bug waiting to happen.
GROUP BY is where things get interesting. It collapses rows that share a value into a single row, and then you can run aggregate functions on each group. How many users signed up from each country? Group by country, count the rows in each group. What is the average order total per month? Group by month, average the totals. GROUP BY transforms raw data into summaries, and summaries are usually what you actually want.
HAVING is the weird one. It filters groups the same way WHERE filters rows. Why do you need a separate keyword? Because WHERE runs before the grouping happens, and HAVING runs after. If you want countries with more than a hundred users, you cannot put that in the WHERE clause because the count does not exist yet. The database has not grouped anything yet. You have to wait until after the grouping, and that is what HAVING does.
This brings us to one of SQL's most famous quirks, and the reason this chapter exists.
This next section goes deep into why SQL's grammar is backwards. If you just want to know how to use SQL effectively, skip ahead to the next chapter. But if you have ever wondered why the language feels slightly wrong, like the sentences are assembled out of order, this is why.
When you write a SQL query, you write SELECT first. Then FROM. Then WHERE. Then GROUP BY. Then HAVING. Then ORDER BY. This is the order the language asks you to type.
But this is not the order the database executes it. The database reads FROM first. It starts with the table. Then it applies WHERE to filter rows. Then GROUP BY to collapse groups. Then HAVING to filter groups. Then, only then, does it run SELECT to pick out the columns you asked for. And finally ORDER BY sorts the result.
You are writing the conclusion before the premise. Imagine writing a sentence where you have to state the answer before you state the question. "The answer is forty-two, by the way the question was about the meaning of life." That is SQL's grammar.
This is not an accident. It is a deliberate design choice by Chamberlin and Boyce, and it comes directly from their goal of making the language sound like English. "Select the names from the employees table where the department is accounting." That reads like a natural English sentence. "From the employees table, where the department is accounting, select the names." That is more logical but less natural.
Codd's own language, Alpha, and the QUEL language used in INGRES and later in the original POSTGRES, both put the table specification first. They were more logical. They were also less popular. SQL won because it sounded like English, even though the price was a grammar that confuses everyone.
The practical consequence is real. If you start typing a SELECT statement, your code editor cannot help you. It cannot autocomplete column names because you have not told it which table you are querying yet. That comes later, in the FROM clause. This is why modern alternatives like PRQL and LINQ write the table name first, because they prioritize logic over English readability. But SQL's order is locked in by fifty years of standard documents and billions of lines of existing code. It is never changing.
Chamberlin himself acknowledged the tradeoff.
A little bit more care in making the language more orthogonal would have been beneficial in the long run.
Here is the part of the episode that might surprise you. SQL is the single best use case for AI-assisted coding. Better than Python. Better than JavaScript. Better than any other language. And the reason is precisely because SQL is declarative.
Remember what declarative means. You describe what you want, not how to get it. Now think about what you do when you talk to an AI. You describe what you want. "Get me all users who signed up last month, grouped by country, sorted by count." That is both a natural language request and a nearly complete SQL query. The translation from human intent to SQL is shorter than the translation from human intent to any imperative language, because SQL was designed from the beginning to sound like English.
This is why AI gets SQL right at an astonishing rate. Simple queries, the kind that make up the vast majority of real application code, are generated correctly almost every time. Select with a where clause. Insert with values. Update with a condition. Join two tables on a foreign key. Group by a column and count. These are the bread and butter of database work, and AI nails them because the distance between what you ask for and what the code needs to say is almost zero.
Studies back this up. When researchers test AI models on real-world SQL tasks, the success rate on simple queries is consistently above fifty percent even for general-purpose models, and specialized approaches push well past eighty percent. For comparison, human performance on the same benchmarks exceeds ninety-two percent. But the AI's errors are not random. They cluster in predictable places, and if you know where to look, you can catch them.
SQL is the one area where I actually trust the AI's first draft. I still review it, but ninety percent of the time, the query does exactly what I asked for. The other ten percent is where it gets creative, and creative is not what you want from a database query.
So where does AI-generated SQL break down? The patterns are consistent enough that you can build a mental checklist.
First, hallucinated column names. The AI generates a query that references a column called product name or revenue or region, and those columns do not exist in your table. The table has product id, units sold, and sale date. The AI invented columns that sound right but are not real. This is the most common failure mode, and it is the easiest to catch because the database will throw an error the moment you run it.
Second, wrong joins. When a query needs to combine data from multiple tables, which we will cover properly in episode four, the AI sometimes joins on the wrong columns or forgets a join condition entirely. This does not always throw an error. Sometimes it silently produces a result set that is wildly wrong, a cartesian product where every row in one table is matched with every row in the other, giving you millions of results instead of hundreds.
Third, aggregation mistakes. The AI writes a GROUP BY but forgets to include a column in it, or uses the wrong aggregate function, or puts a filter in WHERE that should be in HAVING. These are the errors that produce results that look plausible but are subtly incorrect. You get a number that seems reasonable, and only if you check it carefully do you realize it is counting the wrong thing.
Fourth, Postgres-specific syntax. This is the trap that vibecoders fall into most often. AI models are trained on SQL from every database dialect, MySQL, SQL Server, Oracle, SQLite. When you ask for Postgres SQL, you sometimes get MySQL syntax with Postgres keywords sprinkled on top. It might use a function that exists in MySQL but not in Postgres. It might use the wrong quoting style. It might generate a perfectly valid query for SQL Server that fails silently in Postgres because the behavior is different.
Simon Willison, one of the sharpest voices on AI-assisted development, has an important observation about this.
Hallucinations in code are the least harmful hallucinations you can encounter. The moment you run LLM generated code, any hallucinated methods will be instantly obvious. You will get an error.
He is right, and he is wrong. He is right that syntax errors are caught immediately. Your query either runs or it does not. But he is also right about the deeper danger.
[serious] If you are using an LLM to write code without even running it yourself, what are you doing?
The real risk is not the queries that fail. It is the queries that succeed but return the wrong answer. A GROUP BY that counts duplicates. A WHERE clause that excludes rows you did not mean to exclude. A join that silently drops data. These queries run fine. They return results. The results are wrong. And unless you understand enough SQL to look at the query and ask "wait, is that actually what I meant?", you will never know.
This is the core argument of this whole series. You do not need to write SQL from scratch. AI handles that. But you need to read SQL well enough to spot when the AI is confidently wrong. You need to know that WHERE filters before grouping and HAVING filters after. You need to know that a missing join condition turns a hundred rows into a million. You need to know that SELECT comes first in the syntax but last in the execution.
There is one more thing Ray Boyce left behind. In nineteen seventy-four, the same year he died, he and Ted Codd published work on a concept called Boyce-Codd Normal Form. It is a rule about how to structure tables to avoid certain kinds of data anomalies, and it is still taught in every database textbook in the world. If you have taken any computer science course that touches databases, you have encountered Boyce-Codd Normal Form. His name is in every syllabus, on every exam, in every textbook, forever attached to the field he helped create.
[sad] [slow] He never saw SQL become a standard. He never saw it adopted by Oracle in nineteen seventy-nine, beating IBM to market with their own employee's invention. He never saw the ANSI standard in nineteen eighty-six. He never saw the federal mandate that forced every government software vendor to support it. He never saw his daughter grow up.
Chamberlin kept going. He continued working on SQL at IBM for decades, evolving it through multiple standard revisions. Their collaboration had been so close that Chamberlin once said neither of them could remember which ideas each had contributed. They played the query game, challenging each other to express database queries in syntax that ordinary people could read, and out of that game came the language that the entire world now speaks to its data.
[sad] I am still in touch with Ray's daughter, who was one year old when he died and she is fifty years old now. We still go camping with her every year in the summer.
<break time="1s"/> Fifty years of camping trips. That is the human side of the language your database speaks.
You can talk to your database now. You know the four verbs, SELECT, INSERT, UPDATE, DELETE. You know that WHERE filters rows and ORDER BY sorts them. You know that GROUP BY collapses data into summaries and HAVING filters those summaries. You know that SQL is declarative, that you describe the question and the database architects the answer, and you know that this is exactly why AI is so good at writing it.
You also know the limits. The ten percent where AI gets creative. The hallucinated columns. The wrong joins. The subtle aggregation errors that return plausible wrong answers. Understanding SQL's logic, even if you never write it by hand, is what lets you catch those mistakes before they reach your users.
But here is what we have not covered yet. So far, everything we have done involves one table at a time. One table of users. One table of orders. One table of products. In the real world, data does not live in one table. It lives in many tables, connected to each other by relationships. A user has orders. An order has products. A product has a category. The power of a relational database comes from those connections, from foreign keys and joins, from the word "relational" itself.
That is episode four. It is all related.