JSON and the Schemaless Temptation: When Flexibility Becomes a Trap

The Properties Column

This is episode eight of The Vibecoder's Guide to Postgres.

A developer at a growing startup had a problem. Their users wanted custom fields. Different customers needed different attributes on their profiles, different metadata on their orders, different tags on their content. The relational model felt rigid. Every new attribute meant a migration, a new column, a deploy. The developer had heard about MongoDB, the document database where you just throw JSON in and it figures it out. But the rest of the stack was Postgres, and switching databases for one feature felt like overkill.

Then the developer discovered something. Postgres had a column type called JSONB. You could store any JSON object in it, query into it, even index it. One column, infinite flexibility. No migrations needed. Just add a properties column to every table and put whatever you want in there.

[excited] It felt like a cheat code. And for about six months, it was. The prototype worked beautifully. The custom fields shipped in a week instead of a month. The AI assistant loved it too, generating queries that reached into the JSON with arrow operators, pulling out nested values, filtering by arbitrary keys. Everything was flexible. Everything was fast enough. Everything was fine.

[serious] Then the table hit five million rows. And the queries that used to take fifty milliseconds started taking eleven seconds. The dashboard went from snappy to unusable. The developer ran EXPLAIN ANALYZE and found something disturbing. Postgres was doing sequential scans on the JSONB column, grinding through every row, because the query planner had no idea what was inside those JSON blobs. It could not estimate how many rows matched. It could not pick the right join strategy. It was guessing, and it was guessing wrong.

The developer had built a document database inside a relational database. And the relational database was not happy about it.

This is the story of JSONB. The most seductive feature in Postgres. The one your AI assistant reaches for first. The one that solves real problems brilliantly when used correctly, and creates invisible disasters when used as a substitute for actual schema design. The temptation is real. The trap is realer.

A Decade of Semi-Structured Ambition

To understand JSONB, you need to understand that it did not appear out of nowhere. It was the culmination of over a decade of work by two Russian researchers who believed that Postgres could handle more than just rows and columns.

Oleg Bartunov joined the PostgreSQL project in nineteen ninety-six. His colleague Teodor Sigaev started contributing in two thousand. Working from Moscow, they built a series of extensions that pushed Postgres beyond the traditional relational model. First came intarray, for handling arrays of integers. Then tsearch, which became Postgres's full-text search engine. Then ltree, for hierarchical label trees. Then hstore, a key-value store inside a single column, which shipped with Postgres eight point two in two thousand six.

Each of these was a step toward the same goal. Real-world data is messy. Not everything fits neatly into predefined columns. Sometimes you have metadata that varies per record. Sometimes you have nested structures that change shape over time. The relational model is powerful, but it assumes you know the shape of your data before you store it. Bartunov and Sigaev wanted to give Postgres a way to handle data whose shape you might not know yet.

Hstore was the proof of concept. A flat key-value store, no nesting, text values only. But it had binary storage, which made it fast. And it had GIN indexing, which made it searchable. By two thousand twelve, hstore was popular enough that the team started working on a nested version, something that could handle the kind of deeply structured documents that MongoDB and CouchDB were attracting developers with.

The original plan was to extend hstore to support nesting and arrays. But the PostgreSQL community had already added a JSON data type in version nine point two, released in two thousand twelve. That JSON type was simple. It stored your JSON as text, validated the syntax on input, and gave it back exactly as you entered it. Whitespace, key order, duplicate keys, all preserved. It was essentially a fancy text column that happened to reject invalid JSON.

This was useful but limited. Every time you queried into the JSON, Postgres had to parse the text from scratch. No binary searching. No efficient indexing. No way for the query planner to understand what was inside. The JSON type was a storage format, not a query format.

So Bartunov and Sigaev pivoted. Instead of extending hstore, they would build something new. A binary JSON representation that combined hstore's efficient storage and indexing with JSON's universal format. They called it JSONB. The B stands for binary.

On March twenty-fourth, two thousand fourteen, Andrew Dunstan committed JSONB to the PostgreSQL nine point four branch. Peter Geoghegan, who started as a reviewer and ended up spending weeks polishing the code, was credited as co-author. Alexander Korotkov contributed crucial improvements to the GIN indexing infrastructure that made JSONB queries fast. It was a team effort built on a decade of foundation work.

Bartunov was characteristically direct about what this meant for the database landscape.

[calm] Ninety-nine point nine percent of projects do not need MongoDB and could be nicely implemented in the relational model. Now, PostgreSQL has introduced real support for semi-structured data.

[laugh] That was a bold claim. But twelve years later, it has aged remarkably well.

The Two Faces of JSON

Here is the practical difference between JSON and JSONB, and it matters more than you might think.

When you store data in a JSON column, Postgres keeps the exact text you gave it. Every space, every duplicate key, the precise order of your keys. You get back exactly what you put in. But every time you query that column, Postgres has to parse the raw text from scratch. For a small document, this takes microseconds. For millions of rows with deeply nested structures, it becomes the bottleneck.

JSONB does something fundamentally different. On insert, Postgres converts your JSON into a decomposed binary format. This takes slightly longer, a few extra microseconds. But in exchange, you get a structure Postgres can navigate without parsing. Keys are deduplicated. Whitespace is stripped. The internal representation is typed, so Postgres knows the difference between a number, a string, a boolean, and null without inferring it from text every time.

The practical impact is enormous. JSONB supports GIN indexing, making containment queries into index scans instead of sequential scans. It supports efficient path queries, reaching into nested structures without rebuilding the entire document in memory.

The tradeoff is that JSONB does not preserve formatting. Key order might change. Duplicate keys get collapsed. If you need exact text preservation, use JSON. For everything else, use JSONB. This is not close to a debate.

When JSONB Shines

JSONB is not a mistake. It is not a trap. It is a genuinely powerful feature that solves real problems. The trap is using it for the wrong problems. So let us start with where it is exactly right.

User preferences. Your users have settings, notification preferences, dashboard configurations, theme choices. Some users set twelve preferences, some set two. The set of possible preferences changes as your product evolves. You do not want to add a new column every time you introduce a toggle. A JSONB column called preferences on the users table is perfect. You write the whole blob on save, you read the whole blob on load, and you almost never need to query across users by a specific preference value.

Event logs and audit trails. When you are recording what happened, the shape of the event varies by type. A login event has an IP address and a user agent. A purchase event has line items and a payment method. A notification event has a channel and a template. You could normalize this into a dozen tables with foreign keys everywhere. Or you could have an events table with a timestamp, an event type, a user reference, and a JSONB payload column. The payload varies by type, and that is fine because you almost always query events by user and time range, not by the contents of the payload.

API response caching. You fetch data from a third-party API and want to store the response. The response shape is defined by someone else's API, not your schema. It might change without warning. Storing it as JSONB means you can cache it, query into it if needed, and never worry about schema mismatches.

And prototyping. This is the one the vibecoder knows best. You are building something new. You do not know the final shape of the data. JSONB lets you move fast, store whatever structure makes sense right now, and figure out the proper schema later. This is legitimate. This is good engineering. The problem, and we will get to this, is the "later" part.

When JSONB Hurts

Now the uncomfortable part. The part where everything looks fine until it does not.

In two thousand sixteen, Dan Robinson at Heap published a blog post called "When to Avoid JSONB in a PostgreSQL Schema." Heap is an analytics company. They process enormous volumes of event data. They rely on JSONB heavily because their customers attach arbitrary properties to events. The exact use case JSONB was designed for. And even they found that JSONB was creating serious performance problems.

Here is the core issue. When you have a regular column, Postgres maintains statistics about it. It knows that forty percent of status values are "active," thirty percent are "pending," and uses those statistics to pick efficient execution plans.

JSONB columns have no statistics. None. Postgres cannot look inside a JSONB blob and track value distributions. When you filter by a value inside JSONB, the planner falls back to a hardcoded estimate. It assumes the filter will match zero point one percent of rows. Not because it calculated this. Because that is the default guess baked into the source code.

[surprised] For Heap, this meant that a query which took three hundred milliseconds with proper columns took five hundred and eighty-four seconds with JSONB. That is not a percentage difference. That is nearly two thousand times slower. The planner was choosing nested loop joins because it thought the JSONB filter would return a tiny number of rows, when in reality it was returning thousands.

[serious] The planner is not stupid. It is uninformed. When you hide your data inside JSONB, you are blindfolding the smartest part of Postgres and then complaining that it bumps into furniture.

There is also the storage problem. JSONB does not deduplicate key names across rows. If you have a million rows and every one contains the keys "first name," "last name," "email," and "created at," those key strings are stored a million times each. Robinson found that the JSONB version of a table used a hundred and sixty-four megabytes of disk space compared to seventy-nine megabytes for the same data in proper columns. More than double.

And then there are updates. If you need to change one key inside a JSONB blob, Postgres has to read the entire blob, modify it in memory, and write the entire blob back. For a small preferences object, this is trivial. For a large document with dozens of keys, it means every minor update rewrites potentially kilobytes of data. With regular columns, updating a single field touches only that field.

The summary is blunt. If you are querying by a field regularly, it should probably be a column. If you are filtering, sorting, joining, or aggregating by a value, it should definitely be a column. JSONB is for the data that does not fit in your schema, not for the data that you were too busy to put in your schema.

The Guardian Migrates

Here is the other side of the story. The side where JSONB is the hero.

In two thousand eighteen, The Guardian newspaper migrated their content management system from MongoDB to PostgreSQL. Their CMS, called Composer, stored articles, blog content, photo galleries, and video. It had been built on a self-managed MongoDB cluster, and the team had grown tired of operational headaches. Clock synchronization failures. Difficulty getting timely vendor support. Basic system administration problems that kept pulling developers away from building features.

They chose PostgreSQL on Amazon RDS. And the key feature that made the migration possible was JSONB. Their content documents had flexible structures. An article had different fields from a photo gallery. A live blog had different metadata from a video embed. In MongoDB, this flexibility was the default. In PostgreSQL, JSONB provided the same flexibility within a managed, ACID-compliant relational database.

We chose PostgreSQL because of its maturity and its support for the JSONB data type, which allows indexing of fields within JSON objects.

The migration was done through an API layer with zero downtime, using a proxy to compare responses between the two systems before cutting over.

This is JSONB at its best. Genuinely flexible content that varies by type, stored inside a relational database that provides transactions, constraints, and joins. The Guardian did not put everything in JSONB. They used proper columns for structured data they queried regularly, things like publication date, author, section, and status. JSONB held the variable content body and type-specific metadata.

That distinction is the entire lesson of this episode.

Querying and Indexing the Blob

If you are going to use JSONB, you need to know how to query it efficiently. And that means understanding two things. The operators and the indexes.

Postgres gives you a family of operators for reaching into JSONB. The single arrow extracts a value as JSONB. The double arrow extracts it as text, a plain string you can compare, sort, and filter against. For nested structures, you chain them. The hash-arrow and hash-double-arrow operators let you specify a path as an array of keys, reaching deep without chaining multiple arrows. And the containment operator, the at-sign followed by a greater-than sign, tests whether one JSONB document contains another. This is where things get interesting for indexing.

A GIN index on a JSONB column makes containment queries fast. GIN stands for Generalized Inverted Index. Instead of mapping rows to values like a B-tree, a GIN index maps values to rows. Think of it like the index at the back of a textbook. You look up a term and it tells you which pages mention it. A GIN index on JSONB breaks apart every document, extracts every key and value, and builds a lookup that says "the value active for the key status appears in rows four, seventeen, twenty-three, and nine thousand eight hundred and twelve."

[excited] This means the containment operator can be answered by the index without touching the table at all. If you ask "give me all rows where the data column contains status equals active," the GIN index can resolve that directly. Fast. Efficient. Exactly what you want.

But there is a catch. The double arrow operator, the one that extracts text values, does not use the GIN index. If you write a WHERE clause that says data double arrow status equals active, the GIN index sits idle. Postgres does a sequential scan instead. To use the index, you need the containment operator. Same logical query, different syntax, dramatically different performance.

Postgres offers two GIN operator classes for JSONB. The default, called jsonb ops, supports the containment operator, the existence operator, and a few others. It indexes every key and value independently, which makes it versatile but larger. The alternative, called jsonb path ops, only supports the containment operator, but the index is significantly smaller and faster for that specific operation. If you know you will only use containment queries, use jsonb path ops.

The takeaway for vibecoders. If your AI generates a JSONB query using the double arrow operator in a WHERE clause, and the table is large, that query will be slow no matter what indexes you have. Rewrite it to use the containment operator. This is one of those places where understanding the syntax matters more than understanding the concept.

Rabbit Hole: The Other Flexible Types

This next section goes into some of Postgres's other flexible data types. If you just want the JSONB advice, skip ahead to the chapter called "The Vibecoder's JSONB Addiction." But if you are curious about what else Postgres has up its sleeve, this is worth the detour.

JSONB gets all the attention, but Postgres has been doing flexible data types for longer than most developers realize.

Arrays. Postgres has had native array support since version eight. You can declare a column as an array of any type, text, integer, even UUID. Arrays support GIN indexing, containment and overlap operators, and a rich set of functions for slicing, unnesting, and aggregating. For simple lists of values, like tags on a blog post or roles assigned to a user, an array column is simpler, smaller, and faster than JSONB. And the values are typed. An integer array enforces that every element is an integer. A JSONB array would let you accidentally store a string in a list of numbers without complaint.

Hstore. The grandparent of JSONB. A flat key-value store, text keys and text values, no nesting. It shipped in two thousand six, it is fast and compact, and it supports GIN and GiST indexing. In practice, most people use JSONB for everything hstore does. But hstore remains valid when you want the smallest possible storage footprint for flat metadata.

Composite types. Custom types with named fields, like a struct in C. Type-checked, decomposable in queries. Useful for fixed structures that repeat across tables, like an address with street, city, postal code, and country.

The decision tree is simpler than it looks. If your data is a list of same-typed values, use an array. If your data is a flat bag of string key-value pairs and storage size matters, consider hstore. If your data has a fixed known structure, use composite types or just regular columns. If your data has variable structure with nesting, JSONB. In practice, the choice is almost always between regular columns and JSONB, with arrays for the occasional list.

The Vibecoder's JSONB Addiction

Here is the pattern. You have seen it if you have used AI to generate database schemas. You might have done it yourself.

You describe your application to your AI assistant. "I need a users table with name, email, and some flexible metadata." The AI generates a schema. There is a name column, an email column, and a metadata column of type JSONB. Reasonable so far.

Then you say "I also need to store the user's preferences." Instead of suggesting a preferences table or specific columns, the AI says, put it in the metadata column. Still reasonable, maybe. Then you say "I need to track their subscription tier." In the metadata column. "Their last login IP?" Metadata. "Their billing address?" Metadata. "Their notification settings?" You can see where this is going.

I just kept putting things in the JSONB column. Every new feature, every new field. No migrations, no schema changes, just update the JSON. [excited] It was so easy. [worried] Then six months later I tried to run a query to find all users on the premium tier in California and the database took forty seconds to respond.

This is what the PostgreSQL community calls "the new EAV," a reference to the Entity-Attribute-Value pattern that plagued databases in the two thousands. EAV tables stored every attribute as a separate row with an entity ID, an attribute name, and a value. Querying them required horrifying self-joins. JSONB is cleaner syntax for the same antipattern. You have hidden your structure inside an opaque blob, and now the database cannot help you.

[serious] JSON is the new EAV. A great tool when you need it, but not something you should use as a first choice.

The vibecoder's version is especially insidious because the AI makes it frictionless. When you write schemas by hand, adding a JSONB column feels deliberate. When your AI assistant does it, it just appears. And it appears often, because JSONB solves the immediate problem. The AI does not think about query planning six months from now. It does not anticipate the report that filters by three fields buried in JSON. It solves what you asked for, not what you will need.

The rule of thumb. If a field appears in most rows, give it a column. If you query or filter by a field regularly, give it a column. If a field participates in a WHERE clause, a JOIN condition, or a GROUP BY, give it a column. Use JSONB for the genuinely variable remainder, the metadata that changes shape per record, the payload that comes from an external source, the settings that the user controls.

Dan Robinson at Heap suggested a more specific threshold. If a field is present in at least one out of every eighty rows, the storage overhead of repeating the key name in JSONB outweighs the overhead of a nullable column. At that point, extract it. Give it a proper column. Let Postgres do what it does best.

The Hybrid Pattern

The right answer, almost always, is both. Proper columns for the data you know about. JSONB for the data you do not.

Consider how the archive database works on the VPS. It stores documents from multiple sources, articles, conversations, references, each with different metadata. The core fields are columns: title, source, date, document type, content hash. These are the fields you search by, sort by, join on. They are indexed. They have statistics. The query planner understands them.

But each source has its own metadata that the others do not share. A newspaper article has a byline and a section. A conversation has a model name and a token count. A reference document has an author list and a publication venue. This variable metadata lives in a JSONB column. It is rarely queried across document types. When it is, the queries are loose, exploratory, not performance-critical.

The capture database follows the same pattern. Core data, the inbox item text, the creation timestamp, the completion status, those are proper columns. But the AI categorization results, the tags, the confidence scores, the suggested priority, those live in JSONB. The categorization schema evolves as the AI model improves. New fields appear, old ones get deprecated. JSONB handles this gracefully. No migrations needed for experimental categorization features.

The question is never "should I use JSONB?" The question is "which parts of my data are stable enough for columns and which parts genuinely need flexibility?" If you get that split right, you get the best of both worlds.

This is the pattern The Guardian followed. This is the pattern Heap eventually adopted, extracting forty-five frequently used fields from JSONB into proper columns and saving thirty percent on storage. Relational structure for the data you understand. JSONB for the data that has not settled down yet.

[slow] And here is the critical part for vibecoders. When your data does settle down, and it always does, go back and normalize. Extract the frequently queried keys into proper columns. Write the migration. It is not a sign of failure. It is a sign that your prototype worked and your product has matured enough to deserve a proper schema.

The Bridge

JSONB is real power. It lets Postgres handle semi-structured data that would otherwise require a separate database or an ugly pile of EAV tables. The researchers who spent a decade building the foundations knew what they were doing.

[serious] But power without understanding is how you end up with a five-million-row table full of JSON blobs and a query planner that is guessing. The temptation to skip schema design because JSONB makes everything flexible is real. Your AI assistant will encourage it. The short-term velocity feels amazing. And then the queries slow down, the storage balloons, and you realize you have been hiding from the relational model instead of using it.

Use JSONB for user preferences, event payloads, external API responses, and genuinely variable metadata. Use it during prototyping when you do not yet know the shape of your data. But when the shape becomes clear, and it always does, do the work. Extract the columns. Write the migrations. Let Postgres see your data.

You can store anything in a JSONB column. The question is whether Postgres can find it again quickly. And that, how to read the signs your database is sending you, is what we tackle next time.