Deps
TensorFlow: The Graph That Broke the Researchers
S2 E18 · 37m · Mar 17, 2026
In 2017, a PhD student's simple image classifier required 140 lines of code and produced cryptic errors about nodes called "dense/kernel:0", a collision between TensorFlow's graph-based philosophy and how humans actually think.


The Error Message

This is episode eighteen of What Did I Just Install.

It is sometime in two thousand seventeen. You are a graduate student, three months into your PhD, sitting in a university lab at two in the morning. You have been staring at the same Python script for four hours. You are trying to build a neural network. Not a complicated one. A simple image classifier, the kind of thing that should take maybe fifty lines of code in any reasonable framework. Your script is a hundred and forty lines and counting.

You have a placeholder for your input images. You have a placeholder for your labels. You have defined your layers, your weights, your biases, your activation functions. You have constructed a loss function and an optimizer. None of this has actually done anything yet. You have merely described what you would like the computer to do, eventually, when it gets around to it. Now you need to create a session, initialize all your variables inside that session, feed your data through a dictionary that maps your placeholders to actual numbers, and run the session to get a result.

You run the script. You get an error. It says something about a shape mismatch between two tensors, referencing nodes called "dense slash kernel colon zero" and "placeholder colon zero." You have never named anything "dense slash kernel colon zero." TensorFlow named it for you, helpfully, when it was constructing the computation graph behind your back. You cannot put a breakpoint inside the graph. You cannot print the value of a variable while the graph is being built because the variable does not have a value yet. It will only have a value when the session runs. Which it will not do because you have a shape mismatch.

Your labmate walks over, looks at your screen, and says the words that tens of thousands of machine learning researchers said to each other during this era. "Have you tried PyTorch?" You have not tried PyTorch, because your advisor uses TensorFlow, and your advisor's lab has three years of TensorFlow code that everyone is expected to build upon. You are locked in. And the framework that has locked you in is TensorFlow, the most important machine learning library ever written, and also one of the most frustrating pieces of software that a well-funded corporation has ever released into the world.

This is the story of how TensorFlow came to dominate machine learning, how its own design philosophy drove researchers to the edge, and how a French developer named Francois Chollet built the layer of humanity that kept the whole thing from collapsing.

What a Tensor Flows Through

Before we get to the people and the politics, let us talk about what TensorFlow actually does, because the name is one of the few cases in software where the name tells you exactly what the thing is.

A tensor is a multi-dimensional array. A single number is a zero-dimensional tensor. A list of numbers is a one-dimensional tensor. A spreadsheet of numbers, rows and columns, is a two-dimensional tensor. Stack a bunch of spreadsheets on top of each other and you get a three-dimensional tensor. Keep going and you get four, five, however many dimensions you need. When you feed an image into a neural network, that image is a tensor. When the network produces a prediction, that prediction is a tensor. Every step in between, the weights, the activations, the gradients, all tensors.
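
To make those ranks concrete, here is a minimal sketch in NumPy, which is not TensorFlow but represents tensors the same way, as multi-dimensional arrays. The shapes are arbitrary examples.

```python
import numpy as np

scalar = np.array(3.0)                       # zero-dimensional: a single number
vector = np.array([1.0, 2.0, 3.0])           # one-dimensional: a list of numbers
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])              # two-dimensional: rows and columns
image = np.zeros((224, 224, 3))              # three-dimensional: height, width, color channels
batch = np.zeros((32, 224, 224, 3))          # four-dimensional: a batch of 32 images

for t in (scalar, vector, matrix, image, batch):
    print(t.ndim, t.shape)                   # rank climbs from 0 to 4
```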

The "flow" part describes what happens to those tensors. In TensorFlow, you define a computation graph, a kind of blueprint that describes how data should move through a series of mathematical operations. Data enters the graph at one end, flows through multiplication and addition and nonlinear transformation nodes, and exits the other end as a prediction or a classification or a generated image. The graph itself is a map. The data that moves through it is the territory.

This matters because training a neural network is, at its core, an exercise in calculus on tensors. You feed data through the graph, compare the output to what you wanted, calculate how wrong you were, and then flow backward through the graph adjusting every weight by a tiny amount. Forward pass, backward pass, repeat a few million times. The framework's job is to make this process fast, correct, and scalable across multiple processors, multiple machines, and eventually multiple continents of data centers. TensorFlow was built to do this at Google scale. And it was built because its predecessor could not keep up.
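
To see the forward-backward rhythm without any framework at all, here is a hedged sketch: a two-parameter linear model trained by hand in NumPy, with the gradients derived manually, exactly the chore that automatic differentiation would later eliminate. The data and learning rate are invented for illustration.

```python
import numpy as np

# Toy data: learn y = 2x + 1 from noisy samples.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, size=100)

w, b = 0.0, 0.0                      # parameters to learn
lr = 0.1                             # learning rate

for step in range(500):
    y_hat = w * x + b                # forward pass: data flows through the model
    err = y_hat - y
    loss = (err ** 2).mean()         # how wrong were we?
    grad_w = 2 * (err * x).mean()    # backward pass: gradients derived by hand,
    grad_b = 2 * err.mean()          # the step that autodiff frameworks automate
    w -= lr * grad_w                 # adjust each parameter by a tiny amount
    b -= lr * grad_b

print(round(w, 2), round(b, 2))      # approaches 2.0 and 1.0
```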

The Montreal Lab

But before there was TensorFlow, before there was even Google Brain, there was a lab in Montreal where a group of academics was quietly building the foundation that every deep learning framework would eventually stand on.

The year was two thousand seven. At the Universite de Montreal, a professor named Yoshua Bengio ran a research lab called LISA, which would later become MILA, the Montreal Institute for Learning Algorithms. Bengio was one of a handful of researchers who had refused to give up on neural networks during the long winter of the late nineteen nineties and early two thousands, when the rest of the field had moved on to support vector machines and random forests and other techniques that actually seemed to work. Neural networks were considered a dead end by most of the machine learning establishment. Bengio, along with Geoffrey Hinton in Toronto and Yann LeCun in New York, kept the faith.

We were a small community. Perhaps a few hundred people worldwide who still believed that deep neural networks could work if we could find the right training techniques. Most of the field thought we were wasting our time. The funding agencies were skeptical. The conferences were skeptical. We published our papers and kept working.

Out of this lab came a software library called Theano, named after an ancient Greek female mathematician associated with Pythagoras. The idea was ambitious for its time. You would define mathematical expressions symbolically in Python, and Theano would compile them into highly optimized code that could run on either a CPU or a GPU. The killer feature was automatic differentiation. If you could express your neural network as a mathematical graph, Theano could compute the gradients for you, automatically, without you having to derive them by hand.

This does not sound revolutionary now, but in two thousand seven it was a genuine breakthrough for research productivity. Before automatic differentiation tools, implementing the backward pass of a neural network meant manually deriving and coding the gradient of every operation in your model. A single mistake in one gradient computation would produce silent numerical errors that could take weeks to track down. Theano made this problem disappear. You described the forward computation. Theano handled the rest.
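
For the curious, here is a minimal sketch of that workflow in Theano's own API as it stood near the final one point zero release: you build a symbolic expression, ask for its gradient, and compile both into a callable function. The expression itself is an arbitrary example.

```python
import theano
import theano.tensor as T

x = T.dscalar('x')                     # a symbolic scalar, not a number yet
y = x ** 2 + 3 * x                     # a symbolic expression: the forward computation
dy_dx = T.grad(y, x)                   # Theano derives the gradient automatically

f = theano.function([x], [y, dy_dx])   # compile the graph into optimized code
print(f(2.0))                          # [array(10.0), array(7.0)]
```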

The lab in Montreal became a magnet for ambitious young researchers. One of them was a PhD student named Ian Goodfellow, who arrived to study under Bengio and would, in two thousand fourteen, invent generative adversarial networks, one of the most important ideas in modern machine learning. He trained his original GAN implementation on Theano. The framework that ran on a dozen GPUs in a Canadian university lab was producing ideas that would reshape the entire field.

But Theano had limits. It was an academic project with academic resources, maintained by graduate students who would eventually graduate and leave. It was powerful for research but difficult to deploy in production. It could train models but it could not serve them at scale. And as neural networks got larger and more complex, Theano's compilation times grew longer and its error messages grew more cryptic. It had proved that a deep learning framework was both possible and necessary. It could not become the framework that the entire industry would use. That would take a different kind of organization entirely.

The Engineer Who Built Google's Foundation

Jeffrey Adgate Dean was born on July twenty-third, nineteen sixty-eight. He grew up with parents who worked for the World Health Organization, which meant a childhood spent moving between countries. He earned a bachelor's degree, summa cum laude, in computer science and economics from the University of Minnesota in nineteen ninety, and a PhD in computer science from the University of Washington in nineteen ninety-six, studying compilers under Craig Chambers.

In nineteen ninety-nine, Jeff Dean joined Google as roughly the twentieth employee, one of the first few dozen people through the door of a company that would eventually employ over a hundred and eighty thousand. What Dean did at Google in the following decade is difficult to overstate. He co-designed MapReduce, the programming model that let Google process the entire web's worth of data across thousands of ordinary computers. He co-designed BigTable, the storage system that handled petabytes of structured data for products used by billions of people. He co-designed LevelDB, an open-source key-value store that found its way into countless other systems. The MapReduce and BigTable papers became foundational texts in distributed computing. Every one of these systems was built with Sanjay Ghemawat, Dean's long-time programming partner, and the two of them are the only people in Google's history to hold the title of Senior Fellow, the highest technical rank in the company.

Dean had become such a legendary figure inside Google that engineers started writing "Jeff Dean Facts," a series of jokes modeled on the old Chuck Norris jokes but for systems programming. "Jeff Dean compiles and runs his code before submitting, but only to check for compiler and CPU bugs." "Jeff Dean's code does not follow the specs. The specs follow his code." They were jokes, but they reflected something real. Dean was the person you pointed to when you needed to explain what a truly exceptional engineer looked like.

In two thousand eleven, Dean turned his attention to machine learning. Together with Andrew Ng, a Stanford professor on sabbatical at Google, and Greg Corrado, a neuroscientist, he co-founded a project called Google Brain. The mission was to explore whether deep neural networks could be useful inside Google's products. This was not obvious at the time. Deep learning was still considered experimental by most of the industry. The team built a system called DistBelief as their internal framework, and in two thousand twelve they demonstrated something that captured the world's imagination.

We took sixteen thousand CPU cores and connected them into one massive neural network with a billion connections. We fed it ten million random frames from YouTube videos. No labels. No instructions. We just said, learn what patterns you can find. And when we looked at what the network had learned, one of the neurons had independently developed a concept of a cat.

The "Google Cat" experiment, published in June of two thousand twelve, became one of the most famous demonstrations in the history of artificial intelligence. Not because recognizing cats is inherently important, but because the network learned the concept on its own, from unlabeled data, at a scale that nobody had achieved before. It was a proof of concept that deep learning could work on massive datasets if you had enough compute. And Google had more compute than anyone.

DistBelief powered this experiment and went on to be used by over fifty teams within Google and its parent company Alphabet. It improved speech recognition in the Google app by twenty-five percent. It powered image search in Google Photos. It won the ImageNet Large Scale Visual Recognition Challenge in two thousand fourteen. But DistBelief had been built quickly, for internal use, with the assumption that only Google engineers would ever need to touch it. The codebase required users to write C plus plus to define new neural network layers. It was inflexible. It was hard to configure. And as the field of deep learning accelerated, DistBelief could not keep up with the pace of research.

We knew we needed something better. DistBelief had served us well, but the field was moving so fast that we needed a system that was more flexible, more portable, and that could run on different kinds of hardware. We started building TensorFlow in two thousand fourteen as the second-generation system.

The Name That Explained Itself

Rajat Monga, the engineering director who would lead much of TensorFlow's development, worked alongside Dean and a large team of Google Brain researchers and engineers. The name listed first on the eventual whitepaper, at the head of an alphabetical roster, was Martin Abadi, a senior researcher at Google with deep expertise in programming languages and security, the kind of person who could bridge the gap between theoretical computer science and practical engineering.

The name they chose was TensorFlow. And unlike most software names, which are either acronyms nobody remembers or random words someone thought sounded cool, this one is a precise technical description. Tensors are the data. Flow is what happens to them. Multi-dimensional arrays flowing through a computation graph. The name tells you exactly what the software does, every time you say it.

On November ninth, two thousand fifteen, Google announced that TensorFlow was being released as open-source software under the Apache two point zero license. The announcement came with a whitepaper listing over forty authors from across Google, including Dean, Monga, Abadi, Ghemawat, and, notably, Ian Goodfellow, who had by then left Montreal to join Google Brain. The student who had trained the first GAN on Theano had joined Google Brain and co-authored the framework that would replace it. The academic lab's protege had helped build the industrial successor.

TensorFlow is Google's second-generation machine learning system. We have open-sourced TensorFlow in the hope that it can be used by the machine learning community for research and production alike. We believe that machine learning is key to the future of technology, and we want the best tools to be available to everyone.

The reaction was enormous. Within weeks, TensorFlow became the most starred machine learning project on GitHub. Within months, it was being taught in university courses. Within a year, it had become the default framework for deep learning, the thing that everyone used unless they had a specific reason not to. Google had done something that corporate open-source projects rarely achieve. They had released an internal tool that was genuinely better than the alternatives, and the community adopted it not because of Google's brand but because the software actually worked.

But here is the thing about TensorFlow that made it both powerful and deeply, profoundly frustrating. The same design decision that made it fast, scalable, and deployable at Google's scale also made it one of the most hostile developer experiences in the history of programming frameworks. And that decision was the static computation graph.

The Session Will See You Now

TensorFlow's core idea, inherited from Theano and taken to its logical extreme, was that you should separate the definition of a computation from its execution. You build a graph first. Every operation, every variable, every mathematical step is a node in this graph. The graph is a complete description of everything the computer will eventually do. Then, once the graph is fully defined, you create a session and execute it.

This is called "define and run." You define the blueprint. You run the blueprint. Two separate steps.

The reason for this design was performance. When the framework can see the entire computation graph before executing any of it, it can optimize aggressively. It can fuse operations together. It can figure out which parts can run in parallel. It can distribute the graph across multiple GPUs or even multiple machines. For Google, which needed to train models on hundreds of processors simultaneously, this was not optional. It was a requirement.

But for a researcher sitting in a lab trying to test an idea, it was a nightmare. Let me walk you through what even a simple TensorFlow one point x program looked like.

First, you defined your placeholders. These were empty shells, promises that data would arrive later. Your input images were not images yet. They were placeholders shaped like images. Your labels were not labels. They were placeholders shaped like labels.

I remember the first time I tried to print the value of a variable in TensorFlow. I wrote print, open parenthesis, my variable, close parenthesis. And it printed "Tensor, shape equals parenthesis none comma seven hundred eighty-four, close parenthesis, dtype equals float thirty-two." Not the numbers. Not the actual values. Just a description of the shape. Because the variable did not have values yet. It was just a node in a graph that had not been executed.

Then you defined your operations. Matrix multiplications, additions, activation functions. Each one created a new node in the graph. None of them actually computed anything. You were assembling a blueprint, not running code. Then you defined a loss function and an optimizer. Still no computation. Then you created a tf dot Session. This was the portal between the world of graph definitions and the world of actual numbers. Inside the session, you called session dot run, passing a feed dictionary that mapped your placeholders to actual data. Only now, finally, did TensorFlow execute anything.

If something went wrong, and something always went wrong, the error message referred to nodes in the graph by their auto-generated names. If you had three dense layers, they were called "dense," "dense underscore one," and "dense underscore two." The connection between your Python code and these names was indirect at best. Debugging meant mentally reconstructing the graph, figuring out which line of your code created which node, and then reasoning about what shape mismatch or type error could have occurred at that point in the graph.
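
Here is the whole ritual condensed into one hedged sketch, in the TensorFlow one point x style described above. The layer sizes and the random stand-in data are invented for illustration, and running it today requires the tf.compat.v1 compatibility shim.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=[None, 784])   # a promise of images, not images
y = tf.placeholder(tf.float32, shape=[None, 10])    # a promise of labels

logits = tf.layers.dense(x, 10)                     # adds nodes named "dense/kernel:0" and so on
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

print(logits)  # prints a shape description, not numbers: nothing has run yet

with tf.Session() as sess:                          # the portal to actual numbers
    sess.run(tf.global_variables_initializer())
    batch_x = np.random.rand(32, 784).astype("float32")
    batch_y = np.eye(10, dtype="float32")[np.random.randint(0, 10, 32)]
    _, loss_val = sess.run([train_op, loss],
                           feed_dict={x: batch_x, y: batch_y})
    print(loss_val)                                 # finally, an actual number
```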

There was a running joke in our lab. Someone would ask how to debug a TensorFlow model, and the answer was always "add tf dot Print to your graph and hope for the best." The real answer was "rewrite it in PyTorch." But we could not say that out loud because our advisor had tenure and TensorFlow code.

For researchers who needed to iterate quickly, to test an idea in an afternoon, to modify a model architecture on the fly and see what happened, this was agony. The static graph demanded that you know your entire computation before you started. But research is precisely the process of not knowing your computation until you try it. The mismatch between TensorFlow's design philosophy and the actual workflow of machine learning research created a frustration that simmered across every lab in every university that used it.

The Man Behind the Gate of Horn

Eight months before TensorFlow was even released, a twenty-five-year-old French engineer named Francois Chollet was sitting in front of a problem that would change the trajectory of deep learning tooling forever.

Chollet was born on October twentieth, nineteen eighty-nine, in France. He graduated in two thousand twelve from ENSTA Paris, part of the Polytechnic Institute of Paris, with a master's degree in engineering. By early two thousand fifteen, he was at Google, doing natural language processing research in his spare time, and he could not find a tool that did what he needed.

I started Keras for my own use, pretty much. I was doing natural language processing research in my free time, looking for a good tool to work with recurrent neural networks. The support for recurrent networks in the existing tool ecosystem was near-inexistent. So I decided to make my own Python library on top of Theano, borrowing some ideas from the Scikit-Learn API and the Torch API.

Chollet released Keras in March of two thousand fifteen. The name came from the ancient Greek word keras, meaning horn, and it was a reference to Homer's Odyssey. In the epic poem, there are two gates through which dreams pass. Dreams that come through the gate of ivory are deceptive, illusions that mislead. Dreams that come through the gate of horn are true, prophetic visions of reality. Chollet chose the name as an aspiration. Keras would be a tool for seeing clearly, for turning ideas into working models without the fog of unnecessary complexity.

The library was, from the very first release, different from everything else that existed. It was built around a single principle that Chollet would repeat in every interview, every talk, every page of documentation he wrote.

Being able to go from idea to result with the least possible delay is key to doing good research.

This was not just a nice sentiment. It was a design manifesto. Keras had a Sequential model where you stacked layers like building blocks. You created a model in five lines, compiled it with one line, trained it with one line. Seven lines total for a neural network that would have taken a hundred and forty lines in raw TensorFlow. The API was so clean, so intuitive, that you could show it to someone who had never written machine learning code before and they could read it and understand what was happening.
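
Here is roughly what that looked like, sketched in today's tf.keras spelling rather than the original two thousand fifteen Theano-backed API. The layer sizes and the random stand-in data are invented for illustration.

```python
import numpy as np
from tensorflow import keras

# Define the model by stacking layers like building blocks.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random stand-ins for real images and labels.
images = np.random.rand(256, 784).astype("float32")
labels = np.random.randint(0, 10, 256)
model.fit(images, labels, epochs=1, batch_size=32)
```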

Keras was the first deep learning library for Python that supported both recurrent neural networks and convolutional networks in the same framework. It had the first reusable open-source implementation of a long short-term memory network, one of the most important building blocks in modern natural language processing. And it did all of this while running on top of Theano, using Theano's computation engine underneath but hiding its complexity behind a human-readable interface.

Chollet did not expect it to become what it became. He thought he was building a tool for the small community of people who were doing deep learning research in early two thousand fifteen, a community of perhaps a few thousand people worldwide. Keras started getting users from the very first day. Then it kept getting more. And more.

The reason was simple. Deep learning was exploding, universities were adding it to their curricula, companies were hiring machine learning engineers, and everyone who tried to learn TensorFlow hit the same wall of sessions and placeholders and cryptic error messages. Then someone would point them to Keras, and suddenly the thing that had taken a hundred and forty lines took seven, and it worked, and they could actually see what their model was doing. Keras was not technically a competitor to TensorFlow. It was a layer on top of it. But it was the layer that made TensorFlow usable for human beings.

The Adoption

Google noticed. When a significant fraction of your users interact with your framework exclusively through a third-party wrapper built by one of your own employees, that tells you something about your framework's usability. In two thousand seventeen, TensorFlow officially adopted Keras as its recommended high-level API. The tf dot keras module was born.

We are excited to announce that Keras will be the central high-level API for TensorFlow. This integration provides a consistent, user-friendly interface for building and training models, while still allowing advanced users to access TensorFlow's full flexibility when needed.

For Chollet, this was both validation and a kind of corporate absorption. Keras had been framework-agnostic, originally running on Theano, with the ability to swap in TensorFlow or CNTK as backend engines. The integration into TensorFlow meant that for most users, Keras and TensorFlow became the same thing. You imported tensorflow dot keras and you got the clean API backed by Google's computation engine. The Theano backend faded. The CNTK backend faded. Keras, in practice, became a TensorFlow product.

This was a pattern we have seen before in this series, the gravitational pull of a well-resourced corporate project absorbing the innovation of an independent creator. But unlike the stories we covered in the requests episode or the Express episode, this was not hostile. Chollet remained at Google, maintained Keras, and was given resources to develop it further. He wrote a book, "Deep Learning with Python," published by Manning, that became the standard introduction to the field. If TensorFlow was the engine, Keras was the steering wheel, and Chollet designed the steering wheel so well that people forgot they were driving a TensorFlow car.

Meanwhile, in Montreal, the writing was on the wall. On September twenty-eighth, two thousand seventeen, Yoshua Bengio sent a message through his colleague Pascal Lamblin announcing that Theano would cease major development after its one point zero release.

The decision is difficult, but it is the right one. The field has evolved beyond what an academic lab can sustain as a competitive software project. TensorFlow and other frameworks backed by industrial resources can provide the tools that researchers need. Our contribution was to prove the concept. That proof has been accepted.

Theano one point zero was released on November fifteenth, two thousand seventeen, and active development slowed to a halt. Maintenance continued into two thousand eighteen, but the outcome was already decided. The framework that had made deep learning frameworks possible, that had trained the original generative adversarial network, that had proved you could define neural networks as symbolic graphs and let the computer handle the calculus, was retired. Its ideas lived on in every framework that followed. Its code was eventually forked by the PyMC team under the name Aesara and continued in a narrower capacity. But the Theano era was over.

And Ian Goodfellow, who had trained the world's first GAN on Theano in Bengio's lab in two thousand fourteen, was now listed as a co-author on the TensorFlow whitepaper, working at Google Brain. The student had not just left the lab. The student's new employer had made the lab's software obsolete.

The Graph Problem

By two thousand seventeen, TensorFlow dominated machine learning. It was the most used framework in industry. It was the default in many university courses. Google was investing massive resources into it. But a tension was building that no amount of corporate support could resolve, because it was not a bug in the software. It was a philosophy baked into the foundations.

The static computation graph, the feature that made TensorFlow powerful for production deployment, was the same feature that made it hostile for research. And research is where the ideas come from. If researchers abandon your framework, the ideas start getting published in another framework's code, and the students learn that other framework, and the industry follows the students. The pipeline of innovation depends on the research community, and the research community was increasingly miserable.

The frustration went beyond individual complaints. It was structural. In PyTorch, which Facebook had released in January of two thousand seventeen, you could write a neural network that looked like ordinary Python. You could use for loops and if statements inside your model. You could print values. You could set breakpoints. You could debug with the same tools you used for any other Python program. PyTorch used a dynamic computation graph, built on the fly as the code executed. Every run could be different. Every experiment could reshape the model architecture.
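
A small sketch shows why that felt so different. The model below is invented for illustration, but the point is structural: an if statement and a print statement sit inside the forward pass, and both just work, because the graph is rebuilt on every run.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        if h.mean() > 0.5:       # ordinary Python control flow, mid-model
            h = h * 2
        print(h[0, :3])          # actual numbers, printable mid-forward
        return self.fc2(h)

net = TinyNet()
out = net(torch.rand(4, 784))
out.sum().backward()             # gradients flow through whichever branch ran
```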

The moment I switched to PyTorch, I felt like I had been holding my breath for two years and finally exhaled. I could print a tensor and see numbers. Actual numbers. Not a description of a graph node. I could step through my model with a debugger. I could change the architecture in the middle of training and see what happened. It felt like going from writing assembly language to writing Python.

TensorFlow's team eventually acknowledged the problem. In two thousand nineteen, four years after the original release, TensorFlow two point zero made eager execution the default mode. Eager execution meant that operations ran immediately when called, just like in PyTorch. You could print values. You could use Python control flow. The sessions were gone. The placeholders were gone. The feed dictionaries were gone. A decorator called tf dot function let you opt back into graph compilation for performance-critical code, but the default experience was now dynamic and interactive.
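
In code, the turnaround looked roughly like this. Values print immediately, and the squared_error function below, a hypothetical example, opts back into graph compilation with the decorator.

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])
print(x * 2)               # runs immediately: tf.Tensor([2. 4. 6.], ...)

@tf.function               # opt back into graph mode where speed matters
def squared_error(a, b):
    return tf.reduce_mean((a - b) ** 2)

print(squared_error(x, tf.constant([1.0, 2.0, 4.0])))  # tf.Tensor(0.33333334, ...)
```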

It was, in a sense, an admission that the original design had been wrong for a significant portion of TensorFlow's users. Not wrong in an absolute sense. The static graph was the right choice for production deployment at Google's scale. But it was the wrong choice for the people who needed to experiment, to iterate, to fail quickly and try again. And those people, the researchers, were the ones who decided what the next generation of machine learning engineers would learn in graduate school.

The Power and the Prison

So what does the TensorFlow story tell us, beyond the technical details of computation graphs and eager execution?

It tells us something about the tension between power and usability that runs through the entire history of developer tools. Google's engineering culture is legendary for a reason. They build systems that handle billions of requests per day, that run across data centers spanning the planet, that process more data in an hour than most companies see in a year. When Google's engineers built TensorFlow, they built it the way they build everything. Maximum power. Maximum scalability. Maximum correctness. The assumption was that developers would accept the complexity because the capability was worth it.

And in production, at Google scale, that assumption was correct. TensorFlow powered Google Translate, Google Photos, Google Search ranking, Google's speech recognition, and dozens of other products that billions of people used every day. The static graph, the distributed execution, the hardware-specific optimizations, all of it mattered when you were training a model on a thousand GPUs and serving it to a billion users.

But the research community did not need a thousand GPUs. They needed to try an idea and see if it worked. They needed a tool that got out of their way, not a tool that demanded they specify the entire computation upfront. The disconnect was between Google's internal needs and the external community's needs. Google released TensorFlow as open source, which was generous and strategic, but they designed it for Google's problems, not for the problems of a graduate student with one GPU and a deadline.

User experience should be central in API design. A well-designed API makes complicated tasks feel easy. The goal is to reduce the cognitive load on the developer, to let them focus on the research question rather than the framework mechanics. If the framework is getting in the way of the research, the framework has failed, no matter how powerful it is underneath.

Chollet understood this instinctively. His philosophy, that the speed of going from idea to result is the most important property of a research tool, was the exact inverse of Google's engineering philosophy, which prioritized the speed of going from trained model to production deployment. Both philosophies were correct. They were just correct for different people.

This is the deeper lesson. Corporate open source is a gift, but it is a gift that comes shaped by the corporation's own needs. Google released TensorFlow because they wanted the machine learning ecosystem to standardize on their framework. They wanted academic papers to include TensorFlow code. They wanted graduates arriving at Google already knowing TensorFlow. This is not cynical. It is rational. But it means the open-source community is, in a sense, a secondary audience. The primary audience is always internal.

The Keras integration was Google's attempt to bridge this gap, and it largely worked. Keras gave the research community the interface they wanted while TensorFlow provided the engine they needed. But the fact that the bridge had to be built at all, that the framework's own API was insufficient for its own users, is the most revealing detail in the story.

The Seeds of Something New

By late two thousand seventeen, the landscape was shifting in ways that TensorFlow's dominance could not prevent. PyTorch, released by Facebook's AI Research lab in January of that year, was gaining adoption among researchers at a pace that startled everyone, including Facebook.

The numbers told the story. Academic papers using PyTorch were increasing every quarter. Conference submissions at NeurIPS and ICML were increasingly accompanied by PyTorch code rather than TensorFlow code. The researchers who had suffered through sessions and placeholders and cryptic error messages were switching, one lab at a time, to a framework that felt like Python instead of like a configuration language wearing Python's syntax as a disguise.

And here is the connection to your own machine. If you have ever installed the transformers library from Hugging Face, the library that powers most modern natural language processing pipelines, TensorFlow may well be sitting somewhere in your dependency tree. The library supports both TensorFlow and PyTorch as backends, because the ecosystem grew up in the era when both frameworks mattered. Even projects that primarily use PyTorch today carry TensorFlow's fingerprints in their abstractions, their tensor operations, their computation patterns. The conceptual DNA of TensorFlow, which was the conceptual DNA of Theano before it, which was the conceptual DNA of Yoshua Bengio's lab in Montreal, runs through every neural network that has ever been trained on this machine.

TensorFlow did not fail. It remains one of the most deployed machine learning frameworks in the world. It powers production systems at a scale that most other frameworks cannot match. The TensorFlow Lite runtime runs on mobile phones. TensorFlow dot js runs in web browsers. Google's Tensor Processing Units, custom chips designed specifically for TensorFlow workloads, represent a level of hardware-software integration that no other framework has achieved. If you measure TensorFlow by its original design goals, deploying machine learning at Google's scale, it succeeded beyond anyone's expectations.

But in the research labs, in the graduate programs, in the fast-moving world of people who needed to try something new every afternoon, TensorFlow was losing ground. And the framework that was taking that ground, the one built on dynamic graphs and Pythonic design and the radical idea that a machine learning tool should feel like writing Python, was already being built in a research lab at Facebook.

Francois Chollet would stay at Google for seven more years after the Keras integration, continuing to develop the library and writing his influential book. In November of two thousand twenty-four, he left Google to start a new company, still promising to stay involved with Keras from the outside. Yoshua Bengio won the Turing Award in two thousand eighteen, alongside Geoffrey Hinton and Yann LeCun, the three of them recognized as the godfathers of deep learning. Jeff Dean became Google's chief scientist in two thousand twenty-three. Theano's code lived on in fragments, forked and renamed, its ideas woven into every framework that followed.

But in two thousand seventeen, the revolution had already begun. In a research lab at Facebook, a team led by a developer named Soumith Chintala was building something that would challenge everything TensorFlow stood for. A framework where the graph was built as you ran it, where debugging felt like debugging any other Python program, where the entire philosophy was researcher-first rather than production-first. The framework that would change everything was already spreading, quietly, person to person, lab to lab. By the time Google noticed, it was already too late.

That story begins in a very different kind of lab, with a very different kind of philosophy, and it deserves its own episode.

TensorFlow is heavy, but you can see the framework war in miniature on your own machine. Open a terminal and type pip install torch, then pip install tensorflow. In a Python shell, try both. Import torch, then torch dot tensor open parenthesis bracket one comma two comma three bracket close parenthesis. Then import tensorflow as tf, then tf dot constant open parenthesis bracket one comma two comma three bracket close parenthesis. Same result. Different philosophies underneath. One was built for researchers who think in Python. The other was built for production systems that think in graphs. The next episode tells you which one won and why.
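
For the show notes, here is that experiment typed out, with the expected results in comments.

```python
# First: pip install torch tensorflow
import torch
import tensorflow as tf

print(torch.tensor([1, 2, 3]))   # tensor([1, 2, 3])
print(tf.constant([1, 2, 3]))    # tf.Tensor([1 2 3], shape=(3,), dtype=int32)
```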

That was episode eighteen.