This is episode nineteen of What Did I Just Install.
Every time you deploy code to a server, there is a quiet moment you probably do not notice. Your terminal shows a list of files scrolling past. Some are skipped. Some are transferred. The whole thing takes seconds, maybe a minute, and then your changes are live. You move on with your day.
Behind that quiet moment is an algorithm published in nineteen ninety-six by a PhD student in Canberra, Australia, who was supposed to be working on speech recognition. The algorithm was so clever that nearly thirty years later, nothing has replaced it. Not cloud services, not containers, not continuous deployment pipelines. When the files need to move, they still move the way Andrew Tridgell figured out they should.
In January twenty twenty-five, Google security researchers announced they had found six vulnerabilities in rsync, including a critical one that allowed remote code execution on servers. Over six hundred and sixty thousand rsync servers were exposed worldwide. The tool was so embedded in the infrastructure of the internet that patching it was treated as an emergency across every major Linux distribution on the planet.
And the person who stepped in to release the security fix was Tridgell himself. After more than two decades of letting someone else maintain the project, the creator came back because the situation demanded it.
Before rsync, if you wanted to update files on a remote machine, you had two options. You could copy everything, which meant transferring gigabytes of data even if only a few bytes had changed. Or you could manually figure out which files were different and copy only those, which was tedious and error-prone.
Rsync solved both problems with a single insight. Instead of comparing whole files, it compares pieces of files. It splits the destination file into blocks, computes a checksum for each block, and sends those checksums to the source. The source then scans its version of the file, looking for matching blocks anywhere in the data, not just at the same positions. When it finds a match, it says "you already have this part." When it does not, it sends only the new bytes.
The magic is in the word "anywhere." If you insert a single paragraph at the beginning of a large document, a naive comparison would see the entire file as different, because every block has shifted. Rsync's rolling checksum slides across the data one byte at a time, finding the original blocks at their new positions. The result is that only the inserted paragraph gets transferred, plus a few bytes of bookkeeping.
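For readers following along in text, here is a minimal Python sketch of the exchange just described. It is not rsync's actual code. The eight-byte block size and the names block_signatures, make_delta, and apply_delta are made up for illustration, and the sketch hashes every window from scratch with MD5 rather than using a rolling checksum, which is far slower than the real tool.

```python
import hashlib

BLOCK = 8  # hypothetical tiny block size so the example is easy to follow


def block_signatures(old, block=BLOCK):
    """Destination side: map the digest of each full block to its index."""
    sigs = {}
    for index in range(len(old) // block):
        chunk = old[index * block:(index + 1) * block]
        sigs.setdefault(hashlib.md5(chunk).digest(), index)
    return sigs


def make_delta(new, sigs, block=BLOCK):
    """Source side: look for known blocks at every offset of the new data."""
    delta = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        window = new[pos:pos + block]
        index = None
        if len(window) == block:
            index = sigs.get(hashlib.md5(window).digest())
        if index is None:
            # No known block starts here: keep this byte as new data.
            literal.append(new[pos])
            pos += 1
        else:
            # "You already have this part": flush pending data, reference the block.
            if literal:
                delta.append(("data", bytes(literal)))
                literal.clear()
            delta.append(("copy", index))
            pos += block
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old, delta, block=BLOCK):
    """Destination side: rebuild the new file from old blocks plus new bytes."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


old = b"The quick brown fox jumps over the lazy dog."
new = b"INSERTED " + old
delta = make_delta(new, block_signatures(old))
assert apply_delta(old, delta) == new
```

In this run the delta carries thirteen literal bytes: the nine inserted at the front, plus the four at the end that do not fill a whole block in this simplified version. Everything else is a reference to a block the destination already holds.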
This is what makes rsync feel like it understands your files. A hundred gigabyte database backup where three megabytes changed? Rsync sends roughly three megabytes of file data. A web server with ten thousand static files where you edited one CSS rule? Rsync sends a few hundred bytes. It is the difference between shipping an entire library across the country and mailing a single page with a correction.
Andrew Tridgell, known universally as Tridge, was born on the twenty-eighth of February nineteen sixty-seven. He earned a bachelor of science in applied mathematics and physics from the University of Sydney in nineteen eighty-eight, then moved to the Australian National University in Canberra to pursue a PhD in computer science.
His original doctoral research was in speech recognition. He never finished it. But the detour that killed his speech recognition thesis gave the world two of the most important pieces of open source software ever written, and accidentally set in motion the events that led to the creation of Git.
The first detour happened in December nineteen ninety-one. Tridgell had a practical problem that had nothing to do with his PhD. He was using a Sun workstation at the university, and he had a DOS PC at home. He wanted to share files between them. Specifically, he wanted to access his Unix home directory from the PC. He had been using PC-NFS for this, but his department also ran DEC Pathworks, a networking system from Digital Equipment Corporation. The problem was that you could not run both the Pathworks client and the PC-NFS client on the same DOS machine at the same time.
So Tridgell did what a procrastinating PhD student with a talent for systems programming would do. He wrote a packet sniffer. He pointed it at the network traffic between the Pathworks clients and servers. He watched the bytes go back and forth. And then he wrote a Unix program that spoke the same language, making his Unix machine appear to the DOS PCs as if it were a Pathworks file server.
He put it online in January nineteen ninety-two. The first three releases, versions zero point one, zero point five, and one point zero, all came out within a few weeks of each other. He called it "a Unix file server for Dos Pathworks." It did not have a proper name yet.
What Tridgell did not initially realize was that the Pathworks protocol was not proprietary to DEC at all. It was Microsoft's Server Message Block protocol, SMB, the networking protocol that every Windows computer used to share files and printers. By reverse-engineering what he thought was an obscure academic networking system, Tridgell had accidentally built the first open source implementation of Windows file sharing.
He needed a name. So he did what any Unix programmer would do. He ran a grep command through the system dictionary, looking for words that started with S and then contained M and B, in that order. The dictionary returned exactly four results. Salmonberry. Samba. Sawtimber. Scramble. He picked the one that sounded like a dance.
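The dictionary search is easy to reproduce in spirit. The Python sketch below assumes the pattern was anchored at the start of the word, which fits the four results all beginning with S. The function name smb_words and the sample word list are hypothetical, and the sketch makes no claim about what any particular system dictionary would return.

```python
import re

# Assumed pattern: starts with "s", then an "m", then a "b", ignoring case.
PATTERN = re.compile(r"^s.*m.*b", re.IGNORECASE)


def smb_words(words):
    """Return the words that match the assumed S...M...B pattern."""
    return [word for word in words if PATTERN.search(word)]


sample = ["salmonberry", "samba", "sawtimber", "scramble", "rsync", "assemble"]
print(smb_words(sample))
```

To try it against a real word list, pass in the lines of a dictionary file. On many Unix systems one lives at /usr/share/dict/words, though that path is an assumption and varies by system.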
Samba became one of the most important pieces of infrastructure software in the world. It allowed every Linux and Unix server to participate in Windows networks, share printers, authenticate users, and eventually replace entire Windows domain controllers. Microsoft was not pleased. But the Samba team was careful. They did not call what they did reverse engineering. They called it protocol analysis. They were not breaking into anything. They were listening to a conversation happening on a public network and building software that spoke the same language.
The rsync story begins a few years later, around nineteen ninety-five, when Tridgell was still working on his PhD at ANU. His speech recognition research had led him into the problem of parallel sorting, which led him into the broader question of how to efficiently synchronize data between two machines. The practical problem was simple. You have a file on machine A and a slightly different version on machine B. How do you make them match while sending as little data as possible over the network?
Tridgell and his colleague Paul Mackerras, a fellow Australian computer scientist who would later become the maintainer of the PowerPC Linux kernel port and a senior technical leader at IBM, worked out a solution that was elegant enough to fill a PhD thesis.
The key innovation was the rolling checksum. Imagine you have a book, and you want to find out which pages have changed since the last edition. You could compare every page word by word, but that requires having both editions side by side. What if you only have one edition and a list of fingerprints for each page of the other? You could check each page against the list, but what if someone inserted a new paragraph on page three, pushing everything after it forward? Every page from three onward would have a different fingerprint, even if only a few words actually changed.
Tridgell's rolling checksum solved this by taking the fingerprint of a window that slides across the data one byte at a time. As the window moves, the checksum updates incrementally. You do not recompute it from scratch for each position. You subtract the byte that fell off the back and add the byte that appeared at the front. This makes it fast enough to check every possible alignment, not just the obvious ones. When the rolling checksum matches a known block, a stronger checksum confirms it is a real match, not a coincidence.
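The subtract-one-byte, add-one-byte update can be sketched in a few lines of Python. This is modeled on the two-part weak checksum described in the rsync technical report, but the function names and the window size used in the check are illustrative assumptions, not rsync's source code.

```python
MOD = 1 << 16  # both halves of the checksum are kept modulo 2**16


def weak_checksum(window):
    """Compute the checksum of a window from scratch."""
    n = len(window)
    a = sum(window) % MOD
    # Earlier bytes get larger weights: n for the first byte, 1 for the last.
    b = sum((n - i) * byte for i, byte in enumerate(window)) % MOD
    return a, b


def roll(a, b, outgoing, incoming, n):
    """Slide the window one byte: drop `outgoing`, take in `incoming`."""
    a = (a - outgoing + incoming) % MOD
    b = (b - n * outgoing + a) % MOD
    return a, b


data = b"the rolling checksum slides one byte at a time"
n = 8  # hypothetical window size
a, b = weak_checksum(data[:n])
for start in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[start - 1], data[start + n - 1], n)
    # The cheap incremental update agrees with a full recomputation.
    assert (a, b) == weak_checksum(data[start:start + n])
```

Each slide costs a handful of arithmetic operations regardless of window size, which is what makes checking every alignment affordable. A match on this cheap checksum is only a candidate. The stronger checksum still has to confirm it.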
The algorithm was published as a technical report in June nineteen ninety-six. Rsync was announced on the comp.os.linux.announce newsgroup on June nineteenth of that year. And Tridgell finally submitted his PhD thesis, titled "Efficient Algorithms for Sorting and Synchronization," to ANU in nineteen ninety-nine. Three chapters of the thesis were about rsync.
"In fifty years time I doubt anyone would have ever heard of Samba, but they will probably be using rsync in one way or another."
He said that in two thousand two. More than two decades later, it looks like he was right. The problems SMB solved are fading as the world moves past Windows-centric networking. But the problem rsync solved, efficiently moving changed data between two places, is as fundamental as it was in nineteen ninety-six.
The name rsync is one of the rare honest ones in software. It stands for remote sync. No acronym games, no obscure cultural reference, no joke that only makes sense at two in the morning. It does what it says. It synchronizes files remotely.
This is notable precisely because the same person who named rsync also named Samba by running grep through a dictionary. Tridgell apparently used up his naming creativity on Samba and decided to be straightforward the second time around.
If you have listened to the Git Good series, you already know this story from the other side. But it is worth telling briefly from Tridgell's perspective, because the rsync creator's habit of analyzing protocols did not stop with SMB.
In January two thousand five, Tridgell joined the Open Source Development Labs on a one-year fellowship. OSDL was a nonprofit funded by IBM, Hewlett-Packard, and Intel to promote Linux in the enterprise. Linus Torvalds also worked there. The Linux kernel at the time used BitKeeper, a proprietary version control system whose creator, Larry McVoy, had given free licenses to open source projects.
Tridgell, following the same instinct that had produced Samba, looked at the BitKeeper network protocol. He connected to a BitKeeper server via telnet, typed help, and the server told him everything it could do. He used what he learned to begin building SourcePuller, a free tool that could extract data from BitKeeper repositories. McVoy was furious. He demanded that OSDL fire Tridgell. OSDL refused. McVoy revoked the free BitKeeper licenses for all Linux kernel developers.
The irony was not lost on anyone. When Tridgell analyzed Microsoft's SMB protocol to build Samba, the open source world called him a hero. When he analyzed BitMover's BitKeeper protocol to build SourcePuller, Linus Torvalds called him a saboteur. The technique was identical. Only the target had changed.
Torvalds, left without a version control system, sat down on a Sunday in April two thousand five and started writing Git. The rest is, quite literally, another series. But it is worth noting that the same person's pattern of behavior, look at the protocol and build an open implementation, created both Samba and the crisis that produced Git. Tridgell did not build Git. But without his instinct for protocol analysis, the pressure that forced Git into existence would never have materialized.
After rsync was released and established, Tridgell moved on. He had Samba to maintain, a PhD to finish, and eventually, drones to fly. In two thousand two, maintenance of rsync passed to Wayne Davison, an American developer who had been contributing significant patches and features. Davison would maintain rsync for over twenty years, a quiet and largely thankless job that kept the tool reliable while the world around it transformed.
Davison was not only rsync's maintainer. He also worked on the Z-Shell and had previously maintained GNU screen and trn, the threaded newsreader. He is the kind of developer whose contributions are measured in decades of stability rather than in launch-day headlines.
Meanwhile, Tridgell found a new obsession. In two thousand ten, he walked into a meeting of MakeHackVoid, a hackerspace in Canberra, where a group of people were discussing entering the Outback Challenge, an Australian competition for autonomous search-and-rescue drones. Tridgell joined as a software engineer. Within a few years, he had become the lead developer of ArduPilot, the open source autopilot software that now runs on over a million vehicles worldwide. His team, CanberraUAV, won the Outback Challenge's fifty-thousand-dollar grand prize.
From reverse-engineering Windows networking to co-inventing one of the most widely used file transfer algorithms to flying autonomous drones through the Australian outback. The thread connecting all of it is protocol analysis and efficient data movement. Moving bytes between machines. Moving telemetry between a ground station and an aircraft. Moving changed blocks across a slow nineteen-nineties internet connection. The problem keeps changing its shape. Tridgell keeps solving it.
Rsync's influence extends far beyond the command you type in a terminal. The rolling checksum algorithm became a template for an entire category of software.
Librsync is an independent library implementing Tridgell's algorithm. Dropbox uses it, splitting your files into four-megabyte blocks and using checksums to determine which blocks have changed before syncing. Rdiff-backup uses it for incremental backups. Duplicity uses it for encrypted remote backups. Zsync uses a variant optimized for distributing large files to many recipients, like Linux distribution images.
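The block-and-checksum idea behind these tools can be illustrated with a short Python sketch. It is a simplified stand-in, not code from librsync, Dropbox, or any of the tools named above: it compares blocks only at fixed positions, uses SHA-256 as an arbitrary choice of checksum, and the function names are hypothetical. A four-byte block stands in for the four-megabyte blocks mentioned in the text.

```python
import hashlib


def block_digests(data, block):
    """Checksum each fixed-size block; the last block may be shorter."""
    return [
        hashlib.sha256(data[i:i + block]).digest()
        for i in range(0, len(data), block)
    ]


def changed_blocks(old, new, block):
    """Return indices of blocks in `new` that differ from, or are absent in, `old`."""
    old_digests = block_digests(old, block)
    new_digests = block_digests(new, block)
    return [
        index
        for index, digest in enumerate(new_digests)
        if index >= len(old_digests) or digest != old_digests[index]
    ]


# One byte changed inside the second block: only that block needs to be sent.
print(changed_blocks(b"aaaabbbbcccc", b"aaaaXbbbcccc", block=4))  # prints [1]
```

Because this version only compares blocks at the same positions, an insertion near the start would mark every later block as changed. That is the gap the rolling checksum closes.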
Every time your cloud storage service updates a large file without re-uploading the entire thing, there is a good chance the math underneath descends from a PhD thesis written in Canberra in the nineteen nineties.
Rsync is the rare piece of software that won by being correct. Not by being trendy, not by having a venture-capital-funded marketing team, not by being part of an ecosystem play. It solved a problem so well that the problem never needed to be solved again. In thirty years, no competitor has displaced it. Newer tools exist, rclone for cloud storage, specialized CI and CD pipelines, container-based deployment, but when the actual moment of moving files arrives, rsync is still what runs underneath.
This is the opposite of the stories this series usually tells. There is no drama. No burnout. No licensing war. No corporate acquisition. No maintainer meltdown. Tridgell built it, published it, handed it to Davison, and moved on to other problems. Davison maintained it quietly for two decades. The tool just worked.
I program because I enjoy it. Whether I get paid is secondary.
The Australian government recognized Tridgell with a Medal of the Order of Australia in January twenty twenty for service to information technology. The Australian National University gave him an honorary Doctor of Science in twenty eighteen for authoring Samba and co-inventing rsync. These honors came more than two decades after the work they recognized. That is how infrastructure works. The people who matter most are visible last.
In January twenty twenty-five, rsync version three point four point zero was released as a critical security update. It fixed six vulnerabilities discovered by Google Cloud researchers, including a heap buffer overflow with a severity score of nine point eight out of ten. Over six hundred and sixty thousand rsync servers were directly exposed to the internet, and the most critical vulnerability allowed an attacker to execute arbitrary code on a server with nothing more than anonymous read access.
What made the release notable was not just the severity of the bugs. It was who signed the announcement. Andrew Tridgell. After letting Wayne Davison handle releases since two thousand two, the original creator came back to shepherd the security fix through. Twenty-three years after stepping away, Tridgell's name was on the release notes again.
Rsync ships pre-installed on nearly every major Linux distribution. It has shipped that way since the late nineteen nineties. An rsync command ships with macOS. It runs on FreeBSD, OpenBSD, Solaris, and AIX. It is the backbone of backup systems, deployment scripts, mirror networks, and continuous delivery pipelines across millions of servers. According to Enterprise Strategy Group, it has the biggest installed base of any open-source backup utility.
Think about every deploy command, every backup script, every mirror you have ever used. There is a very good chance rsync was the tool that actually moved the files. If you are listening to this podcast, the episode file was synced to the server using rsync. The deploy alias that pushes new episodes is a thin wrapper around the same algorithm a PhD student in Canberra published in nineteen ninety-six.
The person who wrote it also accidentally created the conditions for Git to exist. He also built the software that lets Linux talk to Windows. He also flies open source drones through the Australian outback to find lost hikers. And when the security crisis came in twenty twenty-five, he came back and signed the fix himself.
That was episode nineteen. Next time you type rsync and watch the file list scroll past, remember what happens behind every changed file: a checksum sliding one byte at a time across your data, finding the parts that already exist on the other side, sending only what is new. It is not glamorous. It is not exciting. It is just correct. And it has been correct for thirty years.
Open a terminal. Type rsync dash dash version. That is the build running every deploy, every backup, every mirror on your machine. Now try rsync dash a dash v dash dash dry dash run, followed by a folder and another location to copy it to. The dry run flag shows you what rsync would transfer without actually doing it. Watch the file list. Every file it skips is a file whose size and modification time already match on the other side, so rsync never has to read it. When a file has changed and the copy is going over a network, Tridgell's rolling checksum takes over, quietly saving you bandwidth since nineteen ninety-six.
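For anyone who would rather script that experiment than type it, here is a small Python sketch that builds the same dry-run command. The flags are the ones named above. The function names are hypothetical, and so are any paths you pass in. Nothing runs until you call preview_sync yourself.

```python
import shutil
import subprocess


def dry_run_command(source, destination):
    """Build the argument list for: rsync -a -v --dry-run SOURCE DESTINATION."""
    return ["rsync", "-a", "-v", "--dry-run", source, destination]


def preview_sync(source, destination):
    """Run the dry run and return rsync's output, or None if rsync is missing."""
    if shutil.which("rsync") is None:
        return None
    result = subprocess.run(
        dry_run_command(source, destination),
        capture_output=True,
        text=True,
        check=False,  # a failed dry run is reported in the output, not raised
    )
    return result.stdout + result.stderr
```

Calling preview_sync with a source folder and a destination prints nothing by itself. Print the returned text to see the file list rsync would have transferred.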