In the roughly three decades since the Zip file format debuted, compressed files have been subject to widespread use—and occasional misuse.
Certainly, what programmer and engineer David Fifield recently did with the Zip file format fits in the latter category. Effectively, Fifield invented a new type of “Zip bomb” or “compression bomb”—a malware variant in which a Zip file as small as a few kilobytes is so compressed it would completely fill up your hard drive when decompressed. And then some.
Files like these aren’t unheard-of, but Fifield’s techniques, revealed in an article on his website, are particularly novel because they aren’t “recursive,” or built like an inverted Russian nesting doll where the files get bigger as you decompress multiple layers of Zip files. Instead, Fifield figured out how to “overlap” files inside of a Zip archive, allowing for compression ratios far beyond those of a traditional archive. In his most impressive case, he managed to get a 46-megabyte file to expand into 4.5 petabytes of data.
It’s just the latest twist in a decades-old variant of malware that isn’t a virus but could still do a number on your laptop.
How We Got Zip Bombs
So what makes Zip bombs possible in the first place? First, a little history and some explanation. The story of the Zip format dates to 1988, when Phil Katz, a programmer and BBS user, faced a lawsuit over his creation of compression software compatible with the ARC format. He responded by creating a new compression format that was incompatible with (and slightly better than) the existing one, but was openly distributed. This led the format to become a de facto standard.
(Katz, while seeing much success and notoriety from his creation, had significant issues in his personal life and died of alcoholism-related disorders in 2000 at the age of 37. His story is fascinating and tragic, if you want to dig in.)
The Zip format, and variants like it, worked well on bulletin boards and FTP servers, and became ubiquitous thanks to shareware and the format’s eventual addition to Windows and Macintosh operating systems. However, the way these formats compress data made them exploitable in unique ways. Zip bombs are possible largely because of how compression algorithms work. At a high level, compression systems find repeated information in the original data and encode that repetition as compactly as possible. It’s why an MP3 file, which is already compressed, doesn’t shrink down as much as a standard text file might.
That idea can be taken to a logical extreme. If a file is nothing but a trillion zeroes, the compressed file can be extremely tiny even though the decompressed file is absolutely massive.
The result is that the file itself is basically useless except as a way to defeat some antivirus programs (which often scan compressed files to see if they can uncover viruses) and as something of an engineering exercise.
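The effect of repetition on compression is easy to see in a small experiment. The sketch below is only an illustration, not anything from Fifield's work: it uses Python's standard zlib module (which implements DEFLATE, the method most Zip files use), with 10 MiB of zeroes as the repetitive input and random bytes standing in for already-compressed data such as an MP3. The input size and the helper name `compression_ratio` are arbitrary choices made for this example.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return the original size divided by the DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


size = 10 * 1024 * 1024  # 10 MiB of input for each case (arbitrary)

zeros = bytes(size)        # highly repetitive: every byte is zero
noise = os.urandom(size)   # stand-in for data that is already compressed

zero_ratio = compression_ratio(zeros)
noise_ratio = compression_ratio(noise)

print(f"zeros: about {zero_ratio:,.0f} to 1")
print(f"noise: about {noise_ratio:.3f} to 1")
```

The zero-filled input should shrink by roughly three orders of magnitude, while the random input should barely change in size at all.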
According to a presentation given at the USENIX Security Symposium in 2015, the first example of a “Zip bomb” in the wild dates to 1996, when a file was uploaded to the bulletin board network Fidonet with the specific goal of getting an admin to open it. (A text file regarding the incident suggests, to put it lightly, that it didn’t go over well.) By 2001, it had become a concern for mainstream security experts, when a thread on the info-sharing mailing list BugTraq drew the attention of both security researchers and the popular tech site The Register.
Soon after, the most famous example of a Zip bomb, 42.zip, emerged—its author to this day is unknown, but its 106 billion-to-one compression ratio is the stuff of legend. In an email interview, Fifield noted that, while 42.zip (which has been hosted on the same single-serving website for at least 15 years) gets much of the attention, he finds later attempts more technically interesting.
“I find 42.zip inspiring on an aesthetic level—not so much the file itself but the circumstances around it,” Fifield said. “It’s like folklore. There must have been many examples of the same basic idea, but for whatever reason 42.zip is the one that had staying power.”
Extreme Compression
The Zip bomb has, over the years, remained something of a malware novelty, one that pops up from time to time in different contexts. My favorite example is from 2017, when an IT blogger, sick of all the bots that were trying to log into his WordPress site, created a gzip-based Zip bomb variant that took advantage of the built-in compression in many web browsers to break the bots that were trying to get into his site.
What makes these files so effective is that they take the very things that make compression work and weaponize them, wasting CPU cycles, RAM, and disk space in the process.
But these files had a natural limitation: DEFLATE, the compression method used in most Zip files, maxes out at a compression ratio of about 1032-to-one, which meant that “Zip bombs” could only reach their true compression potential recursively. For example, 42.zip, just slightly larger than 42 kilobytes when fully compressed, contains nested Zip files, layered in batches of 16 across multiple layers of a single archive. When fully decompressed, the documents take up 4.5 petabytes, an amount so large that it would tax the highest-end server you could buy.
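The arithmetic behind 42.zip's recursion can be checked in a few lines. The figures below are assumptions drawn from commonly cited descriptions of the file rather than from this article: five nested layers of 16 archives each, an innermost file of about 4.3 gigabytes, and an outer archive of 42,374 bytes.

```python
FILES_PER_LAYER = 16
LAYERS = 5                      # assumption: commonly cited depth of 42.zip
BOTTOM_FILE_BYTES = 2**32 - 1   # assumption: about 4.3 GB per innermost file
ARCHIVE_BYTES = 42_374          # assumption: commonly cited size of 42.zip

# Each layer multiplies the number of archives by 16.
total_files = FILES_PER_LAYER ** LAYERS
total_bytes = total_files * BOTTOM_FILE_BYTES
ratio = total_bytes / ARCHIVE_BYTES

print(f"innermost files: {total_files:,}")
print(f"unpacked size:   {total_bytes / 1e15:.1f} PB")
print(f"overall ratio:   about {ratio / 1e9:.0f} billion to 1")
```

Under those assumptions the totals line up with the 4.5-petabyte and 106 billion-to-one figures quoted for the file, even though no single layer has to beat the per-file limit.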
Later compression experiments, called Zip quines, allowed for infinite recursion, so you could basically keep unzipping and never reach the bottom. (Fifield said this format helped inspire his work.)
But what makes Fifield’s work interesting is that it blows past the 1032-to-one compression ratio by overlapping files on top of one another inside the archive, making for a more tightly packed file in a single layer, with no recursion. While his results are less compressed than 42.zip is, with an equivalent-sized file only reaching a somewhat modest 5.5 gigabytes when decompressed, the technique pays off far more at larger sizes: a traditional 10-megabyte Zip file can expand to as much as 281 terabytes.
Moving beyond the original Zip spec produces even more impressive results: A 46-megabyte file, built using the somewhat less-common Zip64 extension of the format, which allows for larger files, can reach an equivalent decompressed size to 42.zip without any recursion, and at a ratio of 98 million to one. That's not as tightly wound, sure, but certainly still enough to give your computer fits.
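To put those numbers side by side, the ratios can be worked out from the rounded sizes quoted in this article (42 kilobytes, 10 megabytes, and the 46-megabyte figure given earlier). This is back-of-the-envelope arithmetic using decimal units, not Fifield's exact byte counts.

```python
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

# (compressed size, decompressed size), rounded as quoted in the article
examples = {
    "42 kB archive": (42 * KB, 5.5 * GB),
    "10 MB archive": (10 * MB, 281 * TB),
    "46 MB Zip64 archive": (46 * MB, 4.5 * PB),
}

ratios = {label: unpacked / packed for label, (packed, unpacked) in examples.items()}

for label, ratio in ratios.items():
    print(f"{label}: about {ratio:,.0f} to 1")
```

Every one of these single-layer ratios sits far above the 1032-to-one ceiling, and the ratio keeps improving as the archive gets larger.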
Fifield noted that part of what made his process possible was coming up with ways to handle cyclic redundancy checks, or CRCs, a basic error-detection mechanism baked into Zip, PNG, Ethernet, and numerous other technical standards. Messing around with CRC-32 checksums, as they’re called, was where Fifield said he learned the most.
“If you dig into the source code commit history, you’ll see that some of the earliest commits are just me struggling with checksums,” he said in the email.
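For readers who haven't met a CRC-32 before, here is a minimal sketch of what the checksum does, using Python's standard zlib.crc32. It is not Fifield's checksum code; it only shows that the value can be built up chunk by chunk and that it changes when the data is corrupted.

```python
import zlib

data = b"123456789"
checksum = zlib.crc32(data)
print(f"CRC-32 of {data!r}: {checksum:#010x}")

# The checksum can be computed incrementally by passing the running value
# back in, which matters when a file is too large to hold in memory.
running = zlib.crc32(b"12345")
running = zlib.crc32(b"6789", running)
print("incremental matches:", running == checksum)

# Flip a single bit and the checksum no longer matches, which is how a
# Zip tool notices that extracted data is damaged.
corrupted = bytes([data[0] ^ 0x01]) + data[1:]
corrupted_checksum = zlib.crc32(corrupted)
print("corruption detected:", corrupted_checksum != checksum)
```

Every file in a Zip archive carries a checksum like this, so an archive that reuses the same compressed bytes for many files still needs each file's stored CRC-32 to match whatever that file decompresses to.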
Are They Actually Dangerous?
Now, to be fair, it’s unlikely that you’re going to run into a Zip bomb in the wild, and even if you do, many modern compression and antivirus programs can correctly detect that they’re problematic. It’s not exactly tricky to defeat a Zip bomb, either. “Detecting the specific class of Zip bomb we have developed in this article is easy: just look for overlapping files,” Fifield wrote in his article.
But since such detection isn’t yet built into decompression tools, the new files made for a useful test, and even then, the worst they did to many of the programs tested by Fifield and others (including LibreOffice and McAfee Antivirus) was cause them to time out. As pointed out by Twitter user Tavis Ormandy, a few antivirus programs even correctly flagged the file as a Zip bomb, despite the fact that it’s a new kind of Zip bomb.
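The overlap check Fifield describes can be sketched with Python's standard zipfile and struct modules. This is a simplified illustration rather than his code or any tool's actual implementation: the helper names `entry_spans` and `find_overlaps` are made up for this example, the archive is assumed to fit in memory, and trailing data descriptors are ignored.

```python
import io
import struct
import zipfile

# Fixed 30-byte portion of a Zip local file header (little-endian).
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def entry_spans(archive: bytes):
    """Return (start, end, name) byte ranges, one per central-directory entry."""
    spans = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            start = info.header_offset
            fields = LOCAL_HEADER.unpack_from(archive, start)
            if fields[0] != b"PK\x03\x04":
                raise ValueError(f"no local file header at offset {start}")
            name_len, extra_len = fields[9], fields[10]
            # Header, file name, extra field, then the compressed data.
            end = (start + LOCAL_HEADER.size + name_len + extra_len
                   + info.compress_size)
            spans.append((start, end, info.filename))
    return spans


def find_overlaps(spans):
    """Return pairs of entry names whose byte ranges overlap."""
    overlaps = []
    max_end, owner = 0, None
    for start, end, name in sorted(spans):
        if start < max_end:          # begins inside an earlier entry
            overlaps.append((owner, name))
        if end > max_end:
            max_end, owner = end, name
    return overlaps


# Usage: an ordinary archive built in memory should report no overlaps.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"hello " * 100)
    zf.writestr("b.txt", b"world " * 100)

clean_overlaps = find_overlaps(entry_spans(buf.getvalue()))
print("overlapping entries:", clean_overlaps)
```

In a normal archive each entry ends exactly where the next one begins, so any entry that starts before an earlier one has finished is a red flag.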
Fifield, who will present his findings at the USENIX Workshop on Offensive Technologies (WOOT) conference next month, noted that while the work itself adds to a history of research and likely will be superseded in the future, its benefit from an awareness standpoint is important.
“I hope that one of the benefits is more awareness among developers of the hazards of processing complex archive formats like Zip,” he added. “It helps to have some concrete outputs: code reviewers, customers, and users will at least be able to point to this research and ask, does it handle this correctly?”