Tech

How File Sharing Broke the Internet’s First Forum

As I pointed out a while back, people love free stuff when it comes to the internet.

Beyond communication, it may be the most popular use of the internet, period. But what happens when a platform intended for communication becomes overwhelmed by free stuff?


To put it simply, that’s what Usenet is. The protocol, with a history going back nearly 40 years, evolved from its arcane roots as a place for technically minded college students and sysops to communicate, first into a newbie-overrun network, then into a major hub for file sharing.

Why did Usenet see its digital role shift so dramatically over time? Let’s discuss.

Image: Alexandre Dulaunoy/Flickr

The fundamental technology that eventually changed Usenet’s fortunes forever

In 1979, two Duke University graduate students came up with the basic idea of “netnews,” the service that became Usenet after an array of improvements.

Their protocol, originally named A-News, took advantage of the Unix-to-Unix Copy (UUCP) protocol, a distributed way of copying files between computers that was built alongside the ARPANET, the network that became known as the internet. (Eventually, the two were bridged.) The developers, Tom Truscott and Jim Ellis, made the software available to any Unix host, and within a few years, the protocol became one of the most popular ways to communicate through the still-formative internet.

Usenet was built on the idea that computers were becoming sophisticated enough to carry conversations, and there was plenty of conversation going on. It was basically Reddit, except decentralized and without a true owner.

But the protocol’s roots in UUCP, which is effectively a peer-to-peer file-sharing mechanism when you break it down, meant that it was also an effective way of sharing files. Usenet was designed to share only text, but programmers continued to improve on the technology. One particularly pivotal figure was Mary Ann Horton, a University of California, Berkeley graduate student who was involved in building up UUCP’s early protocols and helped connect the protocol to the broader internet.

(Of note: Eric Schmidt, the longtime chairman of Google and a Berkeley grad himself, was also involved in these early efforts to improve UUCP, helping Horton and another student, Eric Allman, build a gateway that connected UUCP and ARPANET.)

Horton, known as Mark at the time, was deeply impressed with what became Usenet.

“I remember, while at Berkeley, exchanging email with the original A-News developers and being amazed that I could get a reply back a few hours later, even though ‘research’ was polling both ‘duke’ and ‘ucbvax’ to pick up waiting mail,” Horton wrote of her early experiences of using the technology.

Horton further shaped Usenet, improving on the work of Truscott and Ellis. Also around this time, she was responsible for creating Uuencode, a piece of software that would become key to the legacies of both email and Usenet, one of a long list of her contributions.

(Side note: Beyond her technical achievements, Horton deserves notice for her efforts in speaking up for the rights of transgender people in the workplace, which started when she was at Lucent in the 1990s. She helped set the stage for some major changes in the corporate world, particularly in Silicon Valley.)

Functionally similar to the .zip file format in some ways, Uuencode effectively worked as a bridge between binary files and raw text. If you ran a Uuencode command on a binary file, it would turn it into a jumble of text. Another user could run a Uudecode command to turn the file back into a binary format.
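To make that round trip concrete, here’s a minimal sketch using Python’s binascii module, which still carries the uuencoding primitives (the sample bytes are invented for illustration):

    # Encode a handful of binary bytes as uuencoded text, then decode them
    # back. b2a_uu handles one line of up to 45 bytes; a full uuencoded
    # file is many such lines sandwiched between "begin" and "end" markers.
    import binascii

    original = bytes(range(8))           # eight arbitrary binary bytes
    line = binascii.b2a_uu(original)     # a printable, mail-safe line of text
    decoded = binascii.a2b_uu(line)      # and back to the original binary

    assert decoded == original
    # Every 3 bytes become 4 characters, plus a length marker and a newline
    # per line: roughly 35 percent overhead as the price of being plain text.
    print(len(original), "bytes in ->", len(line), "characters out")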

This came in handy because it allowed for email attachments, as well as for the distribution of binaries through what became Usenet.

Now, you might be wondering how this all works. Imagine an image of a cat preparing to eat a dandelion, embedded directly into an article as text, using an encoding approach called Base64, which is a descendant of sorts of Uuencode. The file becomes one long block of text, far too long to share in its raw form, and any given section of it reads as a jumble of letters, numbers, and symbols, along the lines of the snippet below.
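Here’s an invented stand-in for one of those sections; it takes the first eight bytes of any PNG file (the format’s signature) and wraps them in the kind of data URI a page would actually embed:

    # Encode a morsel of image data as Base64 and wrap it in a data URI.
    import base64

    image_bytes = b"\x89PNG\r\n\x1a\n"   # the eight-byte PNG file signature
    encoded = base64.b64encode(image_bytes).decode("ascii")
    print("data:image/png;base64," + encoded)
    # -> data:image/png;base64,iVBORw0KGgo=

    # Decoding recovers the original binary exactly, which is what a
    # browser does when it renders the image.
    assert base64.b64decode(encoded) == image_bytes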

Your web browser is smart enough to decode these blocks of gibberish and recognize that the text is actually a binary image, turning characters that were never meant to be human-readable back into something it can render. (The fact that the text is gibberish, by the way, makes Base64 a somewhat common vector for attacks on the web; for example, it was used in the recent malware infiltration of the popular CCleaner utility software.)

Now, Uuencode wasn’t perfect. The encoding was somewhat wasteful, adding overhead and making the encoded text files more complicated than they needed to be, and the idea has been improved upon since, most notably on Usenet by the far leaner yEnc format. But it was an effective way of allowing files to travel far and wide. It was particularly well suited to distribution through Usenet, as the encoded files would transfer from waystation to waystation.

The approach, both in terms of infrastructure and use case, is very reminiscent of what BitTorrent became. And over time, this use case would come to overwhelm the Usenet protocol.

Not that it was ever the intention.

“Because not all Internet service providers are as responsible as AOL, and because individual acts of online piracy continue, I am glad to have called attention to the problem of online piracy through this litigation.”

— Author Harlan Ellison, in a 2004 joint statement with AOL announcing a settlement over copyright infringement claims. Ellison sued the company, in an early test of the Digital Millennium Copyright Act’s safe harbor provisions, after a user shared one of his books in electronic form over Usenet and that book became accessible via AOL. The long-running suit, a favorite of TechDirt, largely went in AOL’s favor, though the Ninth Circuit Court of Appeals didn’t fully clear the dial-up icon, leading to the settlement. Ellison’s lawsuit was just one of many problems for Usenet.

The target on Usenet’s dynamically distributed back

“Newsgroup,” the preferred nomenclature for a Usenet group, was always something of a misnomer.

Sure, there was some news and legitimate information being shared. (For example, teenage chartster Matt Levine, whom I wrote about a while back, built his following by sharing charts via Usenet for years.) But the bulk of what was being shared, bit by bit, was encoded binary files.

And, due to its design and status as a pre-web way of communicating, Usenet had a sense of lawlessness to it. It was a place where any kind of information could be found, if you were willing to look for it.

There was no filter. It was all or nothing. Like Reddit, most of the conversation was innocuous, but there were some places where it very much wasn’t.

And with binaries a part of the equation, that meant copyright theft and illicit content were never too far away.

When President Bill Clinton signed the Communications Decency Act of 1996 (a law the Supreme Court largely threw out just a year later), Usenet hosted much of the indecent material that the bill’s authors had in mind.

And it was difficult to rein in, because Usenet wasn’t really built to be reined in. The complaints lobbed at Twitter about moderation today were triply true about Usenet—because, in most newsgroups, moderation simply didn’t exist.

“Regulating the Usenet is a problematic venture since often there are no identifiable agents toward whom to direct regulatory actions,” author Blake T. Bilstad wrote in “Obscenity and Indecency on the Usenet: The Legal And Political Future of Alt.Sex.Stories,” an article in the Journal of Computer-Mediated Communication. “There are no Sysops, very few newsgroup managers, and a slew of private individuals using anonymous (and virtually untraceable) IDs and usernames.”

That meant the pressure of reining in illicit material fell upon internet service providers. Even after the Digital Millennium Copyright Act and its safe-harbor provisions arrived, binaries created massive liabilities for providers because of the often too-hot-to-handle nature of what was shared. That pressure was in some ways just as much legal as it was social.

Groups dedicated to distributing binaries, generally under the “alt.binaries” newsgroup hierarchy, also had a more practical effect: They turned what was intended to be a protocol for simple communication into one where articles were very large and hard to store, which limited the commercial feasibility of the Usenet platform.

(AOL, for one, had shut off its access by 2005, citing the waning popularity of newsgroups compared to web-based alternatives like blogs. It didn’t cite the influence of Harlan Ellison, though one might assume it played a role.)

So when pressure campaigns to get major ISPs to shut down access to portions of Usenet heated up—particularly when New York’s then-attorney general, Andrew Cuomo, launched a crusade against child pornography around 2008—major ISPs were more than happy to oblige.

Usenet represented a lot of legitimate conversation, just like every other part of the internet. But the ugly parts, while small (Cuomo’s staff pinpointed 88 problem groups out of roughly 100,000 total at the time), were highly visible, and the binaries made things worse. An ACLU official at the time compared the crackdown to “taking a sledgehammer to an ant.”

And Usenet’s decentralized design made it difficult to filter out the ugly stuff. (As well as spam. Oh man, Usenet was famous for spam.)

The FBI couldn’t shut down a Usenet group that was illegally sharing episodes of Arrested Development or copies of Windows XP, as it has done with websites in the past.

But ISPs, on the other hand, could simply decide not to carry newsgroups. Thousands of legitimate newsgroups, many of which didn’t have explicit content or engage in copyright infringement, were caught in the middle.

These changes didn’t kill Usenet, of course, but they left a wound that was hard to ignore. Smaller hosts, like Giganews and NewsDemon, carried the mantle, as did Google Groups, which carries many newsgroups to this day. (Eric Schmidt has influence on both ends of this story, as it turns out.)

Usenet once had immense cultural value, and it still does in some corners, for some specific uses, particularly in a historical context. But “no limits” has its limits.

“Obviously, this is not your typical method of cloud storage. Many may snicker or find it plain weird to tap into this vintage part of the Internet in such a modern way. But oddly enough, we have experience experimenting with this alternative offline storage when it comes to backing up a Linux system. And backing up your Linux system to Usenet ultimately requires only a handful of steps. It’s not only possible—it’s scriptable, too.”

— Brian Hill, a contributor to Ars Technica, discussing how one could use Linux to back up a machine using a Usenet binary group as a form of cloud storage. He notes that it’s not only effective, it’s redundant: Because the data propagates to so many places, because the files can be encrypted, and because transferring data to Usenet is so cheap, it’s a workable option for archiving data if you’re willing to get your feet wet. (It takes advantage of the fact that most modern-day Usenet hosts don’t throw things out for a number of years.)
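For the curious, here’s a rough sketch of what such a script might involve. To be clear, this is not Hill’s actual method: the host, group, and credentials are placeholders, and it uses Base64 where dedicated posting tools would typically use yEnc. It also leans on Python’s nntplib, which was removed from the standard library in Python 3.13:

    # Archive a directory, encrypt it, encode it as text, and post the
    # result to a binary newsgroup over NNTP. Purely illustrative.
    import base64
    import io
    import nntplib       # removed in Python 3.13; run on an older release
    import subprocess

    NNTP_HOST = "news.example.com"     # placeholder Usenet provider
    NEWSGROUP = "alt.binaries.test"    # placeholder binary group

    # 1. Bundle the files and encrypt them (gpg prompts for a passphrase).
    subprocess.run(["tar", "czf", "backup.tar.gz", "/home/me/docs"], check=True)
    subprocess.run(["gpg", "--symmetric", "--output", "backup.gpg",
                    "backup.tar.gz"], check=True)

    # 2. Turn the encrypted binary into postable text.
    with open("backup.gpg", "rb") as f:
        body = base64.encodebytes(f.read())

    # 3. Wrap the text in article headers and post it.
    article = (b"From: nobody@example.com\r\n"
               b"Newsgroups: " + NEWSGROUP.encode("ascii") + b"\r\n"
               b"Subject: encrypted backup, part 1/1\r\n"
               b"\r\n" + body)

    with nntplib.NNTP(NNTP_HOST, user="username", password="password") as srv:
        srv.post(io.BytesIO(article))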

In many ways, Usenet represents the gravel road on an internet that’s otherwise been mostly paved, cleaned up, maintained, and even renovated.

You won’t end up there by accident these days, like you might on Facebook. It’s an intentional visit, one that has to be planned out with specific goals in mind. Discussion threads exist in certain corners, but you’re often just as likely to see a forum filled with spam or a person posting a message asking if anyone else is still reading Usenet. Perhaps, without stewards or moderators, this was always bound to happen.

Usenet hosts are still with us, but they specifically cater to the technology’s status as a way of accessing binary files—something that was a hack, really, but came to define the network.

That’s because it carries some major advantages in certain contexts. If a Usenet host is fast, downloads can be significantly faster than through BitTorrent, because you’re not waiting for another peer to offer up the file you want. Certainly, it’s not perfect (warnings abound about malware and viruses baked into those giant blocks of text that represent binaries on Usenet), but it’s effective.

But, as helpful as that might be for people who want that, it certainly wasn’t what Usenet was built for.