The Motherboard Guide to Steganography

In April, Motherboard founder and former Publisher Thobey Campion debuted a missing page from a formerly classified government investigation into the feasibility of astral projection. During his reporting, he received thousands of emails from astral travelers around the world. He digitally micro-inscribed hundreds of these messages onto a high-resolution digital image of The Gateway Report’s missing page. 

Today, it is being released as an NFT on SuperRare alongside a powerful new digital magnification tool. Developed in partnership with SuperRare and Gigapixel, the new functionality reveals hundreds of hidden details inside ‘The Gateway,’ making it the industry’s first lossless zoom experience. 

1 - exploded view.jpeg
A partially-exploded view of The Gateway. Graphic: Thobey Campion

While NFTs provide a picture-perfect moment to push the bounds of metadata, hiding information in plain sight goes way back, and it’s got a name: Steganography. In honor of that tradition, here’s a tour through the intentionally unseeable world of hiding information inside imagery.

—————————————————————————

It was Zheng, in the Library, with the Sunset

At 8:42am on an overcast August morning in 2018, GE Principal Engineer Xiaoqing Zheng was marched out of his home in Eastern New York by the FBI. The event was a hot topic for the sleepy town of Niskayuna, just north of Albany. His LinkedIn lists a degree from the Sloan School of Management, a PhD in Aeronautical Engineering from Northwestern, 13 years as a senior technical leader at one of America’s most venerable technology companies, and 29 patents.

Zheng had just been charged on 14 counts of conspiring to steal General Electric’s trade secrets. According to the complaint, he’d allegedly been exfiltrating GE intellectual property to Tianyi Aviation Technology Co. in Nanjing, a company the feds accused him of starting with funding from the Chinese government.

Zhang and FBI.jpg
Composition: Thobey Campion

Over the course of his employment at GE Power & Water in Schenectady, Zheng allegedly stole 20,000 electronic files detailing design models, engineering drawings, configuration files, and material specifications for gas and steam turbines. According to the complaint, Zheng transported the privileged data off of GE’s highly-fortified corporate servers in a manner considered unusual even for the most seasoned Information Security specialists. He concealed it all inside a low-res picture of a sunset. He emailed the image to his personal email account with a note: “Nice view to keep.”

3 - Capture.png
Zheng’s original image (left) and his image containing GE trade secrets (right). Art: Xiaoqing Zheng

By allegedly hiding the illicit information inside his sunset, Zheng was partaking in an advanced form of a tradition that’s been alive and well for at least two millennia: Steganography.

It’s a Whole Thing and Its Own Thing

The word Steganography comes from the Greek steganós, meaning “covered or concealed,” and graphia meaning “writing.”

In the family of covert communications, steganography has a more well-known sibling in cryptography. Distinguishing the two from the drop is key. Cryptography is the science of writing in secret code (encrypting information), while steganography is concerned with making information invisible entirely, or hiding it in plain sight.

In practice, cryptographers scramble data between point of departure and destination to keep it safe. The intended recipients then receive a key that turns unrecognizable ciphers into intelligible data. No key, no dice. 

4 - Cryptography.png
Chart: Thobey Campion

Encrypted information sticks out. It is recognizable by its very randomness. Steganography does not attract attention to itself as an object of scrutiny. So, while cryptography renders data unreadable, steganography hides the presence of the information altogether. It’s the Milford Man of communications. 

Staying true to form, the world of steganography is jam-packed with terms that discourage participation. So I produced this A-Z of Steganography to act as a key word chaperone as we descend into the shadowy depths of hidden data. A high-resolution PDF download is available for paying subscribers here, along with a compendium of groundbreaking steganography papers. 

4.5 - AZ Steg.jpeg
Chart: Thobey Campion

The Decidedly Analog Origins of Steganography

Steganography precedes our modern digital world by a solid 2,500 years. From its beginnings onward, the goal has remained the same: Enable one party to communicate with another party without a third party being aware that communication is even occurring.

Here’s a quick rip through the history of physical steganography. 

5 - Stegahistory.png
Nazcan Geoglyphs; Trithemius’ ‘Steganographia’; da Vinci’s ‘Mona Lisa’; Michelangelo’s ‘David and Goliath’; Bosch’s ‘Garden of Earthly Delights’; Microdots; ‘The Casdagli Sampler’; The Digit Affair. Art by Thobey Campion. Composition: Thobey Campion.

440 BC – The Greek ruler Histiaeus shaved the head of a servant, tattooed his scalp with a secret message, let his hair grow back, then sent him through enemy territory with an instruction – “When thou art come to Miletus, bid Aristagoras shave thy head, and look thereon.”

200 BC – The geoglyphs of the Nazca contain 143 topographical line drawings that went undetected until the 1920s, when they were viewed from the air by Peruvian military and civilian pilots. While the meaning of the geoglyphs remains a mystery, researchers are making inroads with the aid of machine learning.

1499 – German Benedictine abbot, cryptographer and occultist Johannes Trithemius published Steganographia, a book purporting to be about magic. A decryption key eventually revealed the first two volumes to actually be about cryptography and steganography.

1503 – According to a recent finding by Italian researcher Silvano Vinceti, Leonardo da Vinci microscopically painted an “L” over the right eye of his Mona Lisa, an “S” over the left eye and a “72” under an arched bridge in the backdrop.

1509 – A panel on Michelangelo’s contribution to the Sistine Chapel portrays David defeating Goliath. David’s stance is in the shape of the Hebrew letter “gimel.” This letter refers to reward and punishment, paralleling nicely with the underdog story.

1510 – Hieronymus Bosch’s triptych “Garden of Earthly Delights” contains a musical score on a figure’s hindquarters (excuse my language). The notation was discovered in 2014 by a college student and piano teacher, who after playing it, described the music as “LITERALLY the 600-year-old butt song from hell.”

1887 – Van Gogh painted “Patch of Grass” over an earlier portrait of a woman. The hidden portrait stayed out of sight until 2008, when two scientists used high-intensity X-rays from a particle accelerator to scan the canvas.

1930 – The microdot traces back to the Franco-Prussian War of 1870-71, when Parisian photographer René Dagron used a photographic shrinking technique to increase the number of messages a pigeon could carry. The shrunken image became known as a microdot. After WW1, German spies adopted the technique, shrinking messages up to 200x. Microdots as small as a typewriter’s period were then covered by reflective adhesives (for findability) or inserted into the side of postcards.

1941 – British army officer Major Alexis Casdagli was captured and imprisoned by the German forces in the early days of WW2. To pass the time he began stitching what would become known as The Casdagli Sampler. Along the border of the embroidered canvas, which features German swastikas, American eagles, British lions and Soviet hammer and sickles, are a series of irregular lines and dots. It is Morse code for “Fuck Hitler.”

1968 – The crew of the USS Pueblo were captured by North Koreans and brutally tortured. In propaganda photos aimed at showing how well the captives were being treated, the POWs snuck in their middle fingers in what became known as The Digit Affair.

Then computers showed up and the decidedly subversive art of hiding imagery in plain sight got a whole lot more sophisticated. 

6 - Cheetah.png
Art: Thobey Campion

Digital Steganography

The personal computer was a lightning rod for new approaches to steganography. The sheer amount of data inside one image offered practitioners an expansive shadow in which to sneak around. Pioneers tried all sorts of approaches: blog-steganography, sneaking data into file info, hiding messages inside file folder systems, and even strategically delaying packets of information that made up an image, the sequence of which formed a message unto itself.

One of the first high-profile applications of digital steganography occurred in 1993. The laptop of suspected World Trade Center and Philippine Airlines Flight 434 bomber Ramzi Yousef was seized. Buried deep in the hard drive was a collection of encrypted files. After their discovery, it took the FBI over a year to decipher two of the hidden files. They contained plans for future attacks, flight schedules, projected detonation times, and chemical formulae.

Osama Bin Laden was also an avid practitioner of steganography. USA Today ran a story in 2001 reporting that Bin Laden and his associates had been hiding information crucial to terrorist plots in images on pornographic websites as early as 1996. 

Steganography has been used by criminals and terrorists, but like cryptography, it’s used by a wide array of people: intelligence agencies, artists, activists, hackers, coders, puzzle makers, and more. As steganographic techniques evolved, so did methods for evaluating their impenetrability. To scrutinize a work, the steganography community developed a risk matrix of sorts, calling it the ‘trade-off tetrahedron.’

The four points represent the key requirements for incognito payload transmission. The ‘trade-off’ implies that favoring one of the requirements always compromises the other three in some way.

  1. Robustness – The characteristics of a hidden message that ensure it survives any non-malicious data processing while in transit from sender to recipient.
  2. Security – The built-in protection against a hostile entity’s attempts to remove or disable the hidden message.
  3. Imperceptibility – A hidden message’s ability to exist while not degrading the perceptual content of the host.
  4. Capacity – The amount of information that can be hidden inside the cover.

Of all the digital steganography techniques that proliferated in the early days, one fared particularly well against the trade-off tetrahedron. 

Least Significant Bit Substitution (LSB)

The mid-1980s gave rise to a technique called Least Significant Bit Substitution. LSB hides data inside the information least important to the quality of an image, thereby minimizing image distortion. This information-redundant region of an image consists of the least significant bits of each pixel. 

LSB is often cited as the steganography technique of choice for Middle East terrorist groups. In 2012, suspected Al Qaeda member Maqsood Lodin was apprehended in Germany after a trip to Pakistan. Berlin police seized memory cards and a flash drive from Lodin’s underwear. The drives appeared to contain two pornographic files, titled “Kick Ass” and “Sexy Tanja.” After weeks of laborious scrutiny, German investigators found more than 100 Al Qaeda documents buried in the data. They included plots to seize a cruise ship, a document called “Future Works” outlining attacks throughout Europe, and terrorist training manuals in German, English and Arabic.

LSB can be performed in 5 arduous steps:

1) Select a host image. Note: Try to find a 1/1 original you created ;-) Using a publicly available photo exposes your file to the risk of being compared to an unaltered available copy, differences being spotted and ultimately compromising the safety of the secret data.

2) Convert the host image to a bit array. That’s the binary code (1s and 0s) that informs the color of the pixels that make up the digital image. Here’s a simple online tool.

3) Locate each binary sequence’s least significant bit. This is the right-most bit at the end of each binary sequence. Below is the binary for a pixel that reads as the color red. 

8 - LSB.png
Image: Thobey Campion

The circled bit above has the least impact on the color of the pixel the sequence instructs. Aka, if you change this number to 0, the pixel will still look almost exactly as red as it did before the alteration.

To give you an idea how trivial that bit of information on the right is – as you move one-by-one along the bits to the left, they double in value. So in an image’s 8-bit binary sequence the left-most bit is 128x more impactful on what gets delivered to the human eye than the right-most. And that’s just for a single pixel. If you change that last bit you’re shifting the color value by 1 out of a possible 255. No one’s noticing anything.

9 - Bit Values.png
Image: Thobey Campion

4) Make minor alterations to each least significant bit, so that when each bit is isolated from the sequence and re-compiled they form a new meaning. The example below starts with 3 pixels’-worth of binary data from an image, isolates the least significant bits, and flips 4 of them, so that the least significants together form a new sequence that then spells the letter “e” in binary. The message is now hidden inside the “stego image.” 

10 - Buster.png
Image: Thobey Campion

5) Transmit the stego image to a recipient, who can use an algorithm to detect anomalous changes in pixel value and then extract the hidden message.
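The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in any of the cases described here: the function names and the nine sample channel values are made up, the message is written into consecutive values starting from the first, and the recipient is assumed to know how many bytes to read back.

```python
def embed_lsb(channel_values, message):
    """Hide message bytes in the least significant bits of 8-bit channel values."""
    # Unpack each byte into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(channel_values):
        raise ValueError("cover is too small for this message")
    stego = list(channel_values)
    for i, bit in enumerate(bits):
        # Clear the right-most bit, then set it to the message bit.
        stego[i] = (stego[i] & 0b11111110) | bit
    return stego


def extract_lsb(channel_values, num_bytes):
    """Read num_bytes back out of the least significant bits."""
    out = bytearray()
    for start in range(0, num_bytes * 8, 8):
        byte = 0
        for value in channel_values[start:start + 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)


# Three made-up RGB pixels (nine channel values); "e" needs only the first eight.
cover = [255, 0, 0, 254, 1, 3, 200, 17, 90]
stego = embed_lsb(cover, b"e")
recovered = extract_lsb(stego, 1)
largest_change = max(abs(a - b) for a, b in zip(cover, stego))
```

No channel value moves by more than 1, which is why the altered pixels look the same to the eye.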

Outside of being a total nuisance, Least Significant Bit Substitution has some serious limitations. Chief among them: the LSB method puts a ceiling on how many pixels can be altered and therefore how much data can be hidden. Packing too much hidden data inside pixels eventually causes image degradation. This can be caught by the human eye or the trip wires of forensics software. To remain under the radar, steganographers using the LSB method can only manipulate 15 percent of an image’s content.

Then, along came researchers and high-performance steganographers Richard Eason and Eiji Kawaguchi. In a paper titled Principles and Applications of BPCS-Steganography, the two illustrated how to jack hiding capacity to a staggering 60 percent, meaning 60 percent of the image’s data could be changed behind the scenes without it significantly degrading image quality.

11 - Eason + Eiji.png
An original image (left) and bit-plane complexity segmented image (right) containing a photo of Abraham Lincoln, and the textual contents of The Gettysburg Address, The Declaration of Independence, The Constitution, The Magna Carta, Romeo and Juliet, Hamlet, Macbeth, A Midsummer Night’s Dream, The Taming of the Shrew, The Tempest and The Comedy of Errors. Steganography: Eason & Kawaguchi.
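To put the 15 percent and 60 percent figures in perspective, here is a rough back-of-the-envelope calculation. It assumes an uncompressed 8-bit RGB image and treats each percentage as a share of the raw pixel data; the function name and the 1024 x 768 example size are illustrative choices, not figures from the paper.

```python
def hiding_capacity_bytes(width, height, percent, channels=3):
    """Bytes of payload if `percent` of the raw 8-bit pixel data can be altered."""
    raw_bytes = width * height * channels  # one byte per channel value
    return raw_bytes * percent // 100


# A hypothetical 1024 x 768 RGB cover image.
raw_size = 1024 * 768 * 3
lsb_ceiling = hiding_capacity_bytes(1024, 768, 15)   # the LSB figure quoted above
bpcs_ceiling = hiding_capacity_bytes(1024, 768, 60)  # the BPCS figure quoted above
```

Under these assumptions the same cover goes from roughly 350 KB of hidden payload to roughly 1.4 MB.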

Bit-Plane Complexity Segmentation (BPCS)

To raise the bar on the Least Significant Bit technique, Eason and Kawaguchi exploited the shortcomings of human vision. To maximize imperceptibility, they devised a method of splitting an image into two distinct regions: an ‘informative region’ and a ‘noise-like region.’ An informative region consists of a simple pattern that the eye can recognize instantaneously; a noise-like region consists of a pattern so complex our brains simply don’t register it.

Imagine you are arriving for a day at the beach. There’s sand as far as the eye can see. This stretch made up of billions of grains is a noise-like region. It’s so many objects with so much sameness your eyes can’t really distinguish one from the other. You can however cross-reference the khaki colored shape that traces the ocean with your memory and be virtually guaranteed that you are indeed at a beach. So, assured, you sit down on a patch of noise-like sand. As you get settled a pink flamingo strolls by your spot. Its pink plumage jumps out from the color of the beach, its impossibly thin stick legs bend recognizably. This is an informative region!

Every image contains informative and noisy regions, which become mathematically mappable at a microscopic scale. Here’s what it looks like up-close.

11.5 - flamingo v sand.png
Image: Thobey Campion

To map these regions, steganographers slice images into 8 planes.

12 - Bit Planes.png
Image: Thobey Campion

That stack of slices looks like this when laid down next to each other.

13 - Bit Planes 2.png
Image: Thobey Campion
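Slicing into 8 planes can be sketched as follows. This is a simplified illustration that assumes an 8-bit grayscale image stored as nested lists and uses plain binary values; the function names and the 2 x 2 sample image are made up for the example.

```python
def bit_planes(pixels):
    """Split a grid of 8-bit grayscale values into 8 binary planes.

    planes[0] holds the most significant bit of every pixel,
    planes[7] holds the least significant bit.
    """
    return [
        [[(value >> bit) & 1 for value in row] for row in pixels]
        for bit in range(7, -1, -1)
    ]


def merge_planes(planes):
    """Rebuild the grayscale grid from its 8 bit planes."""
    height, width = len(planes[0]), len(planes[0][0])
    return [
        [sum(planes[p][y][x] << (7 - p) for p in range(8)) for x in range(width)]
        for y in range(height)
    ]


# A tiny made-up 2 x 2 image: 200 = 11001000, 13 = 00001101, 255, 0.
image = [[200, 13], [255, 0]]
planes = bit_planes(image)
top_plane = planes[0]     # coarse shape of the image
bottom_plane = planes[7]  # where LSB substitution does its hiding
```

Stacking the planes back together with `merge_planes` returns the original image, so nothing is lost by slicing.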

Now broken down into bits, it’s easy to identify informative and noisy regions. To make it even easier, a grid is placed over the image and a noise value is attributed to each cell. This is achieved by measuring the amount of pixel contrast inside each segment. This is also known as an image’s border value. Here are some examples. 

14 - 3 figures.png
Image: Thobey Campion

In Figure 1, a white pixel is surrounded by 4 black pixels. That gives it a border value of 4 because there are four contrasting borders. In Figure 2, there are no black pixels bumping up against white pixels. This gives it a border value of 0. Figure 3 has a border value of 12. AKA, Figure 2 is informative, whereas Figure 3 is bringing the NOISE.
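The border count described above is simple to compute. The sketch below counts side-by-side and top-to-bottom neighbors whose colors differ in a black-and-white grid. The three 3 x 3 grids are my own stand-ins chosen to produce border values of 4, 0 and 12; they are not reproductions of the figures.

```python
def border_value(grid):
    """Count horizontally and vertically adjacent pixel pairs whose colors differ."""
    height, width = len(grid), len(grid[0])
    borders = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width and grid[y][x] != grid[y][x + 1]:
                borders += 1  # contrast with the pixel to the right
            if y + 1 < height and grid[y][x] != grid[y + 1][x]:
                borders += 1  # contrast with the pixel below
    return borders


# 0 = black, 1 = white. Hypothetical grids, not the article's figures.
lone_white = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
flat = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
checker = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

values = [border_value(g) for g in (lone_white, flat, checker)]
```

A 3 x 3 grid has 12 neighboring pairs in total, so the checkerboard is as noisy as a block of that size can get.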

Eason and Kawaguchi proposed embedding secret data into imagery by matching their noisy regions, which usually make up 45 percent of the image. That’s the easy part.

To camouflage the informative region of a secret image into a noisy region, it needs to be converted into noise. At the pixel level, that conjugation process looks like this. 

15 - Noise.png
Image: Thobey Campion

In the above, (1) is a piece of an initial image. It sticks out like a sore thumb. By matching it with the host image’s foreground pattern (4) and background pattern (5), it can nestle in undetected.

Here’s an example in action!

16 - Ape Woman.png
Steganography: Eason & Kawaguchi.

The carrier is converted to grayscale. Both images are then plane sliced. A complexity measure is calculated on the blocks of both images. A conjugation is performed on the ‘simple’ or ‘informative’ blocks of the secret image, turning them into noise. Finally, the secret image is embedded into the carrier image to form the stego image.
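The conjugation step can be sketched as an XOR against a checkerboard, which is my understanding of how the BPCS paper defines it. Treat the details as assumptions: the 8 x 8 block size, the function names and the half-black, half-white sample block are illustrative only.

```python
def checkerboard(size):
    """A size x size grid of alternating 0s and 1s."""
    return [[(x + y) % 2 for x in range(size)] for y in range(size)]


def conjugate(block):
    """XOR a square binary block with a checkerboard of the same size."""
    size = len(block)
    pattern = checkerboard(size)
    return [[block[y][x] ^ pattern[y][x] for x in range(size)] for y in range(size)]


def border_value(block):
    """Count adjacent pixel pairs (right and down neighbors) that differ."""
    size = len(block)
    return sum(
        (x + 1 < size and block[y][x] != block[y][x + 1])
        + (y + 1 < size and block[y][x] != block[y + 1][x])
        for y in range(size)
        for x in range(size)
    )


# An 'informative' block: left half black, right half white.
simple = [[0] * 4 + [1] * 4 for _ in range(8)]
noisy = conjugate(simple)

max_borders = 2 * 8 * 7  # every neighboring pair in an 8 x 8 block
before, after = border_value(simple), border_value(noisy)
restored = conjugate(noisy)
```

Every neighboring pair that matched before now contrasts, and vice versa, so a simple block becomes a noisy one. Conjugating a second time restores the original, which is what lets the recipient undo the step.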

To perform this delicate act you need a finicky software package like Matlab and a tremendous amount of patience. It’s all a lot.

Luckily there are off the shelf tools to do the heavy lifting for you. 

17 - Stegbro.png
Art: Thobey Campion

You Even Steg, Bro?

A simple Google search returns a mountain of turnkey software for the data-cloaking enthusiast. I tried them all. Here are my top picks, ranked by ease-of-use, versatility, and whether they flooded my computer with bots. Bonus points for Mac-friendliness, ’cause that’s where I’m at.

• Steghide: For usability, Steghide shines. It’s also a super-lightweight piece of software. So if you’re trying to hide the very fact that you’re stegging in the first place, there’s an added benefit.

• Xiao Steganography: If you’ve got a PC, this is your best bet. The community generally agrees. An extra security feature enables users to encrypt data before embedding it into the image for double the protection.

• Open Puff: While this piece has remained solely focused on visual steganography, you can conceal data in all types of carrier media files. Open Puff can trojan horse hidden data in a wide array of imagery, audio, video and flash-files.

• Metasynth: On the audio front, don’t sleep on Metasynth. This is the software Aphex Twin used to drop images into his music. A story for another day though.

18 - futuresteg.jpeg
Art: Thobey Campion

Steganography’s Next Dimension

‘Obfuscation layer’ is a prominent term in the digital forensics lexicon. These layers offer additional levels of protection from attackers. The more layers, the less perceivable and therefore safer a hidden message. To peer into the future of steganography is to gaze through an ever-growing and increasingly opaque stack of obfuscation layers.

Mind-blowing work is being done to advance the field by re-distributing hidden messages inside the 2-dimensional plane of flat imagery. In 2013, American computer scientist Chuck Easttom filed a patent for processing a data message into a plurality of message blocks each representing a portion of the original. These fragments are in turn inserted into a plurality of images. Zhongliang Yang, a Tsinghua University Doctor of Engineering (and total beast, I might add) has built a model for enhanced cognitive-imperceptibility using neural networks. A 2021 paper outlined a method for mimicking the patterns of human chromosomal sequences to build a ‘DNA algorithm’ that dictates the dispersal of altered bits around an image, thereby increasing a hidden message’s undetectability. 
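The block-splitting idea can be illustrated with a toy sketch. It is only loosely inspired by the description above and makes no claim to follow the patent: the function names, the round-robin assignment, the block size and the cover file names are all hypothetical, and the step that actually embeds each block into its image is left out.

```python
def split_into_blocks(message, block_size):
    """Cut a message into consecutive blocks of at most block_size bytes."""
    return [message[i:i + block_size] for i in range(0, len(message), block_size)]


def assign_to_covers(blocks, cover_names):
    """Deal blocks out round-robin, keeping each block's index so order can be restored."""
    plan = {name: [] for name in cover_names}
    for index, block in enumerate(blocks):
        plan[cover_names[index % len(cover_names)]].append((index, block))
    return plan


def reassemble(plan):
    """Collect the blocks from every cover and put them back in order."""
    indexed = [pair for pairs in plan.values() for pair in pairs]
    return b"".join(block for _, block in sorted(indexed))


message = b"meet me at the library"
covers = ["sunset.png", "beach.png", "flamingo.png"]  # hypothetical cover images
blocks = split_into_blocks(message, 5)
plan = assign_to_covers(blocks, covers)
rebuilt = reassemble(plan)
```

No single image carries the whole message, so anyone who intercepts one cover recovers only a fragment.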

It is however one dimension beyond flat imagery that poses the most possibility for steganography’s future. Up until a few years ago, techniques mostly focused on embedding data by modifying pixel values. With the increase in accessibility of 3-D hardware for CAD, Virtual Reality, video games and 3D printing, the steganography conversation began tilting towards the usage of 3-D geometry as the host object.

A report funded by The Natural Science Foundation of China has become somewhat of a primer on 3-D steganography. It opens with a promise that “the flexible data structure of 3-D geometry may provide ample room for hosting secret information.”

All digital 3-D objects are comprised of a series of mesh faces, 2-D polygonal shapes that together approximate the shape of an IRL object. The most common polygonal shape used in 3-D objects is the triangle. It’s not uncommon for video game designers to flex the number of triangles in their compositions. Here are some rudimentary 3-D objects composed of triangle mesh. 

19 - 3d stegs.png
Image: Thobey Campion

Every 3-D object contains a group of rearrangeable vertices and triangles that can be manipulated without changing the geometry of the visual output. 

20 - trianglestegs.png
Image: Thobey Campion

The above spotlights an extension domain. This is the area in which a hidden message can be stowed away without being noticed. In (A), the height of the domain is established by extending the vertex from v(i) up to v’(i). In (B), the rotation of the domain is then established as a right angle at the base of the vertex. A hidden message can be embedded in this range. Meanwhile, there is no seeable difference in the triangle. 

21 - trianglestegs2.png
Image: Thobey Campion

In (A), v(j) – v(i) – v(k) represents a visible triangle, whereas v(j) – v’(i) – v(k) represents the exit edges that don’t register to the human eye. These are then plotted out along a mesh in (B). Each face represents one bit’s worth of space for a hidden message.

Zooming out, the standard 3-D mesh human figure in say, a video game, contains somewhere in the neighborhood of 150,000 triangles. That puts the triangle count of an entire scene north of 1 million and a count of a billion in an entire game – extraordinary depths in which to hide mountains of data.
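For a sense of scale, the triangle counts above convert to payload sizes as follows. This is back-of-the-envelope arithmetic that assumes exactly one hidden bit per triangle face, as described earlier; the function name is made up for the example.

```python
def mesh_capacity_bytes(triangle_count, bits_per_face=1):
    """Payload size in bytes if every triangle face hides bits_per_face bits."""
    return triangle_count * bits_per_face // 8


figure_bytes = mesh_capacity_bytes(150_000)       # one human figure
scene_bytes = mesh_capacity_bytes(1_000_000)      # one scene
game_bytes = mesh_capacity_bytes(1_000_000_000)   # an entire game
```

That works out to about 19 KB in a single figure, 125 KB in a scene and 125 MB across a whole game.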

While these developments mark a major milestone for embedding capacities, they are merely the beginning of a whole next timeline and range of applications for steganography. Hidden messages can now be contemplated inside 3-D printed material, recording devices inside medical implants and even our own biological systems.

I had a chance to talk with cybersecurity expert and steganography patent-holder Chuck Easttom. We talked about the future of Steganography. “The bottom line is if you really want to hide something, the biggest determining factor on whether or not you’re going to be able to hide it is the ratio between what you’re hiding and what you’re hiding it in,” Chuck said. Forecasting the future, Easttom surmised “any time there is a new venue for any type of communication I would expect someone to eventually utilize that to hide data.”

Steganography has always formed a symbiosis with the ways in which we choose to communicate. As we spend more of our time sharing via increasingly complex visual modes, the room for hiding data will only increase. For many in the Information Security industry this poses a concern. For other industries though, this innovation promises a quantum leap in range of expression.

What a time to be disguised. 

Thobey Campion is the former Publisher of Motherboard. You can subscribe to his Substack here.