How You’re Unknowingly Translating Books Online for Google

I don’t think I’m the only person who had no idea that every time I entered some words into a reCaptcha box (you know, verifying that I’m not a computer) I was also, quite directly, helping to translate a book. I don’t particularly care that it’s free labor for a tax-dodging Google, which owns reCaptcha; this is a cool project.

If you’re drawing a blank here, reCaptcha is that box of warped, squiggly words you have to type out before a website lets you through.

Basically, computers can’t handle reading distorted text, at least not as well as people. The box below is about the current state of Optical Character Recognition tech. (Please note, techno-optimists: this is where computing is in the real world.)

So, you decipher some text in a Captcha box and you’re cleared as a human. Humans solve about 200 million Captcha problems daily, at about ten seconds of brain power each. There are other website plug-ins that use Captcha technology, like this one that matches words to images, but reCaptcha seems to be owning the market.
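To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It takes the two figures quoted above at face value (both are rough estimates) and works out how many hours of human attention that adds up to per day. The function and constant names are just mine, for illustration.

```python
# Figures quoted in the article; both are rough estimates.
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10


def daily_human_hours(captchas_per_day: int, seconds_each: int) -> float:
    """Return the total hours of human effort spent on Captchas per day."""
    return captchas_per_day * seconds_each / 3600


hours = daily_human_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # prints "555,556 hours per day"
```

That is more than half a million hours of squinting, every single day.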

Here’s how the translation part works, and by translation I mean turning scanned print into digital text. You’re given two words, right? One of those words serves as a sort of “control.” It’s a word that reCaptcha already knows the answer to. The other comes from Google’s digitization program, either something from an ancient New York Times or a book headed for Google Books. Computers scan those things, but every now and again get stumped on a word they can’t decipher. That word, as it appears in its source, gets fed to reCaptcha boxes on websites, making use of all that spent brain power.

So, the computer knows the answer to the one “control” word, and if you get that one correct, it assumes you have the one it doesn’t know correct. Which is pretty flimsy, sure, but each mystery word goes through a number of different human brains — with different answers accumulating points per identification — until it reaches a level of acceptable certainty. And it is thus translated.
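For the curious, the control-word-plus-voting idea described above can be sketched in a few lines of Python. To be clear, this is a toy model and not Google’s actual code: the class and function names, the one-point-per-answer scoring, and the acceptance threshold of 2.5 points are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical threshold: points one answer needs before it is accepted.
ACCEPT_THRESHOLD = 2.5


class UnknownWord:
    """A scanned word the computer could not read, plus the human guesses so far."""

    def __init__(self) -> None:
        self.points = defaultdict(float)  # guess -> accumulated points
        self.accepted = None              # set once a guess clears the threshold

    def record(self, guess: str, weight: float = 1.0) -> None:
        """Add points for one human guess; accept it if it reaches the threshold."""
        if self.accepted is not None:
            return
        key = guess.strip().lower()
        self.points[key] += weight
        if self.points[key] >= ACCEPT_THRESHOLD:
            self.accepted = key


def submit(control_answer: str, control_guess: str,
           unknown_guess: str, unknown: UnknownWord) -> bool:
    """Return True if the user passes as human.

    The guess for the unknown word only counts if the control word is right.
    """
    if control_guess.strip().lower() != control_answer.strip().lower():
        return False
    unknown.record(unknown_guess)
    return True


word = UnknownWord()
submit("morning", "morning", "upon", word)   # passes, one point for "upon"
submit("morning", "mourning", "open", word)  # fails the control, guess ignored
submit("morning", "Morning", "upon", word)   # passes, two points for "upon"
submit("morning", "morning", "upon", word)   # passes, three points: accepted
print(word.accepted)  # prints "upon"
```

Notice that the person who flubbed the control word never gets a say on the mystery word, which is what keeps one sloppy typist (or a bot) from polluting the results.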

Reach this writer at michaelb@motherboard.tv.