In a recent YouTube video, Joe Rogan says he wants to form a chimp hockey team, claims that he’s trapped in a robot, and muses over life being a simulation. The voice isn’t actually the host of The Joe Rogan Experience, though. It’s really an AI-generated voice that sounds uncannily like the comedian and podcaster, Jersey accent and all.
According to the video description, machine learning engineers at AI startup Dessa created a computer-generated dupe of Rogan’s voice using audio of him speaking. Machine learning tools learn patterns, like the particulars of a person’s voice, from large amounts of data. That makes Rogan a natural subject: he releases several episodes of his podcast every week, each of which may run two hours or more.
In a blog post about the work, the company writes that the engineers—Hashiam Kadhim, Joe Palermo, and Rayhane Mama—generated the speech with text inputs, and that this approach “would be capable of producing a replica of anyone’s voice, provided that sufficient data is available.”
A spokesperson for Dessa told me in an email that it’s trained on audio data from Rogan, and “what you are hearing is the result of a text only input.”
This is similar to the algorithm that made Jordan Peterson rap like Eminem, which required at least six hours of audio to train on. Programs like Lyrebird and Modulate.ai also need some audio to train the algorithm, even if just a few minutes, to replicate a real voice accurately.
Even so, the quality of Dessa’s Rogan dupe stands out from the crowd. In the announcement blog post, the Dessa researchers say they won’t release details about how the algorithm works, including any models, source code, or datasets, at this time. But they promise to post a “technical overview” within the coming days.
Since the advent of deepfakes—algorithmically-generated face swap videos—we’ve seen more startups trying to make their own fake personas while simultaneously emphasizing the societal implications that these fakes could have. Instead of releasing the details of the project, Dessa outlined some of the consequences of ultra-realistic audio, including scammers and spam calls, harassment, and impersonating a politician. On the plus side, accurate fakes could improve accessibility tech and language translation, they wrote.
In case you think you’re not easily duped by fakes, the researchers set up a quiz to test how well you can discern the real Rogan from the fake Rogan. It’s shocking how closely they match; the only differences are slight inflections in the voices. I guessed correctly on seven out of eight of the examples, and the one I flubbed will haunt me.