Beauty pageants have always been political. After all, what speaks more strongly to how we see each other than which physical traits we reward as beautiful, and which we code as ugly? It wasn’t until 1983 that the Miss America competition crowned a black woman as the most beautiful woman in the country.
So what if we replaced human judges with machines? A robot would ideally lack a human’s often harmful social biases. As shallow as the whole thing is, would a computer at least be able to see past skin colour and look at, potentially, more universal markers of attractiveness? Or hell, even appreciate a little melanin? Not really, as it turns out.
Beauty.ai, an initiative by the Russia- and Hong Kong-based Youth Laboratories and supported by Microsoft and Nvidia, ran a beauty contest with 600,000 entrants, who sent in selfies from around the world—India, China, all over Africa, and the US. They let a set of three algorithms judge them based on their face’s symmetry, their wrinkles, and how young or old they looked for their age. The algorithms did not evaluate skin color.
The results, released in August, were shocking: Of the 44 people the algorithms judged to be the most “attractive,” nearly all were white, six were Asian, and only one had visibly dark skin.
How the hell did this happen?
The first thing to know is that all three algorithms used a style of machine learning called “deep learning.” In deep learning, an algorithm is “trained” on a set of pre-labeled images so that when presented with a new image, it can predict with a degree of certainty what it’s looking at. In the case of Beauty.ai, all the algorithms were trained on open source machine learning databases that are shared between researchers.
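To make that concrete, here’s a minimal sketch of what “training” means in practice, written in PyTorch with randomly generated stand-in data, since Beauty.ai’s actual code isn’t public. The details are illustrative; the point is that the network learns only from the labeled examples it is fed:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a labeled training set: 64x64 RGB "face"
# images, each tagged with a binary label. A real system would load a
# human-annotated database here -- which is exactly where bias creeps in.
images = torch.rand(256, 3, 64, 64)
labels = torch.randint(0, 2, (256,))

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),  # two outputs: "high" vs. "low" score
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)  # how wrong are the predictions?
    loss.backward()   # nudge the weights to better fit the labels
    optimizer.step()  # the model can only mirror the data it was given
```

Nothing in that loop encodes a notion of fairness: if the labeled database skews toward one group, the fitted model skews with it.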
Deep learning is among the most powerful machine learning techniques we have, and it’s used by massive companies like Alphabet and Facebook. But recent work has shown that these systems can harbor all kinds of unexpected, and very human, biases. A language-processing algorithm, for example, was recently found to rate white-sounding names as more “pleasant” than black-sounding names, mirroring earlier psychology experiments on humans.
The root of the problem is the databases used to train AI: they are built by humans, and they reflect the limited diversity, of both people and opinions, of whoever assembled them.
“We had this problem with our database for wrinkle estimation, for example,” said Konstantin Kiselev, chief technology officer of Youth Laboratories, in an interview. “Our database had a lot more white people than, say, Indian people. Because of that, it’s possible that our algorithm was biased.”
“It happens to be that color does matter in machine vision,” Alex Zhavoronkov, chief science officer of Beauty.ai, wrote me in an email, “and for some population groups the data sets are lacking an adequate number of samples to be able to train the deep neural networks.”
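A first line of defense against the gap Zhavoronkov describes is simply to audit a database before training on it. Here’s a hypothetical sketch, assuming each image carries self-reported demographic metadata in a file called annotations.csv (an invented name, not part of any real Beauty.ai pipeline):

```python
import csv
from collections import Counter

# Hypothetical metadata file: one row per training image, with a
# demographic group recorded by the annotators.
with open("annotations.csv", newline="") as f:
    groups = [row["group"] for row in csv.DictReader(f)]

counts = Counter(groups)
total = sum(counts.values())

# Flag any group below an arbitrary 5% share -- a crude heuristic,
# but enough to surface the kind of skew Kiselev describes.
for group, n in counts.most_common():
    share = n / total
    flag = "  <-- underrepresented" if share < 0.05 else ""
    print(f"{group:<15} {n:>7} ({share:.1%}){flag}")
```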
The other problem for the Beauty.ai contest in particular, Kiselev said, is that the large majority (75 percent) of contest entrants were European and white. Seven percent were from India, and one percent were from the African continent. That’s 40,000 Indians and 9,000 people from Africa that the algorithms decided didn’t match up with the idea of beauty that they’d been trained to recognize.
“It’s possible that only a small amount of people knew about our contest in these places,” Kiselev said. “PR was the issue, and we want to do more outreach in other countries.”
Beauty.ai will run another beauty contest in October, which gives the team a second shot at making good on its promise to draw entrants from countries outside of Europe.
Read More: It’s Our Fault That AI Thinks White Names Are More ‘Pleasant’ Than Black Names
The question of how to erase bias from databases is much thornier, however, and it brings to mind an earlier technology. Camera film, for example, was originally designed to perform best with white skin in frame, meaning that until the industry decided to correct the underlying problem, every camera exhibited a racist bias, even in the hands of ostensibly non-racist photographers.
Indeed, Zhavoronkov told me that the Beauty.ai algorithms sometimes discarded selfies of dark-skinned people if the lighting was too dim.
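Zhavoronkov didn’t spell out the mechanism, but one plausible culprit, offered here as a guess rather than a description of Beauty.ai’s actual pipeline, is a quality filter that throws out any photo whose average brightness falls below a fixed threshold:

```python
from PIL import Image, ImageStat

# A naive pre-filter: reject "badly lit" selfies by mean luminance.
# The cutoff value is invented for illustration.
BRIGHTNESS_CUTOFF = 80  # mean grayscale value, on a 0-255 scale

def passes_lighting_check(path: str) -> bool:
    gray = Image.open(path).convert("L")  # collapse to grayscale
    mean_brightness = ImageStat.Stat(gray).mean[0]
    return mean_brightness >= BRIGHTNESS_CUTOFF
```

A global cutoff like this conflates exposure with skin tone: a reasonably lit photo of a dark-skinned face can fall below a threshold tuned on light-skinned ones. Measuring contrast on the detected face region, rather than the mean brightness of the whole frame, is one way around that.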
Deep learning resembles film in another way: researchers share training databases and off-the-shelf frameworks, often without modifying them, so a bias baked into one database gets reproduced in algorithms across the board, even when the scientists themselves have the best of intentions.
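This is easy to see in practice. The snippet below, purely illustrative, pulls down a standard torchvision network pre-trained on ImageNet; whatever sampling skew that shared database carries ships along with the weights:

```python
import torchvision.models as models

# Off-the-shelf network with weights learned from a shared database
# (ImageNet). Any demographic skew in that data is inherited by every
# project that starts from these weights without retraining.
model = models.resnet18(weights="IMAGENET1K_V1")
model.eval()
```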
The only way to fix this is to change part of the system itself: in this case, the databases the networks are trained on.
“What the industry needs is a large centralized repository of high-quality annotated faces and annotated images of the various ethnic groups publicly available to the public and for startups to be able to minimize racial bias,” said Zhavoronkov.
“Because when a small group of programmers is developing an app utilizing machine vision,” he continued, “they usually don’t care about racial bias, they just want to get to market quickly.”