In 2013, researchers at the University of North Carolina Wilmington (UNCW) published a facial recognition dataset consisting of more than 1 million images of trans people who had uploaded videos of their medical transition to YouTube. The researchers used the videos without the explicit permission of their owners, and with the stated goal of training facial recognition systems to recognize people before and after they start Hormone Replacement Therapy (HRT).
More recently, an audit of the dataset uncovered that the researchers also left the full videos in an unprotected Dropbox folder until 2021.
The audit of the dataset, published in Big Data & Society by authors Os Keyes and Jeanie Austin, dives deep into the origins and problems of the project, which took still images from YouTube videos uploaded by 38 different people. Researchers led by UNCW’s Karl Ricanek pulled more than one million frames from the videos of trans people in the middle of transition, creating what they called the “HRT Transgender Dataset.”
Controversy erupted over the project shortly after it was brought to public attention. Ricanek justified the experiment with an extreme and virtually impossible scenario: terrorists using HRT to avoid facial recognition and sneak across borders undetected. Ricanek told Motherboard that he had challenged a classroom of students to help him break facial recognition. A student then showed him a YouTube video of a trans person who’d taken a snapshot every day over two years of their transition and then uploaded it to YouTube as a slideshow. “No face recognition system would ever make that match,” Ricanek said.
So Ricanek and a team of researchers found dozens of videos on YouTube and used the images to build the database. According to Ricanek, UNCW’s Institutional Review Board (IRB) told him he didn’t need the board’s approval to build the dataset. This is backed up by emails between Keyes and UNCW’s IRB, which state that the study was not subject to oversight from the IRB. “We did not require Dr. Ricanek to submit an application for approval to the UNCW IRB for this study, because it involved the analysis of images taken from a public social media website,” the IRB told Keyes.
UNCW said much the same to Motherboard when reached for comment. “The university is hesitant to speculate about what it would have done in 2012 or would do today,” it told Motherboard. “In 2012, Dr. Ricanek did not formally seek approval (by submitting an IRB application) to build this dataset. However, IRB approval is not always required to conduct research on publicly available data, as publicly available data would not be viewed as private information per federal regulations. Dr. Ricanek may have sought informal guidance (in person or by phone) on whether or not IRB approval was needed. The IRB regularly receives inquiries of this nature.”
Ricanek also claims he reached out to every person he could find contact information for, and that none of the people in the videos said no. “We got some responses back,” he told Motherboard. “None that said, ‘No, you can’t do that.’” According to Ricanek, no one initially objected. At least two people reached out after the dataset was published, however, and one asked to be removed. Another, when reached by Motherboard, said they’d never heard from Ricanek or UNCW and was surprised to learn they were a part of a dataset.
“I’m literally shocked right now that my videos/photos were used without my permission in a dataset. Thank you for letting me know,” said Chase Ross, who was one of eight individuals included in the dataset that Motherboard attempted to contact for this story.
According to Keyes’ audit, which was based on documents obtained through a public records request sent to UNCW, there was no indication that the researchers had contacted anyone about the use of their images in a facial recognition dataset. Motherboard has looked over these records and found the same, though it is possible that an incomplete set of records was initially produced by the university.
“We saw absolutely no indication; the only emails relating to participants were a complaint from a participant that she had not been notified, and an email from the postdoc on the project confirming to a third party that participants had not given their consent,” Keyes told Motherboard.
One email in the public records request, sent in 2017, four years after the dataset’s creation, shows one of the dataset’s subjects reaching out to Ricanek. “Would really love to understand why you’re featuring my images in your reports without my consent?” the email said, and it included a link to a PDF of the study. Motherboard attempted to reach this person, but did not hear back.
“Please be more specific regarding your likeness (image),” Ricanek said in response. “Most of the images in this article were derived from articles and videos posted on the web. I attempted to get permission for all images used from the source that was listed. I followed the copyright laws associated with each source.”
“I am the first example in your HRT section, never gave consent, and was never contacted,” the person said in response.
Ricanek told the person that his team had attempted to get everyone’s permission when the dataset was created. “I am unsure as to what happened with your video and the consent protocol…regardless of the protocol, I will NOT make the dataset links available from this point forward. Personally, as an academic researcher I look to improve the world around me and as such, I can only hope that this research has not caused you any discomfort. Again, the dataset will no longer be available.”
The person said they would prefer to have their image removed from the dataset and from articles about the dataset going forward. “I will ask but the article was published in 2013 and it may not be possible to remove the image from the bootlegged copies,” Ricanek replied.
The initial link the concerned person sent is still online, complete with their picture. In subsequent publications of the dataset, Ricanek has put a small yellow box over the eyes of the individuals.
According to Ricanek, the lack of those records is due to the university changing computer systems. “I don’t have an email or desktop that’s from the 80s or whatever,” he said. “Every time I get a new machine, I have to hit the email server and download those emails. When we switched over, a lot of that stuff was lost.” Ricanek also said that local laws don’t require the university to hold on to records for more than a few years, which would mean most of the communication from around the time the database was being built could be lost.
UNCW did not comment on its data retention policy when asked.
Ricanek has maintained that he was motivated by national security. “I can get $5 of estrogen pills on the black market, and I can defeat your billion dollar system?” he told Motherboard, referring to facial recognition systems on the border. “That could have been the headline back in 2011 when I published the article, and that was the perspective I was coming from…we have to protect against this.”
No one has ever taken hormones to defeat facial recognition software or sneak across a border, and there is no evidence that attempting to do so would enable someone to circumvent the countless other security measures that exist at airports and border checkpoints. The idea that someone could buy a few estrogen pills and use them to quickly and completely change their appearance is also completely out of line with the reality that trans people experience while undergoing medical transition.
Keyes called Ricanek’s fears of hormone-taking terrorists “ludicrous,” saying that this scenario reflects stereotypes more than actual security threats. “These projected motivations mirror more general transphobic tropes—that transgender people are suspect, sneaky, and otherwise engaged in acts of trespass (between genders or borders) and subterfuge.”
Facial recognition is a controversial and often dangerous technology. The systems are racially biased, often misgender trans people, and control U.S. borders. While automated systems are often described as objective and precise by those who implement them, researchers have repeatedly shown they carry the biases of the people who design and build them.
“This research, as well as the Clearview software which makes it possible for strangers to easily discover your name and address through a photo taken as you walk down the street, puts trans people in danger—especially those who may not be out of the closet,” Jean Linis-Dinco, a data science researcher, wrote about Ricanek’s study for Melbourne Law School. “New technologies have exacerbated the securitization, surveillance and violence of trans bodies worldwide.”
Ricanek said he was intimately familiar with these issues. “I’m Black. I understand technology being weaponized against minority groups so I totally get where they’re coming from,” he said, in an interview with Motherboard. According to Ricanek, he took phone calls with some of the people whose videos were used to make the dataset. They expressed concern about images from their pre-transition life being made public.
“She said, ‘This is my biggest nightmare,’” Ricanek said one individual told him. “Oh my god. I did not think about that at the time. I didn’t think that this was going to get out in the public sphere the way that it did.”
Five days after our interview, Ricanek gave a presentation at the 2022 Federal Identity Forum and Expo, a workshop where the federal government meets with companies, researchers, and others in the security industry to talk about facial recognition and related topics. During his presentation, Ricanek used slides that pulled images from his 2011 dataset. Unlike in the initial publication, he had put tiny yellow blocks over the eyes of the individuals.
After speaking with Keyes for the audit, Ricanek said he took down the dataset and began to deny access to anyone who asked. According to Ricanek, Keyes reached out and explained that the dataset was still up. “They actually hacked my Dropbox,” Ricanek said, claiming that Keyes accessed his lab’s private cloud storage. “And they found a version of the dataset in the Dropbox for the lab, which I did not know existed at that time.”
Keyes did not hack the Dropbox. The Dropbox link and the password to it were among the public records they and Austin received from UNCW in response to their request. “If I could hack Dropbox, I would be making a lot more money than PhD students do,” Keyes said.
According to the UNCW emails, which Ricanek was copied on, his research assistants gave other researchers access to the raw materials of the database—including for one project that involved distributing the images to be labeled by users on Amazon Mechanical Turk. Ricanek had previously maintained that all that was ever given out were the images themselves and a Google Doc that listed the YouTube links. But in the emails, the Dropbox file is described as being “more than 1gb.”
“I didn’t initiate this exchange,” Ricanek said of the emails where an assistant gave someone access to the Dropbox. “But… there were only a handful of people who made that request. And typically, these were individuals that weren’t actually doing face recognition work. They were looking at it for other sorts of social issues.”
“The whole rest of the world is using face recognition. We are not going to stop that. It’s part of our national defense,” Ricanek said. “It’s the same problem with face recognition bias towards people of color. We have to be at the table working with them to make sure their algorithms work properly. We can’t say, ‘stop, don’t use it, it’s racist.’”
It is deeply unlikely that facial recognition trained to detect trans people is ever going to be used to stop a terrorist attack. What is far more likely is that datasets and technology like this could be used to harass and dehumanize trans people in a country where the majority of states have passed or considered passing anti-trans legislation. In states across the US, politicians have increasingly tried to criminalize and discriminate against trans people, pushing legislation and policies that prevent trans people from legally using the bathroom, prevent trans children from obtaining gender-affirming treatments like puberty blockers, criminalize people who might help them, ban books that mention the existence of trans people, forbid trans athletes from competing in sports, and make it more difficult for trans people to get healthcare.
Facial recognition, by definition, is a form of surveillance, and this experiment—effective or not—is designed to make it easier to identify trans people, who, besides being systematically discriminated against by governments, are often the victims of hate crimes and violence associated with their identity.
Keyes pushed back at Ricanek’s assertion, saying that even if the research was not done with malicious intent, its central premise still causes harm by subjecting trans people, whose bodies are already viewed as suspect, to even more scrutiny. “If his interest is in trans people getting a seat at the table, how many trans scholars did he involve in his research?” they said. “In preparing the paper; in being credited with the work? If his interest is in ‘a seat at the table,’ how does taking people’s private, in-community videos, using them for purposes the authors might very well disagree with, fixing screenshots of them forevermore in academia without their consent, help that? If he’s so interested in a seat at the table for marginalized people, he should endeavor to actually do anything to enable that.”