Using a massive 56.7-terabyte archive of the far-right social media site Parler that was captured on Sunday, open-source analysts, hobby archivists, and computer scientists are working together to catalog videos and photos that were taken at the attack on the U.S. Capitol last Wednesday.
Over the last few days, Parler was de-platformed by Amazon Web Services, the Google Play Store, and the Apple App Store, moves that have taken it offline (at least temporarily). But before it disappeared, a small group of archivists made a copy of the overwhelming majority of posts on the site.
While all the data scraped from Parler was publicly available, archiving it allows analysts to extract the EXIF metadata from photos and videos uploaded to the social media site en masse and to examine specific ones that were taken at the insurrection on Capitol Hill. This data includes specific GPS coordinates as well as the date and time the photos were taken. These are now being analyzed in IRC chat channels by a handful of people, some of whom believe crimes can be catalogued and given to the FBI.
“I hope that it can be used to hold people accountable and to prevent more death,” donk_enby, the hacker who led the archiving project, told Motherboard on Monday.
One technologist took the scraped Parler data, pulled out every file that included GPS coordinates, formatted that information into JSON, and plotted the points on a map. The technologist then shared screenshots of the map with Motherboard, showing Parler posts originating from various countries, then from across the United States, and finally from in or around the Capitol itself. In other words, they were able to show that Parler users were posting material from the Capitol on the day of the rioting, and they can now go back into the rest of the Parler data to retrieve specific material from that time.
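The technologist did not share their code, so the following is only a minimal sketch of what the "coordinates to JSON" step could look like. It assumes the GPS values have already been read out of each file as EXIF-style degrees, minutes, and seconds plus a hemisphere letter. The record layout, the field names (`id`, `gps`, `lat_ref`, `lon_ref`, `taken_at`), and the sample values are all hypothetical and are not taken from the Parler archive.

```python
import json


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if ref in ("S", "W") else value


def to_geo_records(raw_records):
    """Keep only records that carry GPS fields and return JSON-ready dicts."""
    out = []
    for rec in raw_records:
        gps = rec.get("gps")
        if not gps:
            continue  # no location metadata in this file, skip it
        out.append({
            "id": rec["id"],
            "lat": round(dms_to_decimal(*gps["lat"], gps["lat_ref"]), 6),
            "lon": round(dms_to_decimal(*gps["lon"], gps["lon_ref"]), 6),
            "taken_at": rec.get("taken_at"),
        })
    return out


# Made-up sample input: one record with GPS metadata, one without.
sample = [
    {
        "id": "example-1",
        "gps": {"lat": (38, 53, 23.0), "lat_ref": "N",
                "lon": (77, 0, 32.0), "lon_ref": "W"},
        "taken_at": "2021-01-06T14:30:00",
    },
    {"id": "example-2"},
]

geo_records = to_geo_records(sample)
print(json.dumps(geo_records, indent=2))
```

A flat list of records like this can be loaded directly by most mapping tools, which is presumably why JSON was a convenient intermediate format.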
They also shared the newly formatted geolocation data with Motherboard. Motherboard granted the technologist anonymity to speak more candidly about a potentially sensitive topic.
The technologist said that because this data shows, at least to some extent, the use of Parler during the Capitol raid attempt, “that’s a piece of the overall puzzle which someone, somewhere can use.”
“It’s definitely to help facilitate or otherwise create another exposure that the public can consume,” they added, explaining their motivations for cleaning the Parler data.
This particular technologist did not distribute their version of the data more widely, however, with the aim of preventing abuse and misuse of the data.
“Sure, the source data are already public. But that doesn’t mean I have to add an even easier path to data misuse,” they said.
“For this Parler data, it would clearly not be correct to say ‘every single user is a Nazi’ and so by complete disclosure you are enabling someone who WOULD hold such a narrative to make bad choices and take bad actions if they wished,” they added.
Earlier on Tuesday, an analysis of the metadata by Gizmodo also showed that Parler users made it into the Capitol.
Others who have managed to get their hands on the Parler data have begun to make lists of videos and photos that have GPS coordinates on Capitol Hill, and have written scripts to pull those videos from the broader dump so people can analyze them. On an IRC chat channel, a small group of people are watching and analyzing videos and posting the video IDs and descriptions into a Google spreadsheet called “Notable Parler Videos.” One description reads: “at the capital, pushing police, guy in MAGA hat screaming ‘I need some violence now.’” A description for the IRC channel includes a link to an FBI tip line specifically targeted at identifying people at the riot.
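The scripts themselves are not described in detail. One simple way to build such a list, sketched here under stated assumptions, is a bounding-box test on decimal coordinates. The box below is a rough, illustrative rectangle around Capitol Hill rather than a surveyed boundary, and the record fields and sample values are hypothetical.

```python
# Rough illustrative box as (south, west, north, east) in decimal degrees.
# These limits are an assumption made for this sketch, not an official boundary.
CAPITOL_BOX = (38.886, -77.014, 38.893, -77.004)


def in_box(lat, lon, box):
    """Return True if the point lies inside the (south, west, north, east) box."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def select_ids(records, box):
    """Return the IDs of records whose coordinates fall inside the box."""
    return [r["id"] for r in records if in_box(r["lat"], r["lon"], box)]


# Made-up records: one inside the box, one far outside it.
records = [
    {"id": "example-a", "lat": 38.8897, "lon": -77.0089},
    {"id": "example-b", "lat": 40.7128, "lon": -74.0060},
]

matching_ids = select_ids(records, CAPITOL_BOX)
print(matching_ids)
```

A rectangle is a blunt filter: it will catch files recorded on nearby streets as well, so anything it selects still needs a person to review it.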
One open source project calling itself Parler Analysis has collected different tools from around the web to handle the data in different ways. One is used to scrape usernames, for example, while another extracts images and videos, and yet another is an alternative cleaned dataset of Parler geolocation coordinates in a different format.