Yodlee, the largest financial data broker in the U.S., sells data pulled from the bank and credit card transactions of tens of millions of Americans to investment and research firms, detailing where and when people shopped and how much they spent. The company claims that the data is anonymous, but a confidential Yodlee document obtained by Motherboard indicates individual users could be unmasked.
The findings come as two senators and a representative have urged the Federal Trade Commission (FTC) to investigate Envestnet, which owns Yodlee, for selling Americans’ transaction information without their knowledge or consent, potentially violating the law.
“Let me be blunt. This is bullshit ‘anonymization,’” Nicholas Weaver, a senior researcher at the International Computer Science Institute at UC Berkeley, told Motherboard in an email after reviewing a section of the document.
The Yodlee document describes in detail what type of data its clients gain access to, how the company manages that data across its infrastructure, and the specific measures Yodlee takes to try to anonymize its dataset. The transaction data itself comes from banks, credit card companies, and apps that Yodlee works with, including Bank of America, Citigroup, and HSBC, according to previous reporting from The Wall Street Journal.
According to the 2019 document Motherboard obtained, the data includes a unique identifier given to the bank or credit card holder who made the purchase; the amount spent in the transaction; the date of the sale; the city, state, and zip code of the business the person bought from; and other pieces of metadata. In one example given in the document, someone buys a meal from Applebee’s using delivery service Grubhub; Applebee’s would be marked in the data as the “primary merchant,” and Grubhub as the “secondary merchant” for that transaction.
Once logged into Yodlee’s server, clients download the data as a large text file, rather than interacting with the data in a dashboard or interface that stays solely within Yodlee’s control, according to the document.
“Bank and card files are generated on a daily basis and contain incremental transactions,” the document adds.
One purpose of this data is for investment firms and hedge funds to get insight into where people are spending their money, so they can spot trends and make better decisions. These Yodlee clients include 1010 Data, Second Measure, and Point72, the investment firm run by billionaire Steven Cohen, whose previous venture closed after a large-scale investigation into the company for insider trading.
A separate confidential J.P. Morgan document obtained by Motherboard lists various Yodlee products, including its “Data Platform.” This “provides the best and most comprehensive financial data at massive scale across retail banking, credit, and wealth management. This is made possible through the strengths of our data acquisition capabilities, extensive data cleaning and enrichment expertise, and massive scale,” the document adds.
Yodlee’s “data cleaning” involves removing names, email addresses, and other personally identifiable information (PII) from the transaction data, according to the Yodlee document. That includes masking patterns of numbers like account numbers, phone numbers, and SSNs and replacing them with “XXX” symbols, the document explains. The system also looks for keywords such as “TRANSFER” and “PAYROLL” and removes account information from those pieces of transaction data. Yodlee also masks the financial institution’s name in the transaction description, such as the credit card name, according to the document.
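The kind of pattern masking described above can be illustrated with a minimal sketch. The regular expressions, the function name `mask_description`, and the sample descriptions are all assumptions made for illustration; the document as reported does not give Yodlee's actual rules. The sketch shows why this style of cleaning leaves the merchant and location intact.

```python
import re

# Hypothetical patterns: stand-ins for the "patterns of numbers" the
# document mentions, not Yodlee's real rules.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
LONG_DIGITS = re.compile(r"\b\d{6,}\b")  # stand-in for account numbers
KEYWORDS = ("TRANSFER", "PAYROLL")       # keywords named in the article


def mask_description(description: str) -> str:
    """Replace number patterns in a transaction description with "XXX"."""
    masked = SSN.sub("XXX", description)
    masked = PHONE.sub("XXX", masked)
    masked = LONG_DIGITS.sub("XXX", masked)
    # Assumption: for keyword transactions, treat any remaining digit run
    # as account information and mask it too.
    if any(keyword in masked.upper() for keyword in KEYWORDS):
        masked = re.sub(r"\d+", "XXX", masked)
    return masked


if __name__ == "__main__":
    # Invented sample descriptions, not real data.
    print(mask_description("GRUBHUB*APPLEBEES 718-555-0100 NEW YORK NY"))
    print(mask_description("ACH TRANSFER FROM CHK 4521"))
```

In this sketch the phone number and account fragment are replaced, but the merchant names, city, and state pass through unchanged, which is the residue the experts quoted below focus on.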
But that cleaning process still leaves a wealth of data available to clients, and could allow people included in the Yodlee dataset to be unmasked, according to several experts who reviewed the relevant section of the Yodlee document.
“It’s hard to say without the full context but, from the description, the data in itself seems to me to only be pseudonymized,” Yves-Alexandre de Montjoye, an assistant professor at Imperial College London, told Motherboard in an email. “This means that it doesn’t contain information that’d directly identify a person such as names or email addresses. However, someone with access to the dataset and some information about you, e.g. shops you’ve been buying from and when, might be able to identify you.”
“The anonymization process described in the document is relatively simple. It does appear to remove the PII data as legally defined but does not remove spatio-temporal traces of people that can be used to connect back the data to them,” Vivek Singh, assistant professor at Rutgers University, added in an email.
Singh and de Montjoye previously worked on a 2015 study where researchers successfully identified real people in an ostensibly anonymized dataset of three months’ worth of credit card transactions covering 1.1 million people.
Those “spatio-temporal traces” are the various pieces of metadata that the document shows are included with the transaction—the date, the merchant, the physical location of the sale, and more.
“If an attacker can get hold of the spatio-temporal coordinates for just three to four randomly picked transactions in the dataset, then the attacker can unmask the person with a very high probability. With this unmasking, the attacker would have access to all the other transactions made by that individual,” Singh said, referring to how that earlier study’s approach would apply to the Yodlee data.
Someone buying this data might not even need that many if they can already identify one transaction belonging to a specific person. Weaver from UC Berkeley added that an attacker may only require “one link to deanonymize a group of transactions belonging to a user.” Because Yodlee preserves unique identifiers for each consumer across transactions, someone with the dataset who knows a single purchase by a target (that they ordered from a specific merchant on a particular day, for instance) could then potentially see that person’s other transactions as well.
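The linkage the experts describe can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method from the 2015 study or anything taken from the Yodlee document: the `Txn` record layout, the function names, and every row of data are invented. It shows how a persistent identifier lets a handful of known purchases narrow the dataset to one person and then expose the rest of that person's history.

```python
from collections import defaultdict
from typing import NamedTuple


class Txn(NamedTuple):
    """Hypothetical record layout, loosely based on the fields the article lists."""
    user_id: str   # persistent pseudonymous identifier
    date: str
    merchant: str
    zip_code: str
    amount: float


def candidates(txns, known_points):
    """Return the user IDs whose transactions include every known
    (date, merchant, zip_code) point."""
    points_by_user = defaultdict(set)
    for t in txns:
        points_by_user[t.user_id].add((t.date, t.merchant, t.zip_code))
    needed = set(known_points)
    return {uid for uid, points in points_by_user.items() if needed <= points}


def history(txns, user_id):
    """Everything the dataset holds for one identifier."""
    return [t for t in txns if t.user_id == user_id]


# Invented toy data.
TXNS = [
    Txn("u1", "2019-03-01", "Applebee's", "10001", 23.50),
    Txn("u1", "2019-03-02", "Pharmacy", "10001", 12.00),
    Txn("u1", "2019-03-05", "Bookstore", "10003", 30.00),
    Txn("u2", "2019-03-01", "Applebee's", "10001", 41.10),
    Txn("u2", "2019-03-03", "Gas Station", "10002", 35.00),
]

if __name__ == "__main__":
    one_point = [("2019-03-01", "Applebee's", "10001")]
    two_points = one_point + [("2019-03-02", "Pharmacy", "10001")]
    print(candidates(TXNS, one_point))   # two identifiers still match
    print(candidates(TXNS, two_points))  # narrowed to a single identifier
    print(history(TXNS, "u1"))           # that identifier's full history
```

In the toy data, one known purchase leaves two candidates and a second narrows it to one. In a real dataset the number of points needed would depend on how distinctive each purchase is.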
In their letter to the FTC, Senator Ron Wyden, Senator Sherrod Brown, and Representative Anna Eshoo wrote that “The consumer data that Envestnet collects and sells is highly sensitive. Consumers’ credit and debit card transactions can reveal information about their health, sexuality, religion, political views, and many other personal details.”
Yodlee previously hired Peter Swire, a professor of law and ethics at Georgia Tech, and in 2015 he told the Wall Street Journal that Yodlee is “doing the technical and administrative things that regulators have recommended” to keep data anonymous. In another comment posted by Yodlee on the company’s own website, Swire said, “My view is that the study’s results do not apply to the Yodlee facts,” referring to de Montjoye and Singh’s study into de-anonymizing credit card data. When Motherboard told Swire about the Yodlee document that specifically lays out the anonymization process, and asked him, among other things, whether he still believed Yodlee’s privacy protections were satisfactory to avoid de-anonymization, Swire wrote in an email, “I have no comment.”
Beyond the credit card data study and experts’ review of the detailed Yodlee process, researchers have repeatedly unmasked real people in a variety of other ostensibly anonymized datasets. New York Times reporters have identified people from AOL search data and location data obtained from smartphone apps. A 2017 study from Stanford University found it was possible to identify people from web browsing data, and multiple experts said it would likely be possible to unmask users of antivirus program Avast, whose anonymized browsing data was being sold, sometimes without their knowledge.
In a statement, Envestnet | Yodlee said “Protecting the personal information of those who use our services is a top priority for Envestnet | Yodlee. We adhere to leading industry practices for data security, regulatory compliance, and privacy, and employ systems that monitor and remove all known identifiers from data elements that are collected. We also follow and adhere to applicable laws and industry guidance regarding the use of consumer data, including the newly enacted California Consumer Protection Act (CCPA), which further outlines processes and procedures for handling de-identified data. We are also subject to examination by the U.S. Federal Banking Agencies, per the Bank Service Company Act, for the services provided to U.S. financial institutions.”
“For our data analytics offerings, Envestnet | Yodlee imposes technical, administrative, and contractual measures to protect consumers’ identities, such as prohibiting analytics and insights users from attempting to re-identify any consumer from the data. Leading privacy experts have also validated that Envestnet | Yodlee data analytics meet or exceed leading industry standards of de-identification processing,” the statement added.
However, in a 2015 filing with the Securities and Exchange Commission (SEC), the company admitted “Yodlee does not audit its customers to ensure that they have acted, and continue to act, consistently with such assurances.”
Most Yodlee customers that Motherboard contacted to ask if they have ever tried to de-anonymize people in the dataset or if they have ever sold or provided the data from Yodlee to another party did not respond to requests for comment. Earnest Research, which has access to Yodlee data, declined to answer either question.
A person familiar with one investment firm said that their company would not be interested in the spending habits of individual people, and is instead only concerned with macro-level trends. Motherboard granted them anonymity to speak more openly about the industry.
An FTC spokesperson declined to comment. It is standard practice for the FTC to not comment on investigations as they are not public.
Senator Wyden said in a statement, “Data brokers love to claim that making user data anonymous will protect American’s privacy. Here’s a shocker—it won’t. Data can’t always be anonymized. Computer scientists have repeatedly demonstrated that poorly ‘anonymized’ data can be reidentified. Companies need to go much further to demonstrate that they have truly addressed the privacy risks associated with the data they collect and sell.”
“I take these reports very seriously,” Senator Brown told Motherboard in a statement. “When consumers trust companies like Yodlee with their personal and private information, they expect these companies to keep their information private. Congress needs to take the burden off of consumers and establish clear rules about corporations using big data to dig into our private lives. That means corporations should only be able to collect data for permissible purposes, and Congress should ban certain uses that are abusive and serve no legitimate commercial interest.”
Correction: This piece originally incorrectly spelled out the SEC acronym as the Securities and Equities Commission.