DEFINITION of 'De-Anonymization'

A reverse data mining technique that re-identifies encrypted or generalized information. De-anonymization cross-references anonymized information with other available data in order to identify a person, group, or transaction. 

Also known as Data Re-Identification.

BREAKING DOWN 'De-Anonymization'

Technology is rapidly disrupting the traditional way of doing things across many sectors of the economy. In recent years, fintech companies have introduced many digital products to the financial industry. These products have promoted financial inclusion by giving more consumers access to financial products and services at a lower cost than traditional financial institutions allow. The growing use of technology has increased the collection, storage, and use of data. Tools such as social media platforms, digital payment platforms, and smartphones have generated vast amounts of data that companies use to enhance their interactions with consumers. This data, known as Big Data, is a cause for concern among individuals and regulatory authorities, who are calling for more laws that protect the identities and privacy of users.

In the age of Big Data, where sensitive information about a user’s online activities is shared instantaneously through cloud computing, data anonymization tools are employed to protect users’ identities. Anonymization masks the personally identifiable information (PII) of users transacting in fields such as health services, social media, and e-commerce. PII includes information like date of birth, Social Security Number (SSN), zip code, and IP address. The need to mask the digital trails left behind by online activities has led to anonymization strategies such as encryption, deletion, generalization, and perturbation. Although data scientists use these strategies to separate sensitive information from the shared data, the shared data still preserves some of the original information, which leaves the door open to re-identification.
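As a rough illustration of three of the strategies named above (deletion, generalization, and perturbation), the following Python sketch masks a single record. The field names, the sample record, and the noise range are hypothetical choices made for this example, not a standard or a recommended configuration.

```python
import random


def anonymize(record, rng):
    """Return a masked copy of one record (all field names are hypothetical)."""
    masked = dict(record)

    # Deletion: drop direct identifiers outright.
    masked.pop("name", None)
    masked.pop("ssn", None)

    # Generalization: keep only the birth year and a 3-digit zip prefix.
    masked["birth_year"] = masked.pop("date_of_birth")[:4]
    masked["zip_code"] = masked["zip_code"][:3] + "**"

    # Perturbation: add a small amount of random noise to a numeric value.
    noise = rng.uniform(-5, 5)
    masked["purchase_total"] = round(masked["purchase_total"] + noise, 2)

    return masked


# Illustrative record; every value here is made up.
record = {
    "name": "Alice Example",
    "ssn": "000-00-0000",
    "date_of_birth": "1984-03-17",
    "zip_code": "02139",
    "purchase_total": 120.0,
}

masked = anonymize(record, random.Random(0))
print(masked)
```

Note that the masked record still carries a birth year and a zip prefix. Attributes like these are what remain available for matching against other data sets.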

De-anonymization reverses the process of anonymization by matching shared but limited data sets with data sets that are easily accessible online. Data miners can then retrieve some information from each available data set to piece together a person’s identity or transaction. For example, a data miner could combine data sets shared by a telecommunications company, a social media site, and an e-commerce platform with a publicly available census result to determine the name and frequent activities of a user.
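The matching step can be sketched as a simple join on attributes that both data sets share. The Python example below is a minimal sketch under stated assumptions: the two data sets, the field names, and the choice of zip code, birth year, and sex as the shared attributes are all hypothetical, and real re-identification work typically involves far larger and messier data.

```python
from collections import defaultdict

# Attributes assumed to appear in both data sets (a hypothetical choice).
QUASI_IDENTIFIERS = ("zip_code", "birth_year", "sex")


def link(anonymized, public):
    """Map each anonymized record ID to a name when exactly one person matches."""
    # Index the public data set by its quasi-identifier values.
    index = defaultdict(list)
    for person in public:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        index[key].append(person["name"])

    matches = {}
    for row in anonymized:
        key = tuple(row[q] for q in QUASI_IDENTIFIERS)
        candidates = index.get(key, [])
        # Only a unique match counts as a re-identification.
        if len(candidates) == 1:
            matches[row["record_id"]] = candidates[0]
    return matches


# Made-up "anonymized" data set: names removed, activity retained.
anonymized = [
    {"record_id": "r1", "zip_code": "02139", "birth_year": 1984, "sex": "F",
     "activity": "frequent online pharmacy orders"},
    {"record_id": "r2", "zip_code": "02139", "birth_year": 1990, "sex": "M",
     "activity": "late-night ride bookings"},
]

# Made-up public data set, such as a voter roll or census-style listing.
public = [
    {"name": "Alice Example", "zip_code": "02139", "birth_year": 1984, "sex": "F"},
    {"name": "Bob Example", "zip_code": "02139", "birth_year": 1990, "sex": "M"},
    {"name": "Carl Example", "zip_code": "02139", "birth_year": 1990, "sex": "M"},
]

reidentified = link(anonymized, public)
print(reidentified)
```

In this sketch, record r1 is linked to a single name because only one person in the public data shares its attributes, while r2 stays ambiguous because two people do. Each additional shared attribute or data set tends to shrink the pool of candidates.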

Re-identification can succeed when new information is released or when the anonymization strategy is not implemented properly. With a vast supply of data and a limited amount of time, data analysts and miners rely on shortcuts known as heuristics to make decisions. While heuristics save valuable time and resources in combing through a data set, the wrong heuristic can leave gaps. Data miners seeking to de-anonymize a data set, for either legal or illegal purposes, can identify and exploit these gaps.

Personally identifiable information obtained illegally through de-anonymization techniques can be sold in underground marketplaces, which are themselves a form of anonymization platform. Information that falls into the wrong hands can be used for coercion, extortion, and intimidation, raising privacy concerns and imposing enormous costs on the businesses that fall victim.

De-anonymization can also be used legally. For example, the Silk Road website, an underground marketplace for illegal drugs, was hosted on an anonymity network called Tor, which uses onion routing to obfuscate the IP addresses of its users. The Tor network also hosts other illegal markets trading in guns, stolen credit cards, and sensitive corporate information. Using complex de-anonymization tools, the FBI successfully cracked and shut down Silk Road and sites engaging in child pornography.

Successful re-identification has shown that anonymity is not guaranteed. Even if groundbreaking anonymization tools were implemented today to mask data, the data could be re-identified within a few years as new technology and new data sets become available.