Data Anonymization

What Is Data Anonymization?

Data anonymization seeks to protect private or sensitive data by deleting or encrypting the personally identifiable information in a database. It is done to protect the private activities of an individual or company while maintaining the integrity of the data that is gathered and shared.

Data anonymization is also known as "data obfuscation," "data masking," or "data de-identification." It can be contrasted with de-anonymization, a set of data-mining techniques that attempt to re-identify encrypted or obscured information.

Key Takeaways

  • Data anonymization refers to stripping out or encrypting the personal or identifying information in sensitive data.
  • As businesses, governments, healthcare systems, and other organizations increasingly store individuals' information on local or cloud servers, data anonymization is crucial to maintaining data integrity and limiting the harm of security breaches.
  • In the highly sensitive healthcare and financial sectors, patient or customer data must be obscured in such a way as to meet regulatory requirements.

Understanding Data Anonymization

Corporations generate, store, and process enormous amounts of sensitive data in the normal course of business. Much technological progress has been driven by insights drawn from data generated and shared across sectors and countries. Financial technology (fintech), for example, has made major advances in tailoring financial services to individual clients, thanks in part to data shared by sectors such as social media and e-commerce.

Data shared between digital media and e-commerce firms has helped both sectors target the products advertised on their sites to specific users. However, for shared data to be useful without compromising the identities of the clients in the database, it must first be anonymized.

Data Anonymization in Practice

Most industries that handle sensitive information, such as healthcare, finance, and digital media, anonymize data so that they can share it while preserving its integrity. Data anonymization reduces the risk of unintended disclosure when data is shared between countries, industries, or even departments within the same company. It also reduces opportunities for identity theft.

For example, a hospital sharing confidential patient data with a medical research lab or pharmaceutical company can do so ethically if it keeps its patients anonymous. It can remove patients' names, Social Security numbers, dates of birth, and addresses from the shared list while keeping the fields that medical research requires, such as age, ailments, height, weight, gender, and race.
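The deletion step in the hospital example can be sketched in a few lines of Python. This is a minimal illustration, not a production de-identification tool: the field names (`name`, `ssn`, `date_of_birth`, `address`, and so on), the helper `strip_identifiers`, and the patient record are all hypothetical, and the sketch assumes each record is a simple dictionary.

```python
# Hypothetical field names for the direct identifiers listed in the example.
DIRECT_IDENTIFIERS = {"name", "ssn", "date_of_birth", "address"}


def strip_identifiers(records):
    """Return new records with direct identifiers removed (originals untouched)."""
    return [
        {field: value for field, value in record.items() if field not in DIRECT_IDENTIFIERS}
        for record in records
    ]


# A made-up patient record for illustration only.
patients = [
    {
        "name": "Jane Doe",
        "ssn": "000-00-0000",
        "date_of_birth": "1980-04-12",
        "address": "1 Example St.",
        "age": 44,
        "ailment": "asthma",
        "height_cm": 165,
        "weight_kg": 60,
        "gender": "F",
        "race": "Asian",
    },
]

shared = strip_identifiers(patients)
print(shared[0])
# {'age': 44, 'ailment': 'asthma', 'height_cm': 165, 'weight_kg': 60, 'gender': 'F', 'race': 'Asian'}
```

Building new dictionaries rather than deleting keys in place leaves the hospital's own records intact, so only the stripped copy is shared.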

Data Anonymization Techniques

Data can be anonymized in various ways, including deletion, encryption, and generalization. A company can delete personally identifiable information (PII) from the data it gathers, or encrypt this information with a strong passphrase. It can also generalize the information in its database. For example, suppose a table contains the exact gross income of five CEOs in the retail sector: $520,000, $230,000, $109,000, $875,000, and $124,000. These figures can be generalized into categories such as “< $500,000” and “≥ $500,000”. Although the data is obfuscated, it remains useful: a reader can still see that two of the five CEOs earned $500,000 or more.
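The income generalization above amounts to mapping each exact figure to a coarse bracket. The following is a minimal Python sketch, assuming a single threshold of $500,000 as in the example; the function name `generalize_income` is hypothetical.

```python
THRESHOLD = 500_000  # the single cut-off used in the example


def generalize_income(income: int) -> str:
    """Replace an exact income with one of two coarse brackets."""
    return "≥ $500,000" if income >= THRESHOLD else "< $500,000"


incomes = [520_000, 230_000, 109_000, 875_000, 124_000]
brackets = [generalize_income(x) for x in incomes]

print(brackets)
# ['≥ $500,000', '< $500,000', '< $500,000', '≥ $500,000', '< $500,000']
print(brackets.count("≥ $500,000"))
# 2
```

The exact salaries disappear, but aggregate questions such as how many CEOs fall in each bracket can still be answered. More brackets preserve more detail at the cost of making individuals easier to single out.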

Data Anonymization Reasoning

Data anonymization sanitizes and masks confidential information so that, if a breach occurs, the stolen data is of little use to the culprits. Every organization should treat data protection as a high priority, because confidential information that falls into the wrong hands can be misused, intentionally or unintentionally.

Careless handling of sensitive client information can be very costly for businesses, as regulatory authorities crack down on gross negligence. Compliance requirements such as the PCI DSS (Payment Card Industry Data Security Standard) can lead to hefty fines for financial institutions in the event of a credit card breach. PIPEDA (the Personal Information Protection and Electronic Documents Act), a Canadian law, governs how corporations use and disclose personal information. Many other regulatory bodies have been formed to monitor how organizations use, or misuse, private data.

Anonymized data can sometimes be traced back to individuals through a process known as de-anonymization (or "re-identification"), typically by cross-referencing it with other available data sources. Because anonymized data can be decoded and unraveled in this way, critics believe anonymization can provide a false sense of security.
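One common route to re-identification is a linkage attack: the attributes left in an "anonymized" file are matched against another dataset that still carries names. The Python sketch below is a simplified illustration under stated assumptions. All records, names, and the outside dataset are made up, and the helper names `key` and `link` are hypothetical. It joins on age, gender, and race, the fields the hospital example keeps, and treats a record as re-identified only when exactly one named person matches.

```python
from collections import defaultdict

# Hypothetical "anonymized" research records: direct identifiers removed.
anonymized = [
    {"age": 44, "gender": "F", "race": "Asian", "ailment": "asthma"},
    {"age": 61, "gender": "M", "race": "White", "ailment": "diabetes"},
]

# Hypothetical outside dataset that still includes names.
public = [
    {"name": "Jane Doe", "age": 44, "gender": "F", "race": "Asian"},
    {"name": "John Roe", "age": 61, "gender": "M", "race": "White"},
    {"name": "Bob Loe", "age": 61, "gender": "M", "race": "White"},
]

# Attributes shared by both datasets and used for the join.
QUASI_IDENTIFIERS = ("age", "gender", "race")


def key(record):
    """Build a join key from the shared attributes."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymized, public):
    """Map names to ailments where the shared attributes match exactly one person."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[key(person)].append(person["name"])
    matches = {}
    for row in anonymized:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[candidates[0]] = row["ailment"]
    return matches


print(link(anonymized, public))
# {'Jane Doe': 'asthma'}
```

The diabetes record stays ambiguous because two people in the outside dataset share its attributes. That is the intuition behind generalization: the coarser the remaining fields, the more people each record could belong to.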

Article Sources
Investopedia requires writers to use primary sources to support their work. These include white papers, government data, original reporting, and interviews with industry experts. We also reference original research from other reputable publishers where appropriate. You can learn more about the standards we follow in producing accurate, unbiased content in our editorial policy.
  1. PCI Security Standards Council. "Why Security Matters." Accessed Dec. 4, 2020.

  2. PCI Security Standards Council. "Payment Card Industry (PCI) Data Security Standard." Page 5. Accessed Dec. 4, 2020.

  3. Office of the Privacy Commissioner of Canada. "PIPEDA in Brief." Accessed Dec. 4, 2020.
