Data Anonymization

DEFINITION of 'Data Anonymization'

A data privacy technique that protects private or sensitive data by deleting or encrypting personally identifiable information in a database. Data anonymization protects the private activities of individuals and companies while maintaining the integrity of the data gathered and shared.

Also known as Data Obfuscation, Data Masking, and Data De-Identification.

BREAKING DOWN 'Data Anonymization'

Corporations generate, store, and process enormous amounts of sensitive data in the normal course of business. Technology has advanced in part because of the information found in data generated and shared across sectors and countries. Financial technology (FinTech) has made considerable progress in customizing financial services for clients, thanks to data shared from sectors such as social media and e-commerce. Data shared between digital media and e-commerce firms has helped both sectors target product advertising to specific users. However, for shared data to be useful without compromising the identities of the clients in the database, it must be anonymized.

Data anonymization is carried out by most industries that deal with sensitive information, such as healthcare, finance, and digital media, so that data can still be shared responsibly. It reduces the risk of unintended disclosure when data is shared between countries, industries, and even departments within the same company. For example, a hospital sharing confidential patient data with a medical research lab or pharmaceutical company can do so ethically if it keeps its patients anonymous. This can be done by removing the patients’ names, Social Security numbers, dates of birth, and addresses from the shared list while leaving the components required for medical research, such as age, ailments, height, weight, gender, and race.
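
The hospital example above can be sketched in a few lines of Python. This is a minimal illustration, not a complete de-identification procedure: the field names (`name`, `ssn`, `date_of_birth`, `address`, and so on) and the sample patient are hypothetical, and a real dataset would define its own schema.

```python
# Fields treated as direct identifiers. These names are assumptions made
# for illustration; they do not come from any particular standard.
DIRECT_IDENTIFIERS = {"name", "ssn", "date_of_birth", "address"}


def strip_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the direct-identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}


# Hypothetical patient list, invented for this example.
patients = [
    {
        "name": "Jane Doe",
        "ssn": "000-00-0000",
        "date_of_birth": "1980-01-01",
        "address": "1 Example Street",
        "age": 44,
        "ailment": "asthma",
        "height_cm": 165,
        "weight_kg": 60,
        "gender": "F",
    },
]

# The shared list keeps only the fields needed for research.
shared = [strip_identifiers(patient) for patient in patients]
print(shared)
```

The original records are left untouched; only the copy that leaves the hospital has the identifying fields removed.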

Data can be anonymized in various ways, including deletion, encryption, and generalization. A company can either delete personally identifiable information (PII) from the data it gathers or encrypt this information with a strong passphrase. A business can also generalize the information in its database. For example, suppose a table contains the exact gross income earned by five CEOs in the retail sector, and the recorded incomes are $520,000, $230,000, $109,000, $875,000, and $124,000. This information can be generalized into categories like “< $500,000” and “≥ $500,000”. Although the data is obfuscated, it is still useful to the user.
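
The income example can be expressed as a short Python sketch. The function name `generalize_income` and the single $500,000 threshold are illustrative choices; a real system might use more categories or different cut-offs.

```python
from collections import Counter

THRESHOLD = 500_000  # cut-off taken from the example above


def generalize_income(income, threshold=THRESHOLD):
    """Replace an exact income with a coarse category label."""
    if income < threshold:
        return f"< ${threshold:,}"
    return f"≥ ${threshold:,}"


# The five recorded incomes from the example.
incomes = [520_000, 230_000, 109_000, 875_000, 124_000]

categories = [generalize_income(income) for income in incomes]
summary = Counter(categories)

print(categories)
print(summary)
```

The exact figures are no longer recoverable from `categories`, but a user can still see that three of the five CEOs earned less than $500,000 and two earned $500,000 or more.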

Data anonymization, in which classified information is sanitized and masked, should be done in such a way that if a breach occurs, the data acquired is useless to the culprits. Protecting data should be a high priority in every organization, as classified information that falls into the wrong hands can be misused, intentionally or unintentionally. Careless handling of sensitive client information can come at a great cost to businesses, as regulatory authorities crack down on gross negligence. Under compliance requirements like PCI DSS (Payment Card Industry Data Security Standard), financial institutions can face hefty fines in the event of a credit card breach. PIPEDA (Personal Information Protection and Electronic Documents Act), a Canadian law, governs the collection, use, and disclosure of personal information by corporations. Several other regulatory bodies have been formed to monitor organizations’ use or misuse of private data.

Decoding anonymized data is possible through a process known as De-Anonymization (or Re-Identification). Because anonymized data can be decoded and unraveled, critics believe anonymization provides a false sense of security.
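
One common way re-identification can happen is by matching the fields left in an anonymized dataset against another dataset that still contains names. The Python sketch below illustrates the idea under stated assumptions: both datasets, the field names, and the choice of `age`, `gender`, and `height_cm` as matching fields are hypothetical and invented for this example.

```python
from collections import defaultdict

# Fields present in both datasets. Which fields can be matched depends
# on the data; these three are assumptions for illustration.
MATCH_FIELDS = ("age", "gender", "height_cm")

# Hypothetical anonymized records (names already removed).
anonymized = [
    {"age": 44, "gender": "F", "height_cm": 165, "ailment": "asthma"},
    {"age": 49, "gender": "M", "height_cm": 180, "ailment": "diabetes"},
]

# Hypothetical second dataset that still carries names.
named = [
    {"name": "Jane Doe", "age": 44, "gender": "F", "height_cm": 165},
    {"name": "John Roe", "age": 49, "gender": "M", "height_cm": 180},
    {"name": "Al Poe", "age": 49, "gender": "M", "height_cm": 180},
]


def reidentify(anonymized_records, named_records, fields=MATCH_FIELDS):
    """Return (name, ailment) pairs for records that match exactly one person."""
    index = defaultdict(list)
    for person in named_records:
        index[tuple(person[f] for f in fields)].append(person["name"])

    matches = []
    for record in anonymized_records:
        candidates = index.get(tuple(record[f] for f in fields), [])
        if len(candidates) == 1:  # a unique match reveals the identity
            matches.append((candidates[0], record["ailment"]))
    return matches


print(reidentify(anonymized, named))
```

In this sketch the first record matches exactly one named person and is re-identified, while the second matches two people and stays ambiguous. This is why coarser, generalized values make re-identification harder.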