New Study Shows Need for De-identification Best Practices

Publicly releasing sensitive information is risky. In 1997, Latanya Sweeney showed that seemingly anonymous medical data could be linked to an actual person: using full date of birth, 5-digit ZIP code, and gender, she uncovered the health information of William Weld, the former governor of Massachusetts. In a new study, Sweeney analyzes the data available in the Personal Genome Project (PGP) and shows once again that many people can be re-identified using date of birth, ZIP code, and gender when other data, such as a voter registration list, is available.

Sweeney’s work is important, but we don’t think it should be considered an indictment of de-identification. The cases so often cited as proof that de-identification doesn’t work (the AOL search data release, the Netflix Prize, the Weld example, and the PGP data) are all examples of barely or very poorly de-identified data. De-identification experts do NOT consider a publicly disclosed database with full date of birth, 5-digit ZIP code, and gender to be de-identified. In fact, those three data points together allow for over 3 billion possible combinations, far more than the number of people in the US, so many individuals end up in a combination shared with no one else. Full date of birth divides a population into over 36 thousand separate groups, and ZIP codes further divide the US population into over 43 thousand separate groups. Publicly releasing a database with such a large number of possible combinations allows it to be linked with additional databases and gives attackers all the time in the world to examine the data. Thus, public disclosure greatly increases the risk of identifying individuals from a database.
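
The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustrative calculation, not part of Sweeney’s study: the 100-year span of birth dates, the two-value gender field, the rough US population of 320 million, and the coarsened alternative (year of birth plus a 3-digit ZIP prefix) are all assumptions introduced here for the example.

```python
# Rough count of possible (date of birth, ZIP code, gender) combinations.
# All constants are illustrative assumptions, not figures from the study.
DAYS_PER_YEAR = 365
BIRTH_YEAR_SPAN = 100          # assumed: birth dates spread over ~100 years
ZIP_CODES = 43_000             # "over 43 thousand" ZIP codes, as cited above
GENDERS = 2                    # assumed: gender recorded as one of two values
US_POPULATION = 320_000_000    # assumed: rough US population


def combinations(*group_counts: int) -> int:
    """Multiply the number of groups that each field creates."""
    total = 1
    for count in group_counts:
        total *= count
    return total


birth_dates = DAYS_PER_YEAR * BIRTH_YEAR_SPAN   # 36,500 possible dates
full = combinations(birth_dates, ZIP_CODES, GENDERS)

# Hypothetical coarsening: year of birth and 3-digit ZIP prefix only.
coarse = combinations(BIRTH_YEAR_SPAN, 1_000, GENDERS)

print(f"full detail: {full:,} combinations, "
      f"{US_POPULATION / full:.2f} people per combination on average")
print(f"coarsened:   {coarse:,} combinations, "
      f"{US_POPULATION / coarse:,.0f} people per combination on average")
```

Under these assumptions the full-detail fields allow far more combinations than there are people, while the coarsened fields place many people in each group on average. Real populations are not spread evenly, so averages like these understate the risk for sparsely populated groups.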

Sweeney’s study shows the importance of very strong de-identification practices when data is disclosed publicly. With public data, organizations should use very strong de-identification techniques, such as the Privacy Analytics Risk Assessment Tool developed by Dr. Khaled El Emam or differential privacy as proposed by Dr. Cynthia Dwork.
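
To give a sense of what differential privacy looks like in practice, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is a textbook illustration only, not the Privacy Analytics tool and not a production-ready implementation; the function names `laplace_noise` and `noisy_count`, the sample records, and the choice of `epsilon` are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(records, predicate, epsilon: float, rng=None) -> float:
    """Return the number of matching records plus Laplace noise.

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical data: how many records fall in a given ZIP code?
records = [{"zip": "02138"}] * 7 + [{"zip": "10001"}] * 3
print(noisy_count(records, lambda r: r["zip"] == "02138", epsilon=0.5))
```

Each call returns a different perturbed count, so no single answer reveals whether any one person is in the data. The trade-off between utility and privacy is controlled by `epsilon`.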

For nonpublic databases, however, very strong de-identification techniques may not strike the right balance between data utility and privacy. When nonpublic databases are protected by both technical and administrative controls, reasonable de-identification techniques, as opposed to very strong ones, may be appropriate. In that setting, attackers do not have unlimited time to attempt to break the technical de-identification protection, third-party data is not available for linking, and legal commitments are in place. Data breaches can occur, of course, but we need to recognize the very different status of protected versus unprotected data and appreciate the range of protections that can support a de-identification promise.

FPF staff are conducting research exploring the different risk profiles of nonpublic databases and publicly released databases, and the relevant best practices for “pretty good” de-identification of restricted databases. Please contact us if you are interested.

