One key method for ensuring privacy while processing large amounts of data is de-identification.
De-identified data refers to data through which a link to a particular individual cannot be established. This often involves “scrubbing” the identifiable elements of personal data, making it “safe” in privacy terms while attempting to retain its commercial and scientific value.
In legal terms, the criteria for de-identified data remain vague. The Health Insurance Portability and Accountability Act defines data as de-identified if it “does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information.” In its recent report, the FTC gave recommendations to help assess whether data should be considered identifiable. However, best practices have not been identified and industry practices vary widely.
The Future of Privacy Forum held a conference on December 5, 2011 to begin addressing this issue. Our goal is to facilitate the development of safe de-identification practices for data sets that extend beyond the health-care sector.
Future of Privacy Forum “De-ID Project”
In the era of big data, the debate over the definition of personal information, de-identification and re-identification has never been more important. Privacy regimes often rely on data being considered Personal in order to require the application of privacy rights and protections. Data that is anonymous is considered free of privacy risk and available for public use.
Yet much data that is collected and used exists somewhere on a spectrum between these stages. FPF’s De-ID Project seeks to describe a practical framework for applying privacy restrictions to data based on the nature of data that is collected, the risks of de-identification, and the additional legal and administrative protections that may be applied. Important questions FPF hopes to consider include:
- What weight should be given to non-technical factors such as legal commitments not to make data public or not to attempt to re-identify data.
- What weight is to be given to impacts of de-ID techniques on utility of data.
- What status should be awarded to linkable or pseudonymous data.