MAC Addresses and De-Identification

Location analytics companies log the hashed MAC addresses of mobile devices in range of their sensors at airports, malls, retail locations, stadiums, and other venues. They do so primarily to create statistical reports that provide useful aggregated information, such as average wait times on line, store “hot spots,” and the percentage of devices that never make it into a zone that includes a checkout register. FPF worked with the leading companies providing these services to create an enforceable Mobile Location Code of Conduct that restricts discriminatory uses of data, creates a central opt-out, and promotes in-store notice, among other protections. We filed comments last week with the FTC describing the program in detail.

The only data transmitted by mobile devices that most location companies can log is the MAC address – the Wi-Fi or Bluetooth identifier devices broadcast when Wi-Fi or Bluetooth is turned on. The privacy debate around the use of this technology and the Code has centered on the sensitivity of logging and maintaining hashed MAC addresses, and hinges on whether a MAC address should be considered personal information.

Is a MAC address personal information? It is linked to an individual consumer device as a consistent Wi-Fi or Bluetooth identifier. If enough data is linked to any consistent identifier over time, it is within the realm of technical possibility that the identity of a user can be ascertained. If there were a commercially available database of MAC addresses, it is possible that such a database could be used to identify users. We are not aware of any such MAC address look-up database, but we do recognize that the data collected is linked to a specific device. For this reason, the Code of Conduct treats hashed MAC addresses associated with unique devices as something in between fully anonymized data and explicitly personal data. This reflects the view that Professor Daniel Solove posited effectively when he argued that PII exists not as a binary, but on a spectrum, with no risk of identification at one end and individual identification at the other.

In many real-world instances of data collection, the privacy standards in place reflect where the data lies on this spectrum; they consist not only of technical measures to protect the data, but also internal security and administrative controls, as well as enforceable legal commitments. In the case of Mobile Location Analytics, many companies are confident that by hashing MAC addresses, keeping them under administrative and security controls, and publicly committing not to attempt to identify users, they have adequately de-identified the data they log.
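A rough sketch helps show why a hashed MAC sits midway on this spectrum rather than at the anonymous end: a MAC address is only 48 bits, so anyone who already holds a candidate MAC can recompute a plain, unkeyed hash and match it against logged values. The function name and sample address below are illustrative assumptions, not anything specified by the Code:

```python
import hashlib

def hash_mac(mac: str) -> str:
    """Plain SHA-256 hash of a MAC address (no salt or secret key)."""
    return hashlib.sha256(mac.lower().encode("utf-8")).hexdigest()

# Anyone who already knows a device's MAC can hash it and compare
# it against logged values, so a bare hash is linkable, not anonymous.
logged = hash_mac("00:1A:2B:3C:4D:5E")   # what a sensor might store
probe  = hash_mac("00:1a:2b:3c:4d:5e")   # recomputed from a known MAC
assert logged == probe
```

This is why the approach described above pairs the hashing itself with administrative controls and enforceable public commitments, rather than treating the hash alone as de-identification.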

However, it is important to understand that the Code does NOT take the position that hashing MAC addresses amounts to a de-identification process that fully resolves privacy concerns. According to the Code, data is only considered fully “de-identified” where it may not reasonably be used to infer information about, or otherwise be linked to, a particular consumer, computer, or other device. To qualify as de-identified under the Code, a company must take measures such as aggregating data, adding noise to data, or statistical sampling. These are considered reasonable measures that de-identify data under the Code, as long as an MLA company also publicly commits not to try to re-identify the data and contractually prohibits downstream recipients from trying to re-identify it. To assure transparency, any company that does de-identify data in this way must describe how it does so in its privacy policy.

As most of the companies involved in mobile location analytics do indeed link hashed MAC addresses to individual devices, the data they collect to track devices over time does not qualify as strictly “de-identified” under the Code and is not exempt from it. Rather, the companies collect and use what the Code terms “de-personalized” data.* De-personalized data is defined in the Code as data that can be linked to a particular device, but cannot reasonably be linked to a particular consumer. Companies using de-personalized data must:

    1. take measures to ensure that the data cannot reasonably be linked to an individual (for instance, hashing a MAC address or deleting personally identifiable fields);
    2. publicly commit to maintain the data as de-personalized; and
    3. contractually prohibit downstream recipients from attempting to use the data to identify a particular individual.
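The Code does not prescribe a particular mechanism for the first measure. One plausible implementation, sketched below as an assumption rather than a requirement, is a keyed (HMAC) hash using a secret held only by the MLA company, so that outsiders cannot simply hash a known MAC and compare it against logged values:

```python
import hmac
import hashlib

# Hypothetical secret held only by the MLA company; rotating or
# discarding it limits how long records remain linkable to a device.
SECRET_KEY = b"example-secret-held-by-mla-company"

def depersonalize(mac: str) -> str:
    """Keyed hash of a MAC address: without SECRET_KEY, the small
    48-bit MAC space cannot be hashed and matched against the logs."""
    return hmac.new(SECRET_KEY, mac.lower().encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

The technical step still only satisfies the first prong; the public commitment and downstream contractual restrictions in measures 2 and 3 do the rest of the work.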

When companies merely hash MAC addresses, they thus remain fully subject to the Code's requirements, including signage, consumer choice, and non-discrimination.

Different kinds of data on the PII/non-PII spectrum, given the inherent risks and benefits of each, merit careful consideration of the combination of reasonable technical measures (such as encryption), administrative measures, and legal commitments that would be most suitable. After all, if “completely unidentifiable by any technical means, no matter how complex or unlikely” were the standard for the use of any data in the science and business worlds, much valuable research and commerce would come to an end. The MLA Code represents a pragmatic view that allows vendors to provide a service that is useful for businesses and consumers, while applying responsible privacy standards.

* Suggestions for a better term than de-personalized are welcomed. We considered “pseudonymized” but found the term awkward.

Comments

Posted On
Mar 27, 2014
Posted By
Jim Fenton

There are at least two issues you haven’t addressed:

1. The ability of location analytics companies to aggregate results from more than one of their customers. If they’re simply providing a service to retailer A and retailer B, they should hash the MAC addresses from the two stores differently (using different retailer-specific values appended to the MAC addresses). But if they’re planning to aggregate the results from multiple retailers, they’re creating a more comprehensive profile of the consumer, and that creates a greater privacy concern.

2. None of the hashing discussion addresses requests that might be received from the government about information relating to the activities of a given MAC address. The given MAC address could, of course, just be hashed and compared with the database. While such requests might be entirely legitimate, the location analytics companies should not create a false expectation that hashing addresses this sort of potential privacy concern, and they should explicitly state how long individual records are kept.
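The retailer-specific hashing the comment's first point describes could be implemented by deriving a distinct key per retailer, so the same device yields unlinkable identifiers at different stores. This is a sketch under that assumption, not something the Code requires:

```python
import hmac
import hashlib

def hash_for_retailer(mac: str, retailer_id: str, master_key: bytes) -> str:
    # Derive a per-retailer key, then hash the MAC under it, so the
    # same device produces different identifiers at different retailers.
    retailer_key = hmac.new(master_key, retailer_id.encode("utf-8"),
                            hashlib.sha256).digest()
    return hmac.new(retailer_key, mac.lower().encode("utf-8"),
                    hashlib.sha256).hexdigest()

key = b"example-master-key"
a = hash_for_retailer("00:1a:2b:3c:4d:5e", "retailer-A", key)
b = hash_for_retailer("00:1a:2b:3c:4d:5e", "retailer-B", key)
assert a != b  # no cross-store linkage without the master key
```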
