Author Archive

FTC Wants Tools to Increase Transparency and Trust in Big Data

However we want to define “Big Data” – and the FTC’s latest workshop on the subject suggests a consensus definition remains elusive – the path forward seems to call for more transparency and the establishment of firmer frameworks on the use of data. As Chairwoman Ramirez suggested in her opening remarks, Big Data calls for a serious conversation about “industry’s ethical obligations as stewards of information detailing nearly every facet of consumers’ lives.”

Part of that challenge is that some Big Data uses are often “discriminatory”. Highlighting findings from his paper on Big Data and discrimination, Solon Barocas began the workshop by noting that the whole point of data mining is to differentiate and to draw distinctions. In effect, Big Data is a rational form of discrimination, driven by apparent statistical relationships rather than any capriciousness. When humans introduce unintentional biases into the data, there is no ready solution at a technical or legal level. Barocas called for lawyers and public policy makers to have a conversation with the technologists and computer scientists working directly with data analytics – a sentiment echoed when panelists realized a predictive analytics conference was going on simultaneously across town.

But the key takeaway from the workshop wasn’t that Big Data could be used as a tool to exclude or include. Everyone in the civil rights community agreed that data could be a good thing, and a number of examples were put forth to suggest once more that data had the potential to be used for good or for ill. Pam Dixon of the World Privacy Forum noted that classifying individuals creates a “data paradox,” where the same data can be used to help or to harm that individual. For our part, FPF released a report alongside the Anti-Defamation League detailing Big Data’s ability to combat discrimination. Instead, there was considerable desire to understand more about industry’s approach to big data. FTC staff repeatedly asked not just for more positive uses of big data by the private sector, but inquired as to what degree of transparency would help policy makers understand Big Data decision-making.

FTC Chief Technologist Latanya Sweeney followed up her study that suggested web searches for African-American names were more likely than searches of white-sounding names to return ads suggesting the person had an arrest record by looking at credit card advertising and website demographics. Sweeney presented evidence that advertisements for harshly criticized credit cards were often directed to the homepage of Omega Psi Phi, a popular black fraternity.

danah boyd observed that there was a general lack of transparency about how Big Data is being used within industry, for a variety of complex reasons. FTC staff and Kristin Amerling from Senate Commerce singled out the opacity surrounding the practices of data brokers when describing some of the obstacles policy makers face when they try to understand how Big Data is being used.

Moreover, while consumers and policy makers are trying to grapple with what companies are doing with their streams of data, industry is also placed in the difficult position of making huge decisions about how that data can be used. For example, boyd cited the challenges JPMorgan Chase faces when using analytics to evaluate human trafficking. She applauded the positive work the company was doing, but noted that expecting it to have the ability or expertise to effectively intervene in trafficking perhaps asks too much. They don’t know when to intervene or whether to contact law enforcement or social services.

These questions are outside the scope of their expertise, but even general use of Big Data can prove challenging for companies. “A lot of the big names are trying their best, but they don’t always know what the best practices should be,” she concluded.

FTC Commissioner Brill explained that her support for a legislative approach to increase transparency and accountability among data brokers, their data sources, and their consumers, was to help consumers and policy makers “begin to understand how these profiles are being used in fact, and whether and under what circumstances they are harming vulnerable populations.” In the meantime, she encouraged industry to take more proactive steps. Specifically, she recommended again that data brokers explore how their clients are using their information and take steps to prevent any inappropriate uses and further inform the public. “Companies can begin this work now, and provide all of us with greater insight into – and greater assurances about – their models,” she concluded.

A number of legal regimes may already apply to Big Data, however. Laws that govern the provision of credit, housing, and employment will likely play a role in the Big Data ecosystem. Carol Miaskoff at the Equal Employment Opportunity Commission suggested there was real potential with Big Data to gather information about successful employees and use that to screen people for employment in a way that exacerbates prejudices built into the data. Emphasizing his recent white paper, Peter Swire suggested there were analogies to be made between sectoral regulation in privacy and sectoral legislation in anti-discrimination law. With existing laws in place, he argued that it was past time to “go do the research and see what those laws cover” in the context of Big Data.

“Data is the economic lubricant of the economy,” the Better Business Bureau’s C. Lee Peeler argued, and he supported the FTC’s continued efforts to explore the subject of Big Data. He cited earlier efforts by the Commission to examine inner-city marketing practices, which produced a number of best practices still valid today. He encouraged the FTC to look at what companies are doing with Big Data on a self-regulatory basis as a starting point for developing workable solutions to potential problems.

So what is the path forward? Because Big Data is, in the words of Promontory’s Michael Spadea, a nascent industry, there is a very real need for guidelines on not just how to evaluate the risks and benefits of Big Data but also how to understand what is ethically appropriate for business. Chris Wolf highlighted FPF’s recent Data-Benefit Analysis and suggested companies were already engaged in detailed analysis of the use of Big Data, though everyone recognized that business practices and trade secrets precluded making much of this public.

FTC staff noted there was a “transparency hurdle” to get over in Big Data. Recognizing that “dumping tons of information” onto consumers would be unhelpful, staff picked up on Swire’s suggestion that industry needed some mechanism to justify what is going on to either regulators or self-regulatory bodies. Spadea argued that “the answer isn’t more transparency, but better transparency.” The Electronic Frontier Foundation’s Jeremy Gillula recognized the challenge companies face revealing their “secret sauce,” but encouraged them to look at more ways to give consumers more general information about what was going on. Otherwise, he recommended, consumers ought to collect big data on big data and turn data analysis back on data brokers and industry at large through open-source efforts.

At the same time, Institutional Review Boards, which are used in human subject testing research, were again proposed as a model for how companies can begin affirmatively working through these problems. Citing a KPMG report, Chris Wolf insisted that strong governance regimes, including “a strong ethical code, along with process, training, people, and metrics,” were essential to confront the many ethical and philosophical challenges that flitted around the day’s discussions.

Jessica Rich, the Director of the FTC’s Consumer Protection Bureau, cautioned that the FTC would be watching. In the meantime, industry is on notice. The need for clearer data governance frameworks is clear, and careful consideration of Big Data projects should be both reflexive and something every industry privacy professional talks about.

-Joseph Jerome, Policy Counsel

Relevant Reading from the Workshop:


A Path Forward for Big Data

How should privacy concerns be weighed against the benefits of big data? This question has been at the heart of policy debates about big data all year, from the President’s announcement of the White House Big Data review in January to the FTC’s latest workshop looking at big data’s ability to exclude or include. Answering this question could very well present the biggest public policy challenge of our time, and the need to face that challenge is growing.

Increasingly, there are new worries that big data is being used in ways that are unfair to some people or classes of people. Resolving those worries and ensuring that big data is being used fairly and legitimately is a challenge that should be a top priority for industry and government alike.

Today, FPF is releasing two papers that we hope will help frame the big data conversation moving forward and promote better understanding of how big data can shape our lives. These papers provide a practical guide for how benefits can be assessed in the future, but they also show how data is already being used in the present.  FPF Co-Chairman Christopher Wolf will discuss key points from these papers at the Federal Trade Commission public workshop entitled “Big Data: A Tool for Inclusion or Exclusion?” in Washington on Monday, September 15.

We are also releasing a White Paper which is based on comments that will be presented at the FTC Workshop by  Peter Swire, Nancy J. & Lawrence P. Huang Professor of Law and Ethics, Georgia Institute of Technology.  Swire, also Senior Fellow at FPF, draws lessons from fair lending law that are relevant for online marketing related to protected classes.

The papers are entitled:

*                      *                      *

The world of big data is messy and challenging. The very term “big data” means different things within different contexts. Any successful approach to the challenge of big data must recognize that data can be used in a variety of different ways. Some of these uses are clearly beneficial, some are clearly problematic, and some are uses that some believe beneficial and others believe harmful.  Some uses have no real impact on individuals at all. We hope these documents can offer new ways to look at big data in order to ensure that it is only being used for good.

Big Data: A Benefit and Risk Analysis

Privacy professionals have become experts at evaluating risk, but moving forward with big data will require rigorous analysis of project benefits to go along with traditional privacy risk assessments. We believe companies or researchers need tools that can help evaluate the cases for the benefits of significant new data uses.  Big Data: A Benefit and Risk Analysis is intended to help companies assess the “raw value” of new uses of big data. Particularly as data projects involve the use of health information or location data, more detailed benefit analyses that clearly identify the beneficiaries of a data project, its size and scope, and that take into account the probability of success and evolving community standards are needed.   We hope this guide will be a helpful tool to ensure that projects go through a process of careful consideration.

Identifying both benefits and risks is a concept grounded in existing law. For example, the Federal Trade Commission weighs the benefits to consumers when evaluating whether business practices are unfair or not. Similarly, the European Article 29 Data Protection Working Party has applied a balancing test to evaluate legitimacy of data processing under the European Data Protection Directive. Big data promises to be a challenging balancing act.

Big Data: A Tool for Fighting Discrimination and Empowering Groups

Even as big data uses are examined for evidence of facilitating unfair and unlawful discrimination, data can help to fight discrimination. It is already being used in myriad ways to protect and to empower vulnerable groups in society. In partnership with the Anti-Defamation League, FPF prepared a report that looked at how businesses, governments, and civil society organizations are leveraging data to provide access to job markets, to uncover discriminatory practices, and to develop new tools to improve education and provide public assistance.  Big Data: A Tool for Fighting Discrimination and Empowering Groups explains that although big data can introduce hidden biases into information, it can also help dispel existing biases that impair access to good jobs, good education, and opportunity.

Lessons from Fair Lending Law for Fair Marketing and Big Data

Where discrimination presents a real threat, big data need not necessarily lead us to a new frontier. Existing laws, including the Equal Credit Opportunity Act and other fair lending laws, provide a number of protections that are relevant when big data is used for online marketing related to lending, housing, and employment. In comments to be presented at the FTC public workshop, Professor Peter Swire will discuss his work in progress entitled Lessons from Fair Lending Law for Fair Marketing and Big Data. Swire explains that fair lending laws already provide guidance as to how to approach discrimination that allegedly has an illegitimate, disparate impact on protected classes. Data actually plays an important role in being able to assess whether a disparate impact exists! Once a disparate impact is shown, the burden shifts to creditors to show their actions have a legitimate business need and that no less discriminatory alternative exists. Fair lending enforcement has encouraged the development of rigorous compliance mechanisms, self-testing procedures, and a range of proactive measures by creditors.
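To make the role of data in a disparate impact assessment concrete, here is a minimal self-testing sketch. It is an illustration only, not a method taken from Swire's paper: the approval counts are invented, the function names are hypothetical, and the 0.8 threshold is borrowed from the "four-fifths" rule of thumb in the EEOC's employment selection guidelines rather than from any fair lending statute.

```python
def selection_rate(approved, applicants):
    """Share of applicants in a group who received a favorable outcome."""
    return approved / applicants


def adverse_impact_ratio(protected, reference):
    """Ratio of the protected group's rate to the reference group's rate.

    Each argument is a hypothetical (approved, applicants) tuple.
    """
    return selection_rate(*protected) / selection_rate(*reference)


# Invented counts, for illustration only.
protected_group = (30, 100)   # 30 of 100 applicants approved
reference_group = (50, 100)   # 50 of 100 applicants approved

ratio = adverse_impact_ratio(protected_group, reference_group)

# Assumption: 0.8 is the "four-fifths" rule of thumb from employment
# selection guidance, used here only as an illustrative screening threshold.
flagged_for_review = ratio < 0.8
```

A flag from a check like this would not establish unlawful discrimination. At most it would prompt the kind of further review, including the business need and alternatives questions, that the paragraph above describes.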

*                      *                      *

There is no question that big data will require hard choices, but there are plenty of avenues for obtaining the benefits of big data while avoiding – or minimizing – any risks. We hope the following documents can help shift the conversation to a more nuanced and balanced analysis of the challenges at hand.

To contact the authors to discuss any of the papers or issues related to privacy and big data, contact

Big Data Papers

FPF releases documents on the benefits of big data to promote a more nuanced and balanced analysis of the big data challenges at hand.

Data Protection Law Errors in Google Spain SL, Google Inc. v. Agencia Espanola de Proteccion de Datos, Mario Costeja Gonzalez

The following is a guest post by Scott D. Goss, Senior Privacy Counsel, Qualcomm Incorporated, addressing the recent “Right to be Forgotten” decision by the European Court of Justice.

There has been quite a bit of discussion surrounding the European Court of Justice’s judgment in Google Spain SL, Google Inc. v. Agencia Espanola de Proteccion de Datos (AEPD), Mario Costeja Gonzalez.  In particular, some interesting perspectives have been shared by Daniel Solove, Ann Cavoukian and Christopher Wolf, and Martin Husovec.  The ruling has been so controversial that the newly appointed EU Justice Commissioner, Martine Reicherts, delivered a speech defending it.  I’d like to add to the discussion.[1]  Rather than focusing on the decision’s policy implications or on the practicalities of implementing the Court’s ruling, I’d like to instead offer thoughts on a few points of data protection law.

To start, I don’t think “right to be forgotten” is an apt description of the decision; the label instead distorts the discussion.  Even if Google were to follow the Court’s ruling to the letter, the information doesn’t cease to exist on the Internet.  Rather, the implementation of the Court’s ruling just makes Internet content linked to people’s names harder to find.  The ruling, therefore, could be thought of as “the right to hide”.  Alternatively, the decision could be described as “the right to force search engines to inaccurately generate results.”  I recognize that such a description doesn’t roll off the tongue quite so simply, but I’ll explain why that description is appropriate below.

I believe the Court made a few important legal errors that should be of interest to all businesses that process personal data.  The first was the Court’s determination that Google was a “controller” as defined under EU data protection law, and the second was its application of the information relevance question.  After addressing both, I’ll explain why “the right to force search engines to inaccurately generate results” may be a more appropriate description of the Court’s ruling.

1.       “Controller” status must be determined from the activity giving rise to the complaint

To understand how the Court erred in determining that Google is a “controller” in this case, it helps to understand how search engines work.  At a conceptual level, search is comprised of three primary data processing activities:  (i) caching all the available content, (ii) indexing the content, and (iii) ranking the content.  During the initial caching phase, a search engine’s robot minions scour the Internet noting all the content on the Internet and its location.  The cache can be copies of all or parts of the web pages on the Internet.  The cache is then indexed to enable much faster searching by sorting the content.  Indexing is important because without it searching would take immense computing power and significant time for each page of the Internet to be examined for users’ search queries.  Finally, the content within the index is ranked for relevance.
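The three activities described above can be illustrated with a toy sketch. This is not how any real search engine is built: the page URLs and text are invented, the names `build_cache`, `build_index`, and `rank` are hypothetical, and the word-count score is an assumption chosen only to keep the example short.

```python
from collections import defaultdict

# Hypothetical stand-in for "the Internet": URL -> page text.
PAGES = {
    "example.org/speech": "four score and seven years ago",
    "example.org/news": "auction notice naming mario costeja gonzalez",
    "example.org/bio": "mario costeja gonzalez biography mario",
}


def build_cache(pages):
    """Caching: copy every page that can be found, keyed by its location."""
    return dict(pages)


def build_index(cache):
    """Indexing: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in cache.items():
        for word in text.split():
            index[word].add(url)
    return index


def rank(index, cache, query):
    """Ranking: order the matching URLs by a relevance score.

    The score is how often the query words appear on the page, an
    assumption made for illustration; ties are broken alphabetically.
    """
    words = query.lower().split()
    if not words:
        return []
    # Keep only pages that contain every query word.
    matches = set.intersection(*(index.get(w, set()) for w in words))

    def score(url):
        page_words = cache[url].split()
        return sum(page_words.count(w) for w in words)

    return sorted(matches, key=lambda u: (-score(u), u))


cache = build_cache(PAGES)
index = build_index(cache)
results = rank(index, cache, "Mario Costeja Gonzalez")
```

In this sketch the first two functions are mechanical: they record what exists and where. Only `rank` involves a choice about what counts as relevant, and that choice could be made in many different ways.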

From a data processing perspective, I believe that caching and indexing achieve two objective goals: determining the available content of the Internet and where it can be found.  Tellingly, the only time web pages are not cached and indexed is when website publishers, not search engines, provide a special instruction telling search engines to ignore the page.  This instruction is typically given in a file called robots.txt.
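A brief sketch can show how a crawler might honor a publisher's robots.txt instruction. It uses `urllib.robotparser` from Python's standard library; the rules, the `ExampleBot` user agent, the URLs, and the `may_index` helper are all hypothetical and are not drawn from the case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that a publisher might serve for its site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() accepts the file's lines, so no network request is needed here.
parser.parse(ROBOTS_TXT.splitlines())


def may_index(url, user_agent="ExampleBot"):
    """Return True if the publisher's rules let this crawler fetch the URL."""
    return parser.can_fetch(user_agent, url)


allowed = may_index("https://example.org/news/article.html")
blocked = may_index("https://example.org/private/notice.html")
```

The decision in this sketch rests entirely on rules written by the publisher; the crawler only reads and follows them.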

The web pages that are cached and indexed could be the text of the Gettysburg Address, the biography of Dr. Martin Luther King, Jr., the secret recipe for Coca-Cola, or newspaper articles that include people’s names.  It is simply a fact that the letters comprising the name “Mario Costeja Gonzalez” could be found on certain web pages.  Search engines cannot control that fact any more than they could take a picture of the sky and be said to control the clouds in the picture.

After creating the cache and index, the next step involves ranking the content.  Search engine companies employ legions of the world’s best minds and immense resources to determine rank order.  Such ranking is subjective and takes judgment.  Arguably, ranking search results could be considered a “controller” activity, but the ranking of search results was never at issue in the Costeja Gonzalez case.  This is a key point underlying the Court’s errors.  Mr. Costeja Gonzalez’s complaint was not that Google ranked search results about him too high (i.e., Google’s search result ranking activity), but rather that the search engine indexed the information at all.  The appropriate question, therefore, is whether Google is the “controller” of the index.  The question of whether Google’s process of ranking search results confers “controller” status on Google is irrelevant.  The Court’s error was to conflate Google’s activity of ranking search results with its caching and indexing of the Internet.

Some may defend the Court by arguing that controller status for some activities automatically confers controller status on all activities.  This would be error.  The Article 29 Working Party opined,

[T]he role of processor does not stem from the nature of an entity processing data but from its concrete activities in a specific context.  In other words, the same entity may act at the same time as a controller for certain processing operations and as a processor for others, and the qualification as controller or processor has to be assessed with regard to specific sets of data or operations.

Opinion 1/2010 on the concepts of “controller” and “processor”, page 25, emphasis added.  In this case, Mr. Costeja Gonzalez’s complaint focused on the presence of certain articles about him in the index.  Therefore, the “concrete activities in a specific context” is the act of creating the index and the “specific sets of data” is the index itself.  The Article 29 WP went on to give an example of an entity acting as both a controller and a processor of the same data set:

An ISP providing hosting services is in principle a processor for the personal data published online by its customers, who use this ISP for their website hosting and maintenance.  If however, the ISP further processes for its own purposes the data contained on the websites then it is the data controller with regard to that specific processing.

I submit that creation of the index is analogous to an ISP hosting service.  In creating an index, search engines create a copy of everything on the Internet, sort it, and identify its location.  These are objective, computational exercises, not activities where the personal data is noted as such and treated with some separate set of processing.  Following the Article 29 Working Party opinion, search engines could be considered processors in the caching and indexing of Internet content because such activities are mere objective and computational exercises, but controllers in the ranking of the content due to the subjective and independent analysis involved.

Further, as argued in the Opinion of Advocate General Jääskinen, a controller needs to recognize that they are processing personal data and have some intention to process it as personal data. (See paragraph 82).  It is the web publishers who decide what content goes into the index.  Not only do they have discretion in deciding to publish the content on the Internet in the first instance, but they also have the ability to add the robots.txt code to their web pages which directs search engines to not cache and index.  The mark of a controller is one who “determines the purposes and means of the processing of personal data.” (Art. 2, Dir. 95/46 EC).  In creation of the index, rather than “determining”, search engines are identifying the activities of others (website publishers) and heeding their instructions (use or non-use of robots.txt).  I believe such processing cannot, as a matter of law, rise to the level of “controller” activities.

Finding Google to be a “controller” may have been correct if either the facts or the complaint had been different.  Had Mr. Costeja Gonzalez produced evidence that: (i) the web pages he wanted removed contained the “robots.txt” instruction or, (ii) the particular web pages were removed from the Internet by the publisher but not by Google in its search results, then it may be appropriate to hold Google as a “controller” due to these independent activities.  Such facts would be similar to the example given by the Article 29 Working Party of an ISP’s independent use of personal data maintained by its web hosting customers.  Similarly, had Mr. Costeja Gonzalez’s complaint been that search results regarding his prior bankruptcy were ranked too high, then I could understand (albeit I may still disagree) that Google would be found to be a controller.  But that was not his complaint.  His complaint was that certain information was included in the index at all – and over that, I believe, Google should be deemed to have no more control than it has over the content of the Internet itself.

2.       “Relevance” of Personal Data must be evaluated in light of the purpose of the processing.

The Court’s second error arose in the application of the controller’s obligations.  Interestingly, after finding that Google is the controller of the index, the Court incorrectly applied the relevancy question.  To be processed legitimately, personal data must be “relevant and not excessive in relation to the purposes for which they are collected and/or further processed.” Directive 95/46 EC, Article 6(c) (emphasis added). Relevancy is thus a question in relation to the purpose of the controller – not as to the data subject, a customer, or anyone else.  The purpose of the index, in Google’s own words, is to “organize the world’s information and make it universally accessible and useful.”  With that purpose in mind, all information on the Internet is, by definition, relevant.  While clearly there are legal boundaries to the information that Google can make available, the issue is whether privacy law contains one of those boundaries.  I suggest that in the context of caching and creating an index of the Internet, it does not.

The court found that Google legitimizes its data processing under the legitimate interest test of Article 7(f) of the Directive.  Google’s legitimate interests must be balanced against the data subjects’ fundamental rights under Article 1(1).  Since Article 1(1) provides no guidance as to what those rights are (other than “fundamental”), the Court looks to subparagraph (a) of the first paragraph of Article 14.  That provision provides data subjects with a right to object to data processing of their personal data, but offers little guidance as to when controllers must oblige.  Specifically, it provides in cases of legitimate interest processing, a data subject may,

“object at any time on compelling legitimate grounds relating to his particular situation to the processing of data relating to him, save where otherwise provided by national legislation.  Where there is a justified objection, the processing instigated by the controller may no longer involve those data.”

What are those “compelling legitimate grounds” for a “justified objection”?  The Court relies on Article 12(b) “the rectification, erasure, or blocking of data the processing of which does not comply with the provisions of this Directive, in particular because of the incomplete or inaccurate nature of the data”.  It is here the Court erred.

The Court took the phrase “incomplete or inaccurate nature of the data” and erroneously applied it to the interests of the data subject.  Specifically, the Court held that the question is whether the search results were “incomplete or inaccurate” representations of the data subject as he/she exists today.  I submit that was not the intent of Article 12(b).  Rather, Article 12(b) was referring back to the same use of that phrase in Article 6 providing that:

“personal data must be: . . .(d) accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that data which are inaccurate or incomplete, having regard to the purposes for which they were collected or for which they are further processed, are erased or rectified.”

The question is not whether the search results are “incomplete or inaccurate” representations of Mr. Costeja Gonzalez, but whether the search results are inaccurate as to the purpose of the processing.  The purpose of the processing is to copy, sort, and organize the information on the Internet.  In this case, queries for the characters “Mario Costeja Gonzalez” displayed articles that he admits were actually published on the Internet.  Such results, therefore, are by definition not incomplete or inaccurate as to the purpose of the data processing activity.  To put it simply, the Court applied the relevancy test to the wrong party (Mr. Costeja Gonzalez) as opposed to Google and the purpose of its index.

To explain by analogy, examine the same legal tests applied to a credit reporting agency.  A credit reporting analogy is helpful because it also has at least three parties involved in the transaction.  In the case of the search engine, those parties are the search engine, the data subject, and the end user conducting the search.  In the case of credit reporting, the three parties involved are the credit ratings businesses, the consumers who are rated (i.e., the data subjects), and the lenders and other institutions that purchase the reports.  It is well-established law that consumers can object to information used by credit ratings businesses as being outdated, irrelevant, or inaccurate.  The rationale for this right is found in Article 12(b) and Article 6 in relation to the purpose of the credit reporting processing activity.

The purpose of credit reporting is to provide lenders an opinion on the credit worthiness of the data subject.  The credit ratings business must take care that the information they use is not “inaccurate or incomplete” or they jeopardize the purpose of their data processing by generating an erroneous credit score.  For example, if a credit reporting agency collected information about consumers’ height or weight, consumers would be able to legitimately object.  Consumers’ objections would not be founded on the fact that the information is not representative of who they are – indeed such data may be completely accurate and current.  Instead, consumers’ objections would be founded on the fact that height or weight are not relevant for the purpose of assessing consumers’ credit worthiness.

Returning to the Costeja Gonzalez case, the issue was whether the index (not the ranking of such results) should include particular web pages containing the name of Mr. Costeja Gonzalez.  Since the Court previously determined that Google was the “controller” of the index (which I contend was error), the Court should have determined Google’s purpose of the index and then set the inquiry as to whether the contested web pages were incorrect, inadequate, irrelevant or excessive as to Google’s purpose.  As discussed above, Google’s professed goal is to enable the discovery of the world’s information and to that end the purpose of the index is to, as much as technologically possible, catalog the entire Internet – all the good, bad, and ugly.  For that purpose, any content on the Internet about the key words “Mario Costeja Gonzalez” is, by definition, not incorrect, inadequate, irrelevant or excessive because the goal is to index everything.  Instead, however, the Court erred by asking whether the web pages were incorrect, inadequate, irrelevant or excessive as to Mr. Costeja Gonzalez, the data subject.  Applying the relevancy question as to Mr. Costeja Gonzalez is, well, not relevant.

Some may argue that the Court recognized the purpose of the processing when making the relevancy determination by finding that Mr. Costeja Gonzalez’s rights must be balanced against the public’s right to know.  By including the public’s interest in the relevancy evaluation, some may argue, the Court has appropriately directed the relevancy inquiry to the right parties.  I disagree.  First, I do not believe it was appropriate to inquire as to the relevancy of the links vis-a-vis Mr. Costeja Gonzalez in the first instance and, therefore, to balance it against other interests (in this case the public) does not cure the error.  Secondly, to weigh the interests of the public, one must presume that the purpose of searching for individuals’ names is to obtain correct, relevant, not inadequate and not excessive information.  I do not believe such presumptions are well-founded.  For example, someone searching for “Scott Goss” may be searching for all current, relevant, and non-excessive information about me.  On the other hand, fifteen years from now perhaps someone is searching for all privacy articles written in 2014 and they happened to know that I wrote one and so searched using my name.  One cannot presume to know the purpose of an individual’s search query other than a desire to have access to all the information on the Internet containing the query term.

If not the search engines, where would it be pertinent to ask the question of whether information on the Internet about Mr. Costeja Gonzalez was incorrect, inadequate, irrelevant or excessive?  The answer to this question is clear:  it is the entities that have undertaken the purpose of publishing information about Mr. Costeja Gonzalez.  Specifically, website publishers process personal data for the purpose of informing their readers about those individuals.  The website publishers, therefore, have the burden to ensure that such information is not incorrect, inadequate, irrelevant or excessive as to Mr. Costeja Gonzalez.  That there may be an exception in data protection law for web publishers does not mean that courts should be free to foist obligations onto search engines.

3.       The right to force search engines to inaccurately generate results

Finally, the “right to force search engines to inaccurately generate results” is, I believe, an apt description of the ruling.  A search engine’s cache and index are supposed to contain all the information on the web that web publishers want the world to know.  Users expect that search engines will identify all information responsive to their queries when they search.  Users further expect that search engines will rank all the results based upon their determination of the relevancy of the results in relation to the query.  The Court’s ruling forces search engines to generate an incomplete list of search results by gathering all information relevant to the search and then pretending that certain information doesn’t exist on the Internet at all.  The offending content is still on the Internet; people just cannot rely on finding the content using individuals’ names entered into search engines (at least the search engines on European country-coded domains).


[1] These thoughts are my own and not those of the company for which I work, and I do not profess to be an expert in search technologies or the arguments made by the parties in the case.

FERPA | SHERPA: Providing a Guide to Education Privacy Issues

Education is changing. New technologies are allowing information to flow within classrooms, schools, and beyond, enabling new learning environments and new tools to understand and improve the way teachers teach and students learn. At the same time, however, the confluence of enhanced data collection with highly sensitive information about children and teens also makes for a combustible mix from a privacy perspective. Even the White House recognizes this challenge! Its recent Big Data Review specifically highlighted the need for responsible innovation in education.

There are many organizations – many of which we’ve partnered with – working tirelessly to address privacy issues in education and provide the best experience for students. So too is the Department of Education. Yet these resources are scattered. The need for an education privacy resource clearinghouse is clear. With “back to school” now in full swing, we thought it a great time to launch FERPA|SHERPA. The site – named after the core federal law that governs education privacy – aims to provide a one-stop shop for education privacy-related offerings of interest to parents and schools, as well as education service providers and the policymakers struggling to grapple with the legal landscape.

Everyone in the educational ecosystem has a role to play here, lest legitimate privacy concerns combine with other worries to overwhelm the benefits of education technologies and the expanded use of student data. One need only look at the recent collapse of inBloom – a new technology platform that school systems were clamoring for until a combination of poor communication and privacy fears came to dominate any and all conversations about the underlying technologies – as an example of the need for schools and the companies they partner with to better address education privacy issues.

To ensure parents have a voice in the ongoing privacy debate, the site will also host a blog written by parent privacy advocate Olga Garcia-Kaplan, a Brooklyn, NY public school parent of three children.

We’re also releasing an education privacy whitepaper by Jules Polonetsky, our executive director, and Omer Tene, Vice President, Research & Education, IAPP, that analyzes the opportunities and challenges of data-driven education technologies and how key stakeholders should address them. The piece – “The Ethics of Student Privacy: Building Trust for Ed Tech” – was recently published in a special issue of the International Review of Information Ethics, “The Digital Future of Education.”

We hope FERPA | SHERPA will help get everyone on the same page when it comes to privacy issues around student data.  We would love your feedback and thoughts on the new site, and we look forward to helping to jump start conversations about education privacy in the new school year.  If we’ve missed something or you’d like to join our effort, please reach out to

Privacy Calendar

IAPP Privacy Academy and CSA Con... @ San Jose Convention Center
Sep 17 – Sep 19 all-day
This fall, the International Association of Privacy Professionals (IAPP) and Cloud Security Alliance (CSA) are bringing together the IAPP Privacy Academy and the CSA Congress under one roof, giving you access to even more valuable[...]
The NSA, Privacy and the Global ... @ Georgetown Law Center
Sep 19 @ 1:15 pm – 2:45 pm
WHAT The NSA, Privacy and the Global Internet: Perspectives on Executive Order 12333 WHEN Friday, September 19, 2014 1:15 – 2:45 p.m. WHERE Georgetown University Law Center McDonough Hall, Room 200 600 New Jersey Avenue,[...]
Mapping Issues with the Web: An ... @ Tow Center for Digital Journalism/Columbia Journalism School
Sep 23 @ 5:00 pm – 6:30 pm
On the occasion of Bruno Latour’s visit to Columbia University, this presentation will show participants how to operationalize his seminal Actor-Network Theory using digital data and methods in the service of social and cultural research.
Yale Day of Data @ Yale University
Sep 26 @ 8:30 am – 5:00 pm
This day-long event will focus on data science and partnerships across industry, academia, and government initiatives. The day will also include presentations by eight Yale faculty and researchers on issues specific to research data management,[...]
City by Numbers: Big Data and th... @ Pratt Institute
Oct 11 @ 9:30 am – 6:00 pm
Big Data—the exponential growth and availability of information—is one of the defining phenomena of our time. It affects us all on different levels – with far-reaching social, environmental, and governmental significance. To help make sense[...]
Consumer Action’s 43rd Annual Aw... @ Google
Oct 21 @ 6:00 pm – 8:00 pm
To mark its 43rd anniversary, Consumer Action’s Annual Awards Reception on October 21, 2014, will celebrate the theme of “Train the Trainer.” Through the power of individual and small group trainings, Consumer Action each year is[...]
