In the U.S. public school system, data privacy is a myth

Despite policies that colleges claim provide both “transparency and access”, most U.S. college students and their parents are unaware of what data colleges and universities store, and of who has access to that data.

But the problem is worse at the U.S. public school level, and that is where all the trouble begins.

 

19 October 2022 – It is no mystery that U.S. university networks are rife with private information extremely attractive to cyber thieves, especially considering all of the personal, academic and financial information they are tasked to protect. Because of this, as I wrote last year, data breaches such as the Accellion file transfer software breach will continue. That breach leaked thousands of files of sensitive information from such universities as Stanford University, the University of Maryland, the University of Miami, almost all branches of the University of California, the University of Colorado, and Yeshiva University. Universities got a sudden “wake-up call” about the different forms of stored personally identifiable information (PII) and the different security tactics needed to best secure that data.

It was a “Data Protection 101” in effective university PII protection, addressing three critical requirements also faced by corporations (and other entities): data discovery, access governance and risk mitigation.

But it glossed over the bigger issue: the near-ubiquitous use of learning platforms at the lower levels, in almost every U.S. public school and classroom, with almost indiscriminate collection of any and all data and no practical way to opt out of engagement with companies whose data collection policies and practices put students’ privacy at risk.

Educational software is pervasive in contemporary high school classrooms, yet many students are unaware that learning platforms monitor their activities, reporting concerning documents and messages to school administrators and law enforcement. The scope of student data collection and analysis raises challenges that existing privacy laws like the Family Educational Rights and Privacy Act barely address. None of the current privacy legislation in any U.S. state deals with it, and it raises fundamental philosophical questions about the autonomy and rights of children that none of this legislation even touches.

Last spring, and continuing this fall, the MIT Media Lab has run a series of sessions on all of these educational issues (at both the collegiate and public school levels), part of a long-running series on the opportunities and challenges of the computing age. Some of the sessions have been run by Kathleen Creel, a bit of an all-knowing maven on computing, data, and society. You can read about her by clicking here. I had an opportunity to attend many of the spring sessions, and one of my team members is attending the fall sessions.

For this post, I’ll focus on the U.S. public school system, referencing some of the session materials, plus some related work by Kathleen Creel. The material is voluminous so not everything can be included, but I will try to keep to the session outlines to ease your read.

1. Introduction

As noted in the opening session, for most high school students in the United States, classwork and homework begin the same way: by opening a laptop and logging into a learning platform. Students spend hours of their days composing essays, practicing math problems, emailing teachers, and taking tests online. At schools with policies governing in-class phone use, students message friends on school-provided devices that come to feel like their own. But many do not realize that schools and third-party companies can collect their personal communications as data and, in some cases, as evidence.

The schools’ mantra has often been that they want to protect students from cyberbullying peers and from harming themselves or others. To address these issues, administrators install monitoring software. A single word in an email, instant message, or search bar indicating bullying (“gay”) or self-harm (“suicide”) can trigger an alert to school administrators and law enforcement. In September 2021, monitoring software in a Minneapolis school district sent over 1,300 alerts to school administrators stating that students were viewing “questionable content.” Each alert extracted flagged text from students’ messages, emails, or school documents and sent it to school administrators for review. In other districts, monitoring software escalated students’ mentions of suicide to police.
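To make concrete how blunt this kind of keyword triggering is, here is a minimal sketch of a watchlist filter. It is my own illustration, not Gaggle’s or any vendor’s actual code, and the watchlist terms are just the examples mentioned above; note how an entirely innocuous message still trips the alert.

```python
# Minimal illustration of keyword-based monitoring (not any vendor's actual code).
# A naive watchlist match flags messages regardless of context, which is why
# innocuous uses of a flagged word can generate alerts to administrators.

WATCHLIST = {"gay", "suicide"}  # hypothetical terms, taken from the examples above

def flag_message(message: str) -> set:
    """Return the watchlist terms found in a message, ignoring all context."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return words & WATCHLIST

messages = [
    "My essay is about the history of the gay rights movement",  # innocuous, still flagged
    "See you at practice tomorrow",                               # not flagged
]

for m in messages:
    hits = flag_message(m)
    if hits:
        print(f"ALERT to administrators: {sorted(hits)} found in: {m!r}")
```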

Yet a national survey of parents of school-aged children found that a majority of parents did not want their child’s educational data to be shared with law enforcement. And despite widespread, and increasing, parental concern over how and with whom student data is shared, the near-ubiquitous use of learning platforms in the classroom leaves students and parents without practical ways to opt out of engagement with companies whose data collection policies and practices put students’ privacy at risk.

But in addition to protecting students, education researchers and makers of education software also say they want to use the rich trove of data students generate to improve students’ education. Artificial intelligence (AI)-based educational technology aims to “meet students where they are” by using the student’s data to track progress and create personalized learning plans. Makers of educational software hope to understand how students are interacting with software in order to improve future offerings, while education researchers hope to use student data to understand learning and educational disparities.

But data collection always comes with costs. In 2016, the Electronic Frontier Foundation studied 118 education technology software services and apps commonly used by schools. 78 retained data after students had graduated; only 48 encrypted personal information about students. Some school districts require educational software providers to disclose their data collection practices to parents and allow them to opt their child out, yet only 55 percent of parents surveyed had received a disclosure and 32 percent said that they were unable to opt out. This lack of data encryption, transparency, and choice is concerning. Despite policies that aim to provide both transparency and access, most students and parents are unaware of what data is being stored and who has access to it.

As adults, we are routinely subjected to similar invasions. Yet the special moral status of children makes the tension between their protection and education and the protection of their privacy especially fraught.

On one model of childhood privacy, the paternalism view, students are children to be protected and educated by responsible adults who make decisions on their behalf. Parents and guardians may succeed in maintaining their child’s privacy from others by using laws like the Family Educational Rights and Privacy Act (FERPA) to shield their child’s student records. However, the child has no privacy from her guardians and educators: parents and school administrators may read her social-emotional–learning diaries and text messages at any time. The child is a “moral patient”: a being deserving of protection, but lacking the agency to make their own choices without oversight. Protecting her from a chance of harm from herself or others is worthwhile, no matter how small the chance of harm, because there is no countervailing duty to protect her privacy.

On another view, the essential role of the child is not to be protected but to “become herself.” Through exploration and play, the child develops her own agency, autonomy, and ability to freely choose. To the extent that a child is aware of being monitored in her experimentation and aware that her decisions and explorations may be interpreted out of context and punished accordingly, she cannot explore freely. The child also needs privacy in order to develop herself in relation to others. Respecting the privacy of communications between children allows them to develop genuine friendships, as friendship is a relationship between two people that cannot survive constant surveillance by a third party.

There is, of course, a big issue here, one beyond the scope of these sessions or this post. Yes, parents do not want a system that triggers false alarms that send law enforcement knocking and thereby put their children into a database for future use, nor one that knowingly allows learning platforms to resell identifiable student data to commercial vendors. But the broader dilemma, knowing when to treat children “as children” and when to treat them as responsible agents, is the essential predicament of childhood. As Kathleen Creel noted, “We cannot wait for adult data privacy to be settled before tackling it.”

2. Privacy and Contextual Integrity

Before the use of computers in education, students communicated primarily in person, which prevented most third parties from knowing the contents of their conversations, and did their schoolwork on paper or the ephemeral chalkboard. What data they did produce was confined to paper records locked in filing cabinets. Such data was difficult to aggregate: if shared at all, it was mimeographed and mailed or read aloud over the telephone. A third party aspiring to collect data on students would be stymied by the friction of acquiring the data and the expense of storing it. But as one of the presenters noted:

Today, students generate electronic data every time they send email or messages, take quizzes and tests on learning platforms, and record themselves for class projects. The volume, velocity, and variety of the data they generate has shifted dramatically. As cloud computing prices fall, analytics companies become incentivized to collect and store as much data as possible for their future use. Educational software and websites are no exception. Companies analyze and mine data about students’ day-to-day activities to find useful patterns for their own purposes, such as product improvement or targeted advertising.

So without additional legal protection (and none seems anywhere in sight), the trend toward increasing student data collection, storage, and reuse is likely to continue. Given the contested moral status of children, how should their privacy and persons be protected? What policies or laws should we adopt?

And so we went down the rabbit hole. In order to choose one policy over another, we first need a reason for the choice – a justification of why a certain norm of privacy is correct, based on a broader justificatory framework. We were introduced to the work of Helen Nissenbaum (a professor of information science at Cornell Tech best known for her work on privacy, privacy law, trust, and security in the online world) and her analysis of privacy, which she calls “contextual integrity”.

2.1. Contextual Integrity

Contextual integrity suggests that every social context, from the realm of politics to the dentist’s office, is governed by “norms of information flow.” Each social context is governed by different expectations regarding what people in different roles should do or say. For example, it would be appropriate for a dentist to ask a patient his age but unusual for a patient to reverse the question. In addition to norms of appropriateness, each social context has norms governing the proper flow or distribution of information. Thus, privacy is defined as the “appropriate flow of information” in a context, not as secrecy or lack of information flow.

According to Nissenbaum, there are five parameters that define the privacy norms in a context:

1. Subject

2. Sender

3. Recipient

4. Information Type, and

5. Transmission Principle

Any departure from the norms typical for a context constitutes a violation of contextual integrity. However, the principle is not fully conservative: new practices can be evaluated in terms of their effects on “justice, fairness, equality, social hierarchy, democracy,” and autonomy, as well as their contribution to achieving the goals relevant for the context. On these grounds, a new privacy norm may be chosen over the old.
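One way to see how the five parameters work together is to treat an information flow as a small record and compare a new practice against the entrenched norm for the context, parameter by parameter. The sketch below is my own illustration of Nissenbaum’s model, not anything presented in the sessions; the field names simply mirror the five parameters listed above, and the example reuses the dentist’s office context.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class InformationFlow:
    """The five parameters Nissenbaum uses to describe an information flow."""
    subject: str
    sender: str
    recipient: frozenset
    information_type: str
    transmission_principle: str

def departures(norm: InformationFlow, actual: InformationFlow) -> list:
    """List the parameters on which an actual flow departs from the entrenched norm."""
    return [f.name for f in fields(InformationFlow)
            if getattr(norm, f.name) != getattr(actual, f.name)]

# Entrenched norm: a patient tells the dentist his age; it stays within the office.
norm = InformationFlow(
    subject="patient", sender="patient",
    recipient=frozenset({"dentist"}),
    information_type="age",
    transmission_principle="kept within the dental office",
)

# New practice: the office passes the same data to an outside analytics vendor.
actual = InformationFlow(
    subject="patient", sender="patient",
    recipient=frozenset({"dentist", "analytics vendor"}),
    information_type="age",
    transmission_principle="shared with a third party for product improvement",
)

print(departures(norm, actual))  # ['recipient', 'transmission_principle']
```

Any non-empty list of departures signals a prima facie violation of contextual integrity; whether the new norm should nonetheless be accepted is the separate, value-laden question taken up below.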

So the presenters used the contextual integrity model to evaluate reliance on educational software in schools.

NOTE TO READERS: Any learning platform can be analyzed in terms of the appropriateness of its privacy policy to the privacy norms of the classroom. Consider Gaggle’s privacy policy. Gaggle, an online platform designed for use in the classroom that seeks to replace communication tools such as blogging software and email clients with similar software equipped with content filters, states that, “Gaggle will not distribute to third parties any staff data or student data without the consent of either a parent/guardian or a qualified educational institution except in cases of Possible Student Situations (PSS), which may be reported to law enforcement.”

We were presented with this:

Imagine that a student sent a message to another student at 8 p.m. on a Saturday and Gaggle flagged it as a potential indicator that the student is depressed. Analyzing this scenario according to the contextual integrity norm, the five parameters would be:

Subject: Student 1 (the author of the message) and Student 1’s mental health concerns

Sender: Student 1 (the author of the message)

Recipient: Student 2 and Gaggle. If Gaggle alerts Student 1’s parents, school administration, or law enforcement of student activity, they also become recipients.

Information Type: Student data (in this case, a message and its associated metadata, such as sender, recipient, timestamp, and location)

Transmission Principle: The recipient will not share the student data with third parties without the consent of the parent/guardian or educational institution, except in cases of Possible Student Situations (PSS).

The desire to protect students and intervene to help them when they struggle with depression or anxiety is laudable and may initially appear to justify Gaggle’s new norms for classroom communication. However, the context of childhood friendship before the introduction of digital messaging was one in which it was possible for a student to discuss their feelings of sadness with another student on a weekend, outside of the classroom, without being overheard and without school intervention. Whether in person or on the telephone, the context of the interaction between Sender and Recipient was one of friendship mediated by the transmission principle of the telephone, which permitted information flow without disclosure to a third party. Pre-Gaggle messaging also presumes a disclosure-free channel for communication between friends.

Given these changes, the introduction of Gaggle meaningfully alters both the transmission principle and the set of recipients, thereby violating the preexisting privacy norms. In the contextual integrity framework, in order to argue that the new privacy norm is beneficial, proponents could appeal to its positive contribution to “justice, fairness, equality, social hierarchy, democracy,” or autonomy. If the new privacy norms do not contribute to these goods, they must contribute instead to the goals of the context. In order to evaluate whether the privacy norms reshaped by Gaggle are beneficial, we must determine what goals, and whose goals, should be considered relevant to the context.
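Restating the Gaggle scenario in the same style (again my own illustration, using plain dictionaries rather than the dataclass above so it stands alone), the difference falls exactly where the analysis says it does: the set of recipients and the transmission principle.

```python
# The weekend message between friends, before and after Gaggle, expressed as the
# five contextual-integrity parameters. A synthetic restatement of the analysis above.

pre_gaggle = {
    "subject": "Student 1's mental health concerns",
    "sender": "Student 1",
    "recipient": frozenset({"Student 2"}),
    "information_type": "student message and associated metadata",
    "transmission_principle": "private channel; no disclosure to third parties",
}

with_gaggle = {
    "subject": "Student 1's mental health concerns",
    "sender": "Student 1",
    "recipient": frozenset({"Student 2", "Gaggle", "school administrators", "law enforcement"}),
    "information_type": "student message and associated metadata",
    "transmission_principle": "no sharing without consent, except Possible Student Situations (PSS)",
}

changed = [k for k in pre_gaggle if pre_gaggle[k] != with_gaggle[k]]
print(changed)  # ['recipient', 'transmission_principle']
```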

3. Student Privacy Laws

The foregoing analysis assumes that the relevant context of communications between children who are friends is that of friendship: namely, that the norms of privacy that apply to adult friendships and adult message communications should also apply to childhood friendships and childhood message communications. However, if the first, guardian-centered viewpoint presented above is correct, it may be that the relevant context of analysis is primarily that the sender and recipient are both children, not that they are friends. Legally, that is the case: in the United States, a child has no right to privacy from their parents. Parents may monitor the online communications of children as they please.

Privacy from school officials or other adults is dependent on school policy and on the wishes of the parents or guardian until the child reaches the age of 18. There are three primary federal laws in the United States that aim to protect student privacy and security: the Family Educational Rights and Privacy Act (FERPA), the Children’s Online Privacy Protection Act (COPPA), and the Protection of Pupil Rights Amendment (PPRA). Although each attempted to meet the information security needs of its day, the three collectively fall short in protecting student privacy from contemporary data collection. From the materials:

FERPA provides the strongest federal privacy protection for student data, but has not been updated in the past decade. FERPA gives parents three rights: the right to access their child’s education records, the right to a hearing in which they may challenge inaccuracies in those records, and the right to prevent personally identifiable information (PII) from their child’s record from being disclosed to third parties without their written consent.

FERPA typically allows school officials to share student data only if both direct identifiers, such as student names, and indirect identifiers, such as a student’s date or place of birth, are removed. However, school officials may access PII data provided they have a “legitimate educational interest” in doing so. This privilege is extended at the school’s discretion to educational technology providers who take over roles previously performed by school officials, are under “direct control” of the district, and agree to be bound by the same provisions against reuse and reselling as a school official would be. Since most educational technology providers fall under these provisions, they are permitted to collect and store personally identifiable information about students.
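As a rough sketch of what stripping direct and indirect identifiers can look like in practice (my own illustration with hypothetical field names, not a FERPA compliance recipe), a district might drop both categories from a record before sharing it:

```python
# Illustrative only: removing FERPA-style direct and indirect identifiers from a
# student record before sharing. Real de-identification requires far more care,
# and (as the linkage-attack discussion later in this post shows) may still fail.

DIRECT_IDENTIFIERS = {"name", "student_id", "ssn"}
INDIRECT_IDENTIFIERS = {"date_of_birth", "place_of_birth", "address"}

def deidentify(record: dict) -> dict:
    """Drop direct and indirect identifiers, keeping only the remaining fields."""
    blocked = DIRECT_IDENTIFIERS | INDIRECT_IDENTIFIERS
    return {k: v for k, v in record.items() if k not in blocked}

record = {
    "name": "Jane Doe", "student_id": "12345", "ssn": "000-00-0000",
    "date_of_birth": "2008-03-14", "place_of_birth": "Minneapolis",
    "address": "1 Main St", "grade_level": 9, "math_score": 87,
}

print(deidentify(record))  # {'grade_level': 9, 'math_score': 87}
```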

Other laws similarly allow third-party software providers to collect data without requiring transparency as to the uses of that data. COPPA protects the personal information of children under the age of thirteen by requiring “operators of websites and online services” to gain parental consent in order to “collect, use, or disclose” the PII of their children. However, schools “may act as the parent’s agent” and can allow online software providers to collect data for students under the age of thirteen and use it for educational purposes without parental consent. For K-12 students, PPRA requires parental consent before a school may require that students respond to a survey about topics such as their political affiliation, religious practices, or income.

But as pointed out: the patchwork of federal student privacy laws, supplemented by state legislation such as California’s Student Online Personal Information Protection Act (SOPIPA), has changed little in the last decade. And much of the legislation is internally inconsistent, contradictory, and often incompatible.

But as the social practices to which the laws are applied change – as student data becomes more voluminous, its transmission becomes easier, and educational software providers take on activities once performed by school officials – the informational norms that the privacy laws sought to protect are violated.

The contextual integrity framework helps us understand why this is. From the materials (highlighted text is my addition):

Even if the prevailing social context (a high school), the subject of the data (students), sender (school officials), many of the recipients (school officials, school districts), and the laws governing transmission remain the same, the addition of recipients such as the providers of educational software and the changes in principle of transmission (paper to email, or email to centralized learning platform) generates a violation of contextual integrity and therefore of privacy.

In order to illustrate how a change in educational technology can violate contextual integrity without violating FERPA, consider the case of InBloom. InBloom was an ambitious nonprofit initiative launched in 2011 to improve the US education system by creating a centralized, standardized, and open source student data-sharing platform for learning materials. Educators and software professionals from several states started building the platform. Although the platform complied with FERPA, it meaningfully changed the transmission principle under which student data was transmitted. Before, student data had been stored only locally, at the school level, and in the fragmented databases of educational technology providers.

Now it would be pooled at the state and national levels, granting both school officials and their authorized educational software providers access to a much larger and more integrated database of student data, including personally identifiable information and school records. This is a violation of the contextual integrity of student data, and the backlash InBloom faced from parents, activists, and local school officials was on the grounds of the changes it would prompt in the transmission and storage of data. The InBloom incident highlights the need for updated student privacy legislation, and perhaps for legislation that incorporates principles of contextual integrity.

While InBloom shut down in 2014, many of the parent and activist criticisms of its data pooling would apply equally to the for-profit educational technology companies that continue to collect and store data in its absence. Oh, and the InBloom database? Bought by another software provider.

4. Privacy and Justice

Another factor relevant to the evaluation of contextual integrity is the social identities of the actors involved and how they interact with the roles they inhabit. Student privacy concerns can be compounded when combined with existing biases and discrimination, including those based on class, sexuality, race, and documentation status. According to a study by the Center for Democracy & Technology, low-income students are disproportionately subjected to digital surveillance. This is because many schools distribute laptops and other devices to low-income students, a side effect of which is increased access to students’ online activity. In some cases, school officials can view the applications a student opens and their browsing history in real time.

NOTE TO READERS: Privacy concerns were exacerbated by virtual learning, as schools expanded laptop distribution significantly during the COVID-19 pandemic and data of every kind was collected.

Learning platforms may exacerbate existing systemic racism and bias in school-based disciplinary measures, as Black and Hispanic students have faced suspension, expulsion, or arrest at higher rates than White students for similar offenses. Existing teacher biases, often associated with school-based disciplinary actions, may be embedded into AI-based education software, resulting in adverse impacts on marginalized students. From the materials:

Researchers at the University of Michigan studied the sociotechnical consequences of using ClassDojo, a data-driven behavior management software for K-8 students. ClassDojo allows teachers to add or subtract “Dojo points” from students for behaviors such as “helping others” or “being disrespectful.” The researchers found that use of ClassDojo had the potential to reinforce teacher biases, as when teacher stereotypes about which students were likely to be more “disrespectful” or “disruptive” were seemingly substantiated by ClassDojo behavior records gathered by the teachers themselves, and also found that ClassDojo had adverse psychological effects on students.

5. Addressing Privacy in Context

In addition to the planned and routine violations of contextual integrity that may occur when educational software providers resell supposedly anonymized data or school officials aggregate student data across contexts, there are the accidents. Large databases, or “data lakes” as they are sometimes evocatively called, are prone to rupture and spill. The US Government Accountability Office analyzed 99 data breaches across 287 school districts, from July 2016 to May 2020. According to their report, thousands of students had their academic records and PII compromised. Bigger and more affluent school districts that used more technology and collected more student data were impacted most. The report states that compromised PII like social security numbers, names, addresses, and birth dates can be sold on the black market, causing financial harm to students who have otherwise clean credit histories. Compromised records containing special educational status or the medical records of students with disabilities who are on an Individualized Education Program (IEP) can lead to social and emotional harm. From the materials:

The government’s awareness of the frequency of data leaks and their negative consequences for student privacy, financial well-being, and medical confidentiality establishes a context for legislative solutions. Many data leaks flow from third-party educational software providers who are not following information security best practices. For example, nearly 820,000 students’ personal data from the New York City public school system was compromised in early 2022. Before the leak, the school district was unaware that the educational software provider responsible had failed to take basic measures such as encrypting all student data.
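Encrypting student data at rest is not the hard part technically. As a minimal sketch (assuming the widely used Python cryptography package, and not any district’s or vendor’s actual setup), symmetric encryption of a record looks roughly like this; the genuinely hard part, key management, is deliberately not shown:

```python
# Minimal sketch of encrypting a student record at rest using the "cryptography"
# package's Fernet (symmetric, authenticated encryption). Where the key is stored
# and who may use it (key management) is the hard part and is not shown here.
import json
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice: held in a key management service, not in code
fernet = Fernet(key)

record = {"name": "Jane Doe", "iep_status": True, "counselor_notes": "..."}
ciphertext = fernet.encrypt(json.dumps(record).encode("utf-8"))

# Only a holder of the key can recover the plaintext.
plaintext = json.loads(fernet.decrypt(ciphertext).decode("utf-8"))
assert plaintext == record
```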

In addition to encryption, other best practices include anonymizing data when possible, including by using techniques like differential privacy. Although anonymizing data by removing direct identifiers of students provides some measure of privacy, Latanya Sweeney’s work has shown that when the data is linked to other data sources, it can be possible to reverse-engineer the anonymized records, a technique known as a linkage attack. Most individual Americans can be identified by their “anonymized” Facebook accounts or their health care information.
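A toy version of the kind of linkage attack Sweeney described, on entirely synthetic data of my own invention: join an “anonymized” school data set to a public roster on quasi-identifiers such as birth date, ZIP code, and gender, and the names come back.

```python
# Toy linkage attack on synthetic data. The "anonymized" records have names removed,
# but quasi-identifiers (birth date, ZIP code, gender) remain and can be joined
# against a public roster (a yearbook index, a team listing) to re-identify students.

anonymized = [
    {"dob": "2007-05-02", "zip": "55401", "gender": "F", "iep": True,  "gpa": 2.1},
    {"dob": "2008-11-19", "zip": "55402", "gender": "M", "iep": False, "gpa": 3.8},
]

public_roster = [
    {"name": "Alice A.", "dob": "2007-05-02", "zip": "55401", "gender": "F"},
    {"name": "Bob B.",   "dob": "2008-11-19", "zip": "55402", "gender": "M"},
]

index = {(p["dob"], p["zip"], p["gender"]): p["name"] for p in public_roster}

for row in anonymized:
    name = index.get((row["dob"], row["zip"], row["gender"]))
    if name:
        print(f"Re-identified {name}: IEP={row['iep']}, GPA={row['gpa']}")
```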

To protect against linkage attacks and further ensure privacy, differential privacy introduces statistical “noise” into sensitive data by slightly modifying select data fields so that individuals are not identifiable. Differentially private individual data may be inaccurate; however, the aggregate results remain fairly accurate. Even if the data set is accessed, individuals’ privacy is less likely to be compromised.

There was a detailed presentation on differential privacy and how it is used by researchers to secure their data, by companies that hold PII, and by the US Census Bureau. The statistical algorithms that implement differential privacy are most effective on large data sets, as noise erodes the utility of small data sets. Thus, school systems with large and sensitive data sets may increase privacy with this technology.
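For a sense of the mechanics, here is a textbook sketch of the Laplace mechanism for a counting query (my own illustration, not the Census Bureau’s or any vendor’s implementation). A count changes by at most one when a single student is added or removed, so Laplace noise with scale 1/ε yields an ε-differentially-private answer; on a large data set the noisy count is still useful, on a small one much less so.

```python
# Textbook sketch of the Laplace mechanism for a differentially private count.
# The sensitivity of a counting query is 1, so noise drawn from Laplace(0, 1/epsilon)
# is enough for epsilon-differential privacy.
import numpy as np

def dp_count(values, predicate, epsilon: float) -> float:
    """Return a noisy count of the values satisfying the predicate."""
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# How many students scored below 60? Individual rows stay protected; the aggregate
# remains approximately correct because the noise has mean zero.
scores = np.random.randint(40, 100, size=5000)
print(dp_count(scores, lambda s: s < 60, epsilon=0.5))
```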

NOTE TO READERS: Differential privacy has not been widely adopted within educational technology. In addition to the complexity of implementing differential privacy compared to data anonymization, many educational technology systems need the ability to identify specific students and track their progress. Implementing differential privacy internally, as opposed to when releasing data to researchers, could impede these pedagogical functions.

But, as we all know, data stored long enough is likely to leak. Introducing requirements that stored student data be encrypted and anonymized would protect data subjects from reidentification when it does. But not all student data can be anonymized and still remain useful. For data that cannot, one proposed solution is the establishment of “information fiduciaries.” A fiduciary is a representative who is required to make decisions in the best interests of their clients. Stipulating that schools and the educational software providers with whom they contract must act as fiduciaries when they handle student data would confer additional legal obligations to act in the best interests of the students and their privacy.

6. Conclusion

And so we are left with the conclusion that educational software will continue to be adopted by school systems simply for its overall perceived value in increasing access to high-quality and personalized education. As a result, student privacy issues will escalate, and not be resolved. Current federal privacy laws such as FERPA require updating in order to meet these challenges, as they do not hold school districts or educational software providers to the government’s own standards of student privacy and information security. But as our concluding speaker noted:

“Ensuring that school officials and educational software providers respect the contextual integrity of information transmission for student data, or adopt policies that represent a student-centric rather than guardian-centric perspective on children’s rights and privacy, requires more than a simple update. Given the vast amount of money going to these software vendors, and the symbiotic relation between those vendors and the school systems (because it reduces the teaching burden) and the claim that ‘technology can save education’, I think privacy concerns will continue to fly out the window”.

 
