On the Rival Nature of Data: Tech and Policy Implications

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Data is often thought of and treated as a non-rival good, which would imply that one person’s use of data does not inherently diminish its availability for others. Building on research in privacy and statistics, we argue that there exist many important settings in which data should be treated as a rival good. Our argument takes into account modern uses of data for statistics, machine learning, and a variety of other purposes, in conjunction with requirements of privacy protection and statistical validity. Excessive sharing or reuse of data about individuals can lead to leakage of sensitive personal information, potentially causing harm to those whose information is included in the data. Overuse of data in statistics or machine learning can lead to overfitting, i.e., models that perform well on training data but poorly on fresh unseen data. Recognizing the rival nature of data offers an opportunity to rethink the way data are managed and used. In an age where the training of AI models generates a massive appetite for data, this perspective has the potential to inform the creation of new regulation and technical infrastructure that will be able to safely and responsibly manage data, track their various uses, and ensure that privacy and statistical usefulness are respected and preserved. We observe that current EU regulation and existing approaches to science seeking to increase opportunities for data-sharing and reuse of data misconstrue the complex nature of data, inadvertently creating risks of privacy harms and overfitting, hence squandering the societal benefits that can be derived from data. Recognizing the rival nature of data has implications for policy and practice. Regulation should address the limitations and risks associated with data reuse and facilitate technological measures to track, analyze, and manage data usage with the goal of ensuring that privacy and statistical validity are maintained.

Original language: English
Title of host publication: CS and LAW 2025 - Proceedings of the 2025 Symposium on Computer Science and Law
Pages: 17-25
Number of pages: 9
ISBN (Electronic): 9798400714214
State: Published - 25 Mar 2025
Event: 4th ACM Symposium on Computer Science and Law, CS and LAW 2025 - Munchen, Germany
Duration: 25 Mar 2025 → 27 Mar 2025

Publication series

Name: CS and LAW 2025 - Proceedings of the 2025 Symposium on Computer Science and Law

Conference

Conference: 4th ACM Symposium on Computer Science and Law, CS and LAW 2025
Country/Territory: Germany
City: Munchen
Period: 25/03/25 → 27/03/25

Keywords

  • Data Governance Act (DGA)
  • data
  • privacy
  • privacy budget
  • statistical validity

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Law
