TY - GEN
T1 - On the Rival Nature of Data
T2 - 4th ACM Symposium on Computer Science and Law, CS and LAW 2025
AU - Gordon-Tapiero, Ayelet
AU - Ligett, Katrina
AU - Nissim, Kobbi
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/3/25
Y1 - 2025/3/25
N2 - Data is often thought of and treated as a non-rival good, which would imply that one person’s use of data does not inherently diminish its availability for others. Building on research in privacy and statistics, we argue that there exist many important settings in which data should be treated as a rival good. Our argument takes into account modern uses of data for statistics, machine learning, and a variety of other purposes, in conjunction with requirements of privacy protection and statistical validity. Excessive sharing or reuse of data about individuals can lead to leakage of sensitive personal information, potentially causing harm to those whose information is included in the data. Overuse of data in statistics or machine learning can lead to overfitting, i.e., models that perform well on training data but poorly on fresh unseen data. Recognizing the rival nature of data offers an opportunity to rethink the way data are managed and used. In an age where the training of AI models generates a massive appetite for data, this perspective has the potential to inform the creation of new regulation and technical infrastructure that will be able to safely and responsibly manage data, track their various uses, and ensure that privacy and statistical usefulness are respected and preserved. We observe that current EU regulation and existing approaches to science seeking to increase opportunities for data-sharing and reuse of data misconstrue the complex nature of data, inadvertently creating risks of privacy harms and overfitting, hence squandering the societal benefits that can be derived from data. Recognizing the rival nature of data has implications for policy and practice. Regulation should address the limitations and risks associated with data reuse and facilitate technological measures to track, analyze, and manage data usage with the goal of ensuring that privacy and statistical validity are maintained.
KW - Data Governance Act (DGA)
KW - data
KW - privacy
KW - privacy budget
KW - statistical validity
UR - http://www.scopus.com/inward/record.url?scp=105001924098&partnerID=8YFLogxK
U2 - 10.1145/3709025.3712211
DO - 10.1145/3709025.3712211
M3 - Conference contribution
T3 - CS and LAW 2025 - Proceedings of the 2025 Symposium on Computer Science and Law
SP - 17
EP - 25
BT - CS and LAW 2025 - Proceedings of the 2025 Symposium on Computer Science and Law
Y2 - 25 March 2025 through 27 March 2025
ER -