TY - GEN
T1 - Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks
AU - Feldman, Dan
AU - Xiang, Chongyuan
AU - Zhu, Ruihao
AU - Rus, Daniela
N1 - Funding Information: This research was conducted in the Distributed Robotics Laboratory at CSAIL, MIT in collaboration with the University of Haifa. Support for this work has been provided in part by the NSF grant CNS1526815 and BSF grant 2014627. We are grateful for this support.
PY - 2017/4/18
Y1 - 2017/4/18
N2 - Mobile sensor networks are a great source of data. By collecting data with mobile sensor nodes from individuals in a user community, e.g., using their smartphones, we can learn global information such as traffic congestion patterns in the city, location of key community facilities, and locations of gathering places. Can we publish and run queries on mobile sensor network databases without disclosing information about individual nodes? Differential privacy is a strong notion of privacy which guarantees that very little will be learned about individual records in the database, no matter what the attackers already know or wish to learn. Still, there is no practical system applying differential privacy algorithms for clustering points on real databases. This paper describes the construction of small coresets for computing k-means clustering of a set of points while preserving differential privacy. As a result, we give the first k-means clustering algorithm that is both differentially private and has an approximation error that depends sub-linearly on the data's dimension d. Previous results introduced errors that are exponential in d. We implemented this algorithm and used it to create differentially private location data from GPS tracks. Specifically, our algorithm allows clustering GPS databases generated from mobile nodes, while letting the user control the introduced noise due to privacy. We provide experimental results for the system and algorithms, and compare them to existing techniques. To the best of our knowledge, this is the first practical system that enables differentially private clustering on real data.
AB - Mobile sensor networks are a great source of data. By collecting data with mobile sensor nodes from individuals in a user community, e.g., using their smartphones, we can learn global information such as traffic congestion patterns in the city, location of key community facilities, and locations of gathering places. Can we publish and run queries on mobile sensor network databases without disclosing information about individual nodes? Differential privacy is a strong notion of privacy which guarantees that very little will be learned about individual records in the database, no matter what the attackers already know or wish to learn. Still, there is no practical system applying differential privacy algorithms for clustering points on real databases. This paper describes the construction of small coresets for computing k-means clustering of a set of points while preserving differential privacy. As a result, we give the first k-means clustering algorithm that is both differentially private and has an approximation error that depends sub-linearly on the data's dimension d. Previous results introduced errors that are exponential in d. We implemented this algorithm and used it to create differentially private location data from GPS tracks. Specifically, our algorithm allows clustering GPS databases generated from mobile nodes, while letting the user control the introduced noise due to privacy. We provide experimental results for the system and algorithms, and compare them to existing techniques. To the best of our knowledge, this is the first practical system that enables differentially private clustering on real data.
KW - Coresets
KW - Differential privacy
KW - Mobile sensor networks
UR - http://www.scopus.com/inward/record.url?scp=85019041898&partnerID=8YFLogxK
U2 - https://doi.org/10.1145/3055031.3055090
DO - https://doi.org/10.1145/3055031.3055090
M3 - Conference contribution
T3 - Proceedings - 2017 16th ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN 2017
SP - 3
EP - 15
BT - Proceedings - 2017 16th ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN 2017
T2 - 16th ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN 2017
Y2 - 18 April 2017 through 20 April 2017
ER -