TY - GEN
T1 - Differentially private ordinary least squares
AU - Sheffet, Or
N1 - Publisher Copyright: Copyright © 2017 by the authors.
PY - 2017
Y1 - 2017
N2 - Linear regression is one of the most prevalent techniques in machine learning; however, it is also common to use linear regression for its explanatory capabilities rather than label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g. Gender) and a label (e.g. Income) in the presence of other (potentially correlated) features. OLS assumes a particular model that randomly generates the data, and derives t-values, representing the likelihood of each real value being the true correlation. Using t-values, OLS can release a confidence interval, which is an interval on the reals that is likely to contain the true correlation; and when this interval does not intersect the origin, we can reject the null hypothesis as it is likely that the true correlation is non-zero. Our work aims at achieving similar guarantees on data under differentially private estimators. First, we show that for well-spread data, the Gaussian Johnson-Lindenstrauss Transform (JLT) gives a very good approximation of t-values; second, when JLT approximates Ridge regression (linear regression with ℓ2-regularization) we derive, under certain conditions, confidence intervals using the projected data; lastly, we derive, under different conditions, confidence intervals for the "Analyze Gauss" algorithm (Dwork et al., 2014).
AB - Linear regression is one of the most prevalent techniques in machine learning; however, it is also common to use linear regression for its explanatory capabilities rather than label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g. Gender) and a label (e.g. Income) in the presence of other (potentially correlated) features. OLS assumes a particular model that randomly generates the data, and derives t-values, representing the likelihood of each real value being the true correlation. Using t-values, OLS can release a confidence interval, which is an interval on the reals that is likely to contain the true correlation; and when this interval does not intersect the origin, we can reject the null hypothesis as it is likely that the true correlation is non-zero. Our work aims at achieving similar guarantees on data under differentially private estimators. First, we show that for well-spread data, the Gaussian Johnson-Lindenstrauss Transform (JLT) gives a very good approximation of t-values; second, when JLT approximates Ridge regression (linear regression with ℓ2-regularization) we derive, under certain conditions, confidence intervals using the projected data; lastly, we derive, under different conditions, confidence intervals for the "Analyze Gauss" algorithm (Dwork et al., 2014).
UR - http://www.scopus.com/inward/record.url?scp=85048476423&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 34th International Conference on Machine Learning, ICML 2017
SP - 4774
EP - 4801
BT - 34th International Conference on Machine Learning, ICML 2017
T2 - 34th International Conference on Machine Learning, ICML 2017
Y2 - 6 August 2017 through 11 August 2017
ER -