TY - GEN
T1 - Optimal linear imputation with a convergence guarantee
AU - Resheff, Yehezkel S.
AU - Weinshall, Daphna
N1 - Publisher Copyright: © Springer International Publishing AG, part of Springer Nature 2018.
PY - 2018
Y1 - 2018
AB - Real-world datasets, especially high-dimensional ones, commonly contain missing entries. Since most machine learning, data analysis, and statistical methods cannot handle missing values gracefully, these entries must be filled in before such methods can be applied. It is therefore no surprise that there has been long-standing interest in methods for the imputation of missing values. One recent, popular, and effective approach, the IRMI stepwise regression imputation method, models each feature as a linear combination of all other features: a linear regression model is computed for each real-valued feature on the basis of all other features in the dataset, and the resulting predictions are used as imputation values. However, the proposed iterative formulation lacks a convergence guarantee. Here we propose a closely related method, stated as a single optimization problem, together with a block coordinate-descent solution that is guaranteed to converge to a local minimum. Experimental results on both synthetic and benchmark datasets are comparable to those of the IRMI method whenever it converges. However, in the set of experiments described here IRMI often diverges, while the performance of our method is shown to be markedly superior to that of the other methods.
UR - http://www.scopus.com/inward/record.url?scp=85048986245&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-93647-5_4
DO - 10.1007/978-3-319-93647-5_4
M3 - Conference contribution
SN - 9783319936468
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 62
EP - 79
BT - Pattern Recognition Applications and Methods - 6th International Conference, ICPRAM 2017, Revised Selected Papers
A2 - Fred, Ana
A2 - De Marsico, Maria
A2 - Sanniti di Baja, Gabriella
PB - Springer Verlag
T2 - 6th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2017
Y2 - 24 February 2017 through 26 February 2017
ER -