TY - GEN
T1 - Fixing Wikipedia interlinks using revision history patterns
AU - Milo, Tova
AU - Novgorodov, Slava
AU - Razmadze, Kathy
N1 - Publisher Copyright: © 2021 Copyright held by the owner/author(s).
PY - 2021
Y1 - 2021
N2 - Wikipedia, the web-based free content encyclopedia project, is one of the most popular websites on the Web. Its “open-door” policy, allowing anyone to edit, has made Wikipedia the largest and possibly the best encyclopedia in the world. At the same time, the continuously evolving content, constantly updated by a large number of uncoordinated users, renders the maintenance of a clean, consistent encyclopedia an extremely challenging task. The goal of the WICLEAN (WC) system presented in this paper is to assist Wikipedia editors in this difficult task. Specifically, we focus on the correctness of Wikipedia inter-links that point from one article (entity) to another. Such inter-links form a key component of the structured part of Wikipedia and their correctness is critical for coherent browsing. Given an entity type of interest, our highly parallelizable algorithm identifies relevant edit patterns across revision histories of Wikipedia entities of related types, along with time windows in which partial edits are tolerable. The discovered patterns/windows are then used by WC to alert Wikipedia editors to past edits that appear to be incomplete, as well as to provide users with on-line assistance as they update the encyclopedia. Our experiments with real-life Wikipedia data demonstrate the efficiency and effectiveness of WC in identifying actual errors in a variety of Wikipedia entity types.
UR - http://www.scopus.com/inward/record.url?scp=85113714894&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2021.06
DO - 10.5441/002/edbt.2021.06
M3 - Conference contribution
T3 - Advances in Database Technology - EDBT
SP - 49
EP - 60
BT - Advances in Database Technology - EDBT 2021
A2 - Velegrakis, Yannis
A2 - Zeinalipour, Demetris
A2 - Chrysanthis, Panos K.
A2 - Guerra, Francesco
T2 - Advances in Database Technology - 24th International Conference on Extending Database Technology, EDBT 2021
Y2 - 23 March 2021 through 26 March 2021
ER -