TY - JOUR
T1 - Using Machine Learning to Detect 'Multiple-Account' Cheating and Analyze the Influence of Student and Problem Features
AU - Ruiperez-Valiente, Jose A.
AU - Munoz-Merino, Pedro J.
AU - Alexandron, Giora
AU - Pritchard, David E.
N1 - Funding Agency: Madrid Regional Government; Spanish Ministry of Economy and Competitiveness; European Erasmus+ projects MOOC Maker and SHEILA.
PY - 2019/3
Y1 - 2019/3
N2 - One of the reported methods of cheating in online environments in the literature is CAMEO (Copying Answers using Multiple Existences Online), where harvesting accounts are used to obtain correct answers that are later submitted in the master account which gives the student credit to obtain a certificate. In previous research, we developed an algorithm to identify and label submissions that were cheated using the CAMEO method; this algorithm relied on the IP of the submissions. In this study, we use this tagged sample of submissions to i) compare the influence of student and problems characteristics on CAMEO and ii) build a random forest classifier that detects submissions as CAMEO without relying on IP, achieving sensitivity and specificity levels of 0.966 and 0.996, respectively. Finally, we analyze the importance of the different features of the model finding that student features are the most important variables towards the correct classification of CAMEO submissions, concluding also that student features have more influence on CAMEO than problem features.
AB - One of the reported methods of cheating in online environments in the literature is CAMEO (Copying Answers using Multiple Existences Online), where harvesting accounts are used to obtain correct answers that are later submitted in the master account which gives the student credit to obtain a certificate. In previous research, we developed an algorithm to identify and label submissions that were cheated using the CAMEO method; this algorithm relied on the IP of the submissions. In this study, we use this tagged sample of submissions to i) compare the influence of student and problems characteristics on CAMEO and ii) build a random forest classifier that detects submissions as CAMEO without relying on IP, achieving sensitivity and specificity levels of 0.966 and 0.996, respectively. Finally, we analyze the importance of the different features of the model finding that student features are the most important variables towards the correct classification of CAMEO submissions, concluding also that student features have more influence on CAMEO than problem features.
U2 - 10.1109/TLT.2017.2784420
DO - 10.1109/TLT.2017.2784420
M3 - Article
SN - 1939-1382
VL - 12
SP - 112
EP - 122
JO - IEEE Transactions on Learning Technologies
JF - IEEE Transactions on Learning Technologies
IS - 1
ER -