TY - JOUR
T1 - Crowd-sourced annotation of ECG signals using contextual information
AU - Zhu, Tingting
AU - Johnson, Alistair E.W.
AU - Behar, Joachim
AU - Clifford, Gari D.
N1 - Funding Information: TZ and AJ acknowledge the support of the RCUK Digital Economy Programme grant number EP/G036861/1 (Oxford Centre for Doctoral Training in Healthcare Innovation). TZ also acknowledges the support of China Mobile Research Institute. JB is supported by the UK EPSRC, the Balliol French Anderson Scholarship Fund, and MindChild Medical Inc. (North Andover, MA).
PY - 2014/4
Y1 - 2014/4
N2 - For medical applications, ground truth is ascertained through manual labels from clinical experts. However, significant inter-observer variability and various human biases limit the accuracy of such labels. A probabilistic framework addresses these issues by aggregating human and automated labels to provide a reliable ground truth, with no prior knowledge of individual annotator performance. As an alternative to median or mean voting strategies, novel contextual features (signal quality and physiology) were introduced to allow the Probabilistic Label Aggregator (PLA) to weight each algorithm or human according to its performance. As a proof of concept, the PLA was applied to QT interval (a pro-arrhythmic indicator) estimation from the electrocardiogram, using labels from 20 humans and 48 algorithms crowd-sourced from the 2006 PhysioNet/Computing in Cardiology Challenge database. For automated annotations, the root mean square error of the PLA was 13.97 ± 0.46 ms, significantly outperforming the best Challenge entry (16.36 ms) as well as the mean and median voting strategies (17.67 ± 0.56 ms and 14.44 ± 0.52 ms, respectively; p < 0.05). When selecting three annotators, the PLA improved annotation accuracy over median aggregation by 10.7% for human annotators and 14.4% for automated algorithms. The PLA could therefore provide an improved "gold standard" for medical annotation tasks even when ground truth is not available.
AB - For medical applications, ground truth is ascertained through manual labels from clinical experts. However, significant inter-observer variability and various human biases limit the accuracy of such labels. A probabilistic framework addresses these issues by aggregating human and automated labels to provide a reliable ground truth, with no prior knowledge of individual annotator performance. As an alternative to median or mean voting strategies, novel contextual features (signal quality and physiology) were introduced to allow the Probabilistic Label Aggregator (PLA) to weight each algorithm or human according to its performance. As a proof of concept, the PLA was applied to QT interval (a pro-arrhythmic indicator) estimation from the electrocardiogram, using labels from 20 humans and 48 algorithms crowd-sourced from the 2006 PhysioNet/Computing in Cardiology Challenge database. For automated annotations, the root mean square error of the PLA was 13.97 ± 0.46 ms, significantly outperforming the best Challenge entry (16.36 ms) as well as the mean and median voting strategies (17.67 ± 0.56 ms and 14.44 ± 0.52 ms, respectively; p < 0.05). When selecting three annotators, the PLA improved annotation accuracy over median aggregation by 10.7% for human annotators and 14.4% for automated algorithms. The PLA could therefore provide an improved "gold standard" for medical annotation tasks even when ground truth is not available.
KW - Crowd-sourcing
KW - ECG
KW - Probabilistic analysis
KW - QT estimation
KW - Signal quality
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=84898596898&partnerID=8YFLogxK
U2 - 10.1007/s10439-013-0964-6
DO - 10.1007/s10439-013-0964-6
M3 - Article
SN - 0090-6964
VL - 42
SP - 871
EP - 884
JO - Annals of Biomedical Engineering
JF - Annals of Biomedical Engineering
IS - 4
ER -