Abstract
A major challenge in data stream applications is that the target variable changes over time in unexpected ways, a phenomenon called concept drift (CD). Another challenge is the emergence of novel classes, which calls for novelty detection (ND) by, e.g., one-class or semi-supervised classification. In online ND, these two challenges interfere with each other and should be dealt with jointly. We present the cluster drift detection (CDD) algorithm, which, using a single hyper-parameter, performs offline clustering to learn the diverse normal profile and then detects online, using a multivariate statistical test, whether a never-seen-before example is novel or normal. If the example is normal, the CDD uses it to update the normal-profile cluster, enabling continuous CD monitoring. Experimental results on popular real-world and synthetic data sets, as well as a precision agriculture data set of banana plants under water stress and a COVID-19 data set, demonstrate that the CDD algorithm: 1) distinguishes between normal and novel concepts more accurately than state-of-the-art algorithms, 2) provides information about why specific novel concepts are misdetected, and 3) is more robust to the complexity, drift, and noise in the problem than other algorithms.
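The abstract does not give implementation details, so the following is only a minimal sketch of the general pattern it describes (learn clusters offline, test each incoming example online, update a cluster when the example is accepted as normal), not the CDD algorithm itself. Everything here is an assumption made for illustration: the names `Cluster`, `chi2_quantile`, and `classify_and_update` are hypothetical, the offline clustering step is assumed to have already grouped the training points, the test is a chi-square test on a diagonal (per-feature variance) Mahalanobis distance, and the single hyper-parameter is taken to be a significance level `alpha`.

```python
import math
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


class Cluster:
    """Running mean and per-feature variance of one normal-profile cluster."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sums of squared deviations

    @classmethod
    def from_points(cls, points):
        # Assumption: an offline clustering step already grouped these points.
        cluster = cls(len(points[0]))
        for p in points:
            cluster.update(p)
        return cluster

    def update(self, x):
        # Welford's incremental update, so the cluster can follow gradual drift.
        self.n += 1
        for j, xj in enumerate(x):
            delta = xj - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (xj - self.mean[j])

    def distance2(self, x):
        # Squared distance standardized by per-feature sample variance
        # (a diagonal-covariance simplification of the Mahalanobis distance).
        d2 = 0.0
        for j, xj in enumerate(x):
            var = max(self.m2[j] / max(self.n - 1, 1), 1e-12)
            d2 += (xj - self.mean[j]) ** 2 / var
        return d2


def classify_and_update(clusters, x, alpha=0.01):
    """Label x as 'normal' or 'novel'; a normal x updates its nearest cluster."""
    threshold = chi2_quantile(1.0 - alpha, len(x))
    nearest = min(clusters, key=lambda c: c.distance2(x))
    if nearest.distance2(x) > threshold:
        return "novel"  # no cluster explains x at significance level alpha
    nearest.update(x)
    return "normal"


if __name__ == "__main__":
    base = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0), (0.0, 0.0)]
    shifted = [(a + 10.0, b + 10.0) for a, b in base]
    profile = [Cluster.from_points(base), Cluster.from_points(shifted)]
    print(classify_and_update(profile, (0.5, 0.5)))
    print(classify_and_update(profile, (5.0, 5.0)))
```

In this sketch, a smaller `alpha` makes the test more tolerant, so more examples are absorbed into the normal profile; a larger `alpha` flags more examples as novel. A full-covariance test would capture correlations between features that the diagonal simplification ignores.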
| Original language | American English |
|---|---|
| Title of host publication | Proceedings - 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020 |
| Editors | M. Arif Wani, Feng Luo, Xiaolin Li, Dejing Dou, Francesco Bonchi |
| Pages | 171-178 |
| Number of pages | 8 |
| ISBN (Electronic) | 9781728184708 |
| State | Published - 1 Dec 2020 |
| Event | 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020 - Virtual, Miami, United States; Duration: 14 Dec 2020 → 17 Dec 2020 |
Conference
| Conference | 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020 |
|---|---|
| Country/Territory | United States |
| City | Virtual, Miami |
| Period | 14/12/20 → 17/12/20 |
Keywords
- Concept drift
- Novelty detection
- Streaming data
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Computer Vision and Pattern Recognition
- Hardware and Architecture
- Computer Science Applications