Abstract
One of the main obstacles in applying machine learning to a new domain is the limited availability of labeled data. A common approach for overcoming this challenge is using semi-supervised learning, where labeled and unlabeled data are used together to label additional samples. One of the most common automatic labeling approaches is co-training, which trains two learners on different views of the data, and then proceeds to collaboratively and iteratively label additional samples. Despite their effectiveness in multiple domains, existing co-training approaches for tabular data are either heuristic, and therefore error-prone, or use a greedy approach that leads to sub-optimal performance. We present ReCom, a deep reinforcement learning-based co-training approach. Our approach models multiple aspects of both the dataset and the two learners and develops advanced labeling strategies that achieve state-of-the-art performance. ReCom overcomes the challenge of limited data availability by simultaneously training on multiple datasets, thus producing a generic and robust labeling policy that can be applied to new datasets without the need for any additional training. Our experiments, conducted on a diverse group of 32 datasets, demonstrate the merits of our approach.
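For readers unfamiliar with co-training, the generic loop mentioned in the abstract (two learners, two views, iterative labeling) can be sketched as follows. This is a minimal illustration of the classic greedy, confidence-based variant, not ReCom's reinforcement-learning policy. The nearest-centroid learner, the toy data, and all names (`centroid_fit`, `centroid_predict`, `co_train`, `rounds`, `per_round`) are assumptions made for the example and do not come from the paper.

```python
def centroid_fit(rows, labels):
    """Fit a nearest-centroid learner: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, row):
    """Return (label, confidence); confidence is the squared-distance margin
    between the nearest and second-nearest class centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, centroid)), label)
        for label, centroid in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(view_a, view_b, seed_labels, rounds=10, per_round=2):
    """Greedy co-training (hypothetical helper, not the paper's method).

    seed_labels maps sample index -> known label. In each round, one learner
    per view is trained on the current labels, and each learner pseudo-labels
    the unlabeled samples it is most confident about.
    """
    labels = dict(seed_labels)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(view_a)) if i not in labels]
        if not unlabeled:
            break
        new_labels = {}
        for view in (view_a, view_b):
            idx = list(labels)
            model = centroid_fit([view[i] for i in idx], [labels[i] for i in idx])
            scored = [(centroid_predict(model, view[i]), i) for i in unlabeled]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            for (label, _), i in scored[:per_round]:
                # If both learners nominate the same sample, keep the first label.
                new_labels.setdefault(i, label)
        labels.update(new_labels)
    return labels


# Toy data (assumed): two well-separated classes, each seen through two views.
true_labels = [i % 2 for i in range(20)]
view_a = [[5.0 * t + 0.1 * (i % 5), 5.0 * t - 0.1 * (i % 3)]
          for i, t in enumerate(true_labels)]
view_b = [[-3.0 * t + 0.05 * (i % 4)] for i, t in enumerate(true_labels)]

seed = {0: 0, 1: 1}  # one labeled sample per class
result = co_train(view_a, view_b, seed)
```

On this deliberately easy data every pseudo-label is correct. On real tabular data, a fixed "most confident first" rule like this one can reinforce early mistakes, which is the kind of greedy behavior the abstract contrasts with a learned labeling policy.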
Original language | American English |
---|---|
Pages (from-to) | 321-340 |
Number of pages | 20 |
Journal | Information Sciences |
Volume | 589 |
State | Published - 1 Apr 2022 |
Keywords
- Co-training
- Reinforcement Learning
- Semi-supervised Learning
All Science Journal Classification (ASJC) codes
- Software
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence