Abstract
When naturally occurring data is characterized by a highly skewed class distribution, supervised learning often benefits from reducing this skew. Human-agent dialogue data collected with standard agent policies is commonly highly skewed. Hence, we suggest that agent policies need to be reconsidered in the context of training data collection. Specifically, in this work we implemented biased agent policies that are optimized for data collection in the negotiation domain. Empirical evaluations show that our method succeeds in collecting a reasonably balanced corpus in the highly skewed Job-Candidate domain. Furthermore, training a negotiation intent classifier on this balanced corpus yields notable performance improvements relative to training on naturally distributed data.
Original language | American English |
---|---|
Title of host publication | Proceedings of WOCHAT, the Second Workshop on Chatbots and Conversational Agent Technologies |
Place of Publication | Los Angeles |
Number of pages | 12 |
State | Published - 1 Sep 2016 |
Keywords
- Virtual Humans