Abstract
Treebanks have traditionally included only text and were derived from written sources such as newspapers or the web. We introduce the Aligned Multimodal Movie Treebank (AMMT)†, an English language treebank derived from dialog in Hollywood movies which includes transcriptions of the audiovisual streams with word-level alignment, as well as part of speech tags and dependency parses in the Universal Dependencies (UD) formalism. AMMT consists of 31, 264 sentences and 218, 090 words, that will amount to the 3rd largest UD English treebank and the only multimodal treebank in UD. We find that parsers on this dataset often have difficulty with conversational speech and that they often rely on punctuation which is often not available from speech recognizers. To help with the web-based annotation effort, we also introduce the Efficient Audio Alignment Annotator (EAAA)‡, a companion tool that enables annotators to significantly speed-up their annotation processes.
Original language | English |
---|---|
Pages | 9531-9539 |
Number of pages | 9 |
State | Published - 2022 |
Externally published | Yes |
Event | 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates Duration: 7 Dec 2022 → 11 Dec 2022 |
Conference
Conference | 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 |
---|---|
Country/Territory | United Arab Emirates |
City | Abu Dhabi |
Period | 7/12/22 → 11/12/22 |
Keywords
- Universal Dependency parsing
- audio
- multimodal
- treebank
- video
All Science Journal Classification (ASJC) codes
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems