The Aligned Multimodal Movie Treebank: An audio, video, dependency-parse treebank

Adam Yaari, Jan DeWitt, Henry Hu, Bennett Stankovits, Sue Felshin, Yevgeni Berzak, Helena Aparicio, Boris Katz, Ignacio Cases, Andrei Barbu

Research output: Contribution to conference › Paper › peer-review

Abstract

Treebanks have traditionally included only text and were derived from written sources such as newspapers or the web. We introduce the Aligned Multimodal Movie Treebank (AMMT), an English-language treebank derived from dialog in Hollywood movies, which includes transcriptions of the audiovisual streams with word-level alignment, as well as part-of-speech tags and dependency parses in the Universal Dependencies (UD) formalism. AMMT consists of 31,264 sentences and 218,090 words, which will make it the 3rd largest UD English treebank and the only multimodal treebank in UD. We find that parsers on this dataset often have difficulty with conversational speech and that they often rely on punctuation, which is often not available from speech recognizers. To help with the web-based annotation effort, we also introduce the Efficient Audio Alignment Annotator (EAAA), a companion tool that enables annotators to significantly speed up their annotation process.

Original language: English
Pages: 9531-9539
Number of pages: 9
State: Published - 2022
Externally published: Yes
Event: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 → 11 Dec 2022

Conference

Conference: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 7/12/22 → 11/12/22

Keywords

  • Universal Dependency parsing
  • audio
  • multimodal
  • treebank
  • video

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
