QMDIS: QCRI-MIT advanced dialect identification system

Sameer Khurana, Maryam Najafian, Ahmed Ali, Tuka Al Hanai, Yonatan Belinkov, James Glass

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

As a continuation of our efforts towards tackling the problem of spoken Dialect Identification (DID) for Arabic languages, we present the QCRI-MIT Advanced Dialect Identification System (QMDIS). QMDIS is an automatic spoken DID system for Dialectal Arabic (DA). In this paper, we report a comprehensive study of the three main components used in the spoken DID task: phonotactic, lexical and acoustic. We use Support Vector Machines (SVMs), Logistic Regression (LR) and Convolutional Neural Networks (CNNs) as backend classifiers throughout the study. We perform all our experiments on a publicly available dataset and present new state-of-the-art results. QMDIS discriminates between the five most widely used dialects of Arabic, namely Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic (MSA). We report approximately 73% accuracy for system combination. All the data and the code used in our experiments are publicly available for research.

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages: 2591-2595
Number of pages: 5
Volume: 2017-August
State: Published - 2017
Externally published: Yes
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 20 Aug 2017 to 24 Aug 2017

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Conference

Conference: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017
Country/Territory: Sweden
City: Stockholm
Period: 20/08/17 to 24/08/17

Keywords

  • Acoustic
  • Arabic
  • Convolutional Neural Network
  • Lexical
  • Logistic Regression
  • Phonotactic
  • Spoken Dialect Identification
  • Support Vector Machine

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
