Transducing Markov sequences

Benny Kimelfeld, Christopher Ré

Research output: Contribution to journal › Article › peer-review

Abstract

A Markov sequence is a basic statistical model representing uncertain sequential data, and it is used within a plethora of applications, including speech recognition, image processing, computational biology, radiofrequency identification (RFID), and information extraction. The problem of querying a Markov sequence is studied under the conventional semantics of querying a probabilistic database, where queries are formulated as finite-state transducers. Specifically, the complexity of two main problems is analyzed. The first problem is that of computing the confidence (probability) of an answer. The second is the enumeration of the answers in the order of decreasing confidence (with the generation of the top-κ answers as a special case), or in an approximate order thereof. In particular, it is shown that enumeration in any subexponential-approximate order is generally intractable (even for some fixed transducers), and a matching upper bound is obtained through a proposed heuristic. Due to this hardness, a special consideration is given to restricted (yet common) classes of transducers that extract matches of a regular expression (subject to prefix and suffix constraints), and it is shown that these classes are, indeed, significantly more tractable.
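
To make the two problems concrete, the following is a minimal brute-force sketch of the possible-worlds semantics the abstract refers to. It is not the authors' algorithm: it enumerates every sequence, so it runs in exponential time and only illustrates what "confidence of an answer" and "decreasing-confidence order" mean. The function names, the time-homogeneous transition table, the deterministic transducer encoding, and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def sequence_probability(seq, initial, transition):
    """Probability of one concrete sequence under a first-order Markov chain.

    Assumption: the same transition table applies at every position.
    """
    p = initial.get(seq[0], 0.0)
    for prev, cur in zip(seq, seq[1:]):
        p *= transition[prev].get(cur, 0.0)
    return p


def run_transducer(seq, start, delta, accepting):
    """Run a deterministic transducer; return its output, or None if it rejects.

    delta maps (state, symbol) -> (next_state, emitted_string).
    """
    state, out = start, []
    for sym in seq:
        if (state, sym) not in delta:
            return None
        state, emitted = delta[(state, sym)]
        out.append(emitted)
    return "".join(out) if state in accepting else None


def ranked_answers(symbols, length, initial, transition, start, delta, accepting):
    """Sum the probability of every sequence per answer, then rank the answers."""
    confidence = defaultdict(float)
    for seq in product(symbols, repeat=length):  # exponential in `length`
        p = sequence_probability(seq, initial, transition)
        if p == 0.0:
            continue
        answer = run_transducer(seq, start, delta, accepting)
        if answer is not None:
            confidence[answer] += p
    return sorted(confidence.items(), key=lambda kv: -kv[1])


# Hypothetical toy instance: two symbols, sequences of length 2.
initial = {"a": 0.5, "b": 0.5}
transition = {"a": {"a": 0.75, "b": 0.25}, "b": {"a": 0.25, "b": 0.75}}
# One-state transducer that emits "x" for every "a" and nothing for "b".
delta = {("q", "a"): ("q", "x"), ("q", "b"): ("q", "")}

ranked = ranked_answers("ab", 2, initial, transition, "q", delta, {"q"})
top_1 = ranked[:1]  # top-k is a prefix of the ranked list
```

Here the answer "x" is produced by two different sequences ("ab" and "ba"), so its confidence is the sum of their probabilities. Many sequences can map to the same answer, which is why confidence computation and ranked enumeration are not simply a matter of finding the most likely sequence.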

Original language: English
Article number: 32
Journal: Journal of the ACM
Volume: 61
Issue number: 5
State: Published - 8 Sep 2014
Externally published: Yes

Keywords

  • Enumeration
  • Hidden Markov models
  • Markov sequences
  • Probabilistic databases
  • Ranked query evaluation
  • Transducers

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Information Systems
  • Hardware and Architecture
  • Artificial Intelligence
