Database Principles and Challenges in Text Analysis

Johannes Doleschal, Benny Kimelfeld, Wim Martens

Research output: Contribution to journal, Article, peer-review

Abstract

A common conceptual view of text analysis is that of a two-step process, where we first extract relations from text documents and then apply a relational query over the result. Hence, text analysis shares technical challenges with, and can draw ideas from, relational databases. A framework that formally instantiates this connection is that of the document spanners. In this article, we review recent advances in various research efforts that adapt fundamental database concepts to text analysis through the lens of document spanners. Among others, we discuss aspects of query evaluation, aggregate queries, provenance, and distributed query planning.
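The two-step view in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the names `Span`, `extract`, `natural_join`, `PERSON`, and `ORG_MAIL`, and the sample document, are hypothetical, and Python regular expressions with named groups stand in for a spanner. Note one simplification: `re.finditer` yields only non-overlapping matches, whereas a document spanner in the formal sense returns every valid assignment of spans to variables.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """A half-open interval [start, end) of character offsets in a document."""
    start: int
    end: int


def extract(pattern: str, doc: str) -> list[dict[str, Span]]:
    """Step 1: turn a document into a relation whose attributes are the
    named groups of `pattern` and whose values are spans."""
    rows = []
    for match in re.finditer(pattern, doc):
        rows.append({var: Span(*match.span(var)) for var in match.groupdict()})
    return rows


def natural_join(left: list[dict[str, Span]],
                 right: list[dict[str, Span]]) -> list[dict[str, Span]]:
    """Step 2: a relational operator (natural join) over span relations.
    Two rows combine when they agree on every shared variable."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[var] == r_row[var] for var in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical sample document and extractors.
DOC = "Ann <ann@x.org>, Bob <bob@y.com>"
PERSON = r"(?P<name>[A-Z][a-z]+) <(?P<mail>[^>]+)>"   # (name, mail) pairs
ORG_MAIL = r"(?P<mail>[\w.]+@[\w.]+\.org)"            # addresses ending in .org

result = natural_join(extract(PERSON, DOC), extract(ORG_MAIL, DOC))
names = [DOC[row["name"].start:row["name"].end] for row in result]
print(names)
```

Because the join compares spans rather than strings, two rows match only when they refer to the same position in the document, not merely to equal text.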

Original language: English
Pages (from-to): 6-17
Number of pages: 12
Journal: SIGMOD Record
Volume: 50
Issue number: 2
State: Published - Jun 2021
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
