TY - GEN
T1 - Incorporating information extraction in the relational database model
AU - Nahshon, Yoav
AU - Peterfreund, Liat
AU - Vansummeren, Stijn
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/6/26
Y1 - 2016/6/26
AB - Modern information extraction pipelines are typically constructed by (1) loading textual data from a database into a special-purpose application, (2) applying a myriad of text-analytics functions to the text, which produce a structured relational table, and (3) storing this table in a database. Obviously, this approach can lead to laborious development processes, complex and tangled programs, and inefficient control flows. Towards solving these deficiencies, we embark on an effort to lay the foundations of a new generation of text-centric database management systems. Concretely, we extend the relational model by incorporating into it the theory of document spanners which provides the means and methods for the model to engage the Information Extraction (IE) tasks. This extended model, called Spannerlog, provides a novel declarative method for defining and manipulating textual data, which makes possible the automation of the typical work method described above. In addition to formally defining Spannerlog and illustrating its usefulness for IE tasks, we also report on initial results concerning its expressive power.
KW - Datalog
KW - Information extraction
KW - Relational model
KW - Spanners
UR - http://www.scopus.com/inward/record.url?scp=84979775186&partnerID=8YFLogxK
U2 - 10.1145/2932194.2932200
DO - 10.1145/2932194.2932200
M3 - Conference contribution
T3 - Proceedings of the 19th International Workshop on Web and Databases, WebDB 2016
BT - Proceedings of the 19th International Workshop on Web and Databases, WebDB 2016
T2 - 19th International Workshop on Web and Databases, WebDB 2016
Y2 - 26 June 2016 through 1 July 2016
ER -