TY - GEN
T1 - Selecting Walk Schemes for Database Embedding
AU - Lubarsky, Yuval Lev
AU - Tönshoff, Jan
AU - Grohe, Martin
AU - Kimelfeld, Benny
N1 - Publisher Copyright: © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 979-8-4007-0124-5/23/10...$15.00.
PY - 2023/10/21
Y1 - 2023/10/21
N2 - Machinery for data analysis often requires a numeric representation of the input. Towards that, a common practice is to embed components of structured data into a high-dimensional vector space. We study the embedding of the tuples of a relational database, where existing techniques are often based on optimization tasks over a collection of random walks from the database. The focus of this paper is on the recent FoRWaRD algorithm that is designed for dynamic databases, where walks are sampled by following foreign keys between tuples. Importantly, different walks have different schemas, or “walk schemes,” that are derived by listing the relations and attributes along the walk. Also importantly, different walk schemes describe relationships of different natures in the database. We show that by focusing on a few informative walk schemes, we can obtain tuple embedding significantly faster, while retaining the quality. We define the problem of scheme selection for tuple embedding, devise several approaches and strategies for scheme selection, and conduct a thorough empirical study of the performance over a collection of downstream tasks. Our results confirm that with effective strategies for scheme selection, we can obtain high-quality embeddings considerably (e.g., three times) faster, preserve the extensibility to newly inserted tuples, and even achieve an increase in the precision of some tasks.
AB - Machinery for data analysis often requires a numeric representation of the input. Towards that, a common practice is to embed components of structured data into a high-dimensional vector space. We study the embedding of the tuples of a relational database, where existing techniques are often based on optimization tasks over a collection of random walks from the database. The focus of this paper is on the recent FoRWaRD algorithm that is designed for dynamic databases, where walks are sampled by following foreign keys between tuples. Importantly, different walks have different schemas, or “walk schemes,” that are derived by listing the relations and attributes along the walk. Also importantly, different walk schemes describe relationships of different natures in the database. We show that by focusing on a few informative walk schemes, we can obtain tuple embedding significantly faster, while retaining the quality. We define the problem of scheme selection for tuple embedding, devise several approaches and strategies for scheme selection, and conduct a thorough empirical study of the performance over a collection of downstream tasks. Our results confirm that with effective strategies for scheme selection, we can obtain high-quality embeddings considerably (e.g., three times) faster, preserve the extensibility to newly inserted tuples, and even achieve an increase in the precision of some tasks.
KW - Database embedding
KW - random walks
KW - walk schemes
UR - http://www.scopus.com/inward/record.url?scp=85178159448&partnerID=8YFLogxK
U2 - https://doi.org/10.1145/3583780.3615052
DO - https://doi.org/10.1145/3583780.3615052
M3 - منشور من مؤتمر
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1677
EP - 1686
BT - CIKM 2023 - Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
T2 - 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023
Y2 - 21 October 2023 through 25 October 2023
ER -