TY - GEN
T1 - Stable Tuple Embeddings for Dynamic Databases
AU - Toenshoff, Jan
AU - Friedman, Neta
AU - Grohe, Martin
AU - Kimelfeld, Benny
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - We study the problem of computing an embedding of the tuples of a relational database in a manner that is extensible to dynamic changes of the database. In this problem, the embedding should be stable in the sense that it should not change on the existing tuples due to the embedding of newly inserted tuples (as database applications might already rely on existing embeddings); at the same time, the embedding of all tuples, old and new, should retain high quality. This task is challenging since inter-dependencies among the embeddings of different entities are inherent in state-of-the-art embedding techniques for structured data. We study two approaches to solving the problem. The first is an adaptation of Node2Vec to dynamic databases. The second is the FoRWaRD algorithm (Foreign Key Random Walk Embeddings for Relational Databases) that draws from embedding techniques for general graphs and knowledge graphs, and inherently utilizes the schema and its key and foreign-key constraints. We evaluate the embedding algorithms using a collection of downstream tasks of column prediction over geographical and biological domains. We find that in the traditional static setting, our two embedding methods achieve comparable results that are compatible with the state of the art for the specific applications. In the dynamic setting, we find that the FoRWaRD algorithm generally outperforms and runs faster than the alternatives, and moreover, it features only a mild reduction of quality even when the database consists of more than half newly inserted tuples after the initial training of the embedding.
AB - We study the problem of computing an embedding of the tuples of a relational database in a manner that is extensible to dynamic changes of the database. In this problem, the embedding should be stable in the sense that it should not change on the existing tuples due to the embedding of newly inserted tuples (as database applications might already rely on existing embeddings); at the same time, the embedding of all tuples, old and new, should retain high quality. This task is challenging since inter-dependencies among the embeddings of different entities are inherent in state-of-the-art embedding techniques for structured data. We study two approaches to solving the problem. The first is an adaptation of Node2Vec to dynamic databases. The second is the FoRWaRD algorithm (Foreign Key Random Walk Embeddings for Relational Databases) that draws from embedding techniques for general graphs and knowledge graphs, and inherently utilizes the schema and its key and foreign-key constraints. We evaluate the embedding algorithms using a collection of downstream tasks of column prediction over geographical and biological domains. We find that in the traditional static setting, our two embedding methods achieve comparable results that are compatible with the state of the art for the specific applications. In the dynamic setting, we find that the FoRWaRD algorithm generally outperforms and runs faster than the alternatives, and moreover, it features only a mild reduction of quality even when the database consists of more than half newly inserted tuples after the initial training of the embedding.
KW - Database Embedding
KW - Node2Vec
UR - http://www.scopus.com/inward/record.url?scp=85167737082&partnerID=8YFLogxK
U2 - 10.1109/ICDE55515.2023.00103
DO - 10.1109/ICDE55515.2023.00103
M3 - Conference contribution
T3 - Proceedings - International Conference on Data Engineering
SP - 1286
EP - 1299
BT - Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
T2 - 39th IEEE International Conference on Data Engineering, ICDE 2023
Y2 - 3 April 2023 through 7 April 2023
ER -