TY - GEN
T1 - LINX
T2 - 28th International Conference on Extending Database Technology, EDBT 2025
AU - Lipman, Tavor
AU - Milo, Tova
AU - Somech, Amit
AU - Wolfson, Tomer
AU - Zafar, Oz
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2024/11/11
Y1 - 2024/11/11
N2 - Data exploration is a challenging and time-consuming process in which users examine a dataset by iteratively employing a series of queries. While in some cases the user explores a new dataset to become familiar with it, more often, the exploration process is conducted with a specific analysis goal or question in mind. To assist users in exploring a new dataset, Automated Data Exploration (ADE) systems have been devised in previous work. These systems aim to auto-generate a full exploration session, containing a sequence of queries that showcase interesting elements of the data. However, existing ADE systems are often constrained by a predefined objective function, thus always generating the same session for a given dataset. Therefore, their effectiveness in goal-oriented exploration, in which users need to answer specific questions about the data, is extremely limited. To this end, this paper presents LINX, a generative system augmented with a natural language interface for goal-oriented ADE. Given an input dataset and an analytical goal described in natural language, LINX generates a personalized exploratory session that is relevant to the user’s goal. LINX utilizes a Large Language Model (LLM) to interpret the input analysis goal, and then derive a set of specifications for the desired output exploration session. These specifications are then transferred to a novel, modular ADE engine based on Constrained Deep Reinforcement Learning (CDRL), which can adapt its output according to the specified instructions. To validate LINX’s effectiveness, we introduce a new benchmark dataset for goal-oriented exploration and conduct an extensive user study. Our analysis underscores LINX’s superior capability in producing exploratory notebooks that are significantly more relevant and beneficial than those generated by existing solutions, including ChatGPT, goal-agnostic ADE, and commercial systems.
AB - Data exploration is a challenging and time-consuming process in which users examine a dataset by iteratively employing a series of queries. While in some cases the user explores a new dataset to become familiar with it, more often, the exploration process is conducted with a specific analysis goal or question in mind. To assist users in exploring a new dataset, Automated Data Exploration (ADE) systems have been devised in previous work. These systems aim to auto-generate a full exploration session, containing a sequence of queries that showcase interesting elements of the data. However, existing ADE systems are often constrained by a predefined objective function, thus always generating the same session for a given dataset. Therefore, their effectiveness in goal-oriented exploration, in which users need to answer specific questions about the data, is extremely limited. To this end, this paper presents LINX, a generative system augmented with a natural language interface for goal-oriented ADE. Given an input dataset and an analytical goal described in natural language, LINX generates a personalized exploratory session that is relevant to the user’s goal. LINX utilizes a Large Language Model (LLM) to interpret the input analysis goal, and then derive a set of specifications for the desired output exploration session. These specifications are then transferred to a novel, modular ADE engine based on Constrained Deep Reinforcement Learning (CDRL), which can adapt its output according to the specified instructions. To validate LINX’s effectiveness, we introduce a new benchmark dataset for goal-oriented exploration and conduct an extensive user study. Our analysis underscores LINX’s superior capability in producing exploratory notebooks that are significantly more relevant and beneficial than those generated by existing solutions, including ChatGPT, goal-agnostic ADE, and commercial systems.
UR - http://www.scopus.com/inward/record.url?scp=105007867305&partnerID=8YFLogxK
U2 - 10.48786/edbt.2025.22
DO - 10.48786/edbt.2025.22
M3 - Conference contribution
T3 - Advances in Database Technology - EDBT
SP - 270
EP - 283
BT - Advances in Database Technology - EDBT
Y2 - 25 March 2025 through 28 March 2025
ER -