LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration

Tavor Lipman, Tova Milo, Amit Somech, Tomer Wolfson, Oz Zafar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Data exploration is a challenging and time-consuming process in which users examine a dataset by iteratively employing a series of queries. While in some cases the user explores a new dataset to become familiar with it, more often the exploration process is conducted with a specific analysis goal or question in mind. To assist users in exploring a new dataset, Automated Data Exploration (ADE) systems have been devised in previous work. These systems aim to auto-generate a full exploration session, containing a sequence of queries that showcase interesting elements of the data. However, existing ADE systems are often constrained by a predefined objective function, thus always generating the same session for a given dataset. Therefore, their effectiveness in goal-oriented exploration, in which users need to answer specific questions about the data, is extremely limited. To this end, this paper presents LINX, a generative system augmented with a natural language interface for goal-oriented ADE. Given an input dataset and an analytical goal described in natural language, LINX generates a personalized exploratory session that is relevant to the user's goal. LINX utilizes a Large Language Model (LLM) to interpret the input analysis goal and then derive a set of specifications for the desired output exploration session. These specifications are then transferred to a novel, modular ADE engine based on Constrained Deep Reinforcement Learning (CDRL), which can adapt its output according to the specified instructions. To validate LINX's effectiveness, we introduce a new benchmark dataset for goal-oriented exploration and conduct an extensive user study. Our analysis underscores LINX's superior capability in producing exploratory notebooks that are significantly more relevant and beneficial than those generated by existing solutions, including ChatGPT, goal-agnostic ADE, and commercial systems.
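
The two-stage flow described in the abstract (derive specifications from a goal, then generate a session that respects them) can be illustrated with a toy sketch. This is not LINX's implementation: keyword matching stands in for the LLM, exhaustive search stands in for the CDRL engine, and every name, column, and score below (`Query`, `derive_specs`, `best_session`, the sample data) is a hypothetical chosen only for illustration.

```python
import re
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Query:
    """A toy exploratory operation on one column (hypothetical structure)."""
    op: str       # "filter" or "group"
    column: str
    score: float  # made-up, goal-agnostic "interestingness" score


def derive_specs(goal: str, columns: list[str]) -> set[str]:
    """Stand-in for the LLM step: any column named in the goal must be covered."""
    words = set(re.findall(r"[a-z_]+", goal.lower()))
    return {c for c in columns if c.lower() in words}


def satisfies(session: tuple[Query, ...], specs: set[str]) -> bool:
    """A session satisfies the specs if it touches every required column."""
    return specs <= {q.column for q in session}


def best_session(candidates: list[Query], specs: set[str], length: int = 2):
    """Stand-in for the constrained engine: exhaustive search, not CDRL.

    Picks the highest-scoring set of queries among those meeting the specs.
    """
    feasible = [s for s in combinations(candidates, length) if satisfies(s, specs)]
    return max(feasible, key=lambda s: sum(q.score for q in s), default=None)


# Assumed sample data for the sketch.
columns = ["country", "genre", "year"]
candidates = [
    Query("group", "genre", 0.9),
    Query("filter", "year", 0.8),
    Query("group", "country", 0.4),
]

goal = "Find atypical viewing patterns by country"
specs = derive_specs(goal, columns)

agnostic = best_session(candidates, set())   # fixed objective only
goal_aware = best_session(candidates, specs)  # objective plus goal constraints

print([q.column for q in agnostic])
print([q.column for q in goal_aware])
```

With an empty specification set the search always returns the same top-scoring session for a dataset, which mirrors the limitation the abstract attributes to goal-agnostic ADE; adding goal-derived constraints changes which session is selected.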

Original language: English
Title of host publication: Advances in Database Technology - EDBT
Pages: 270-283
Number of pages: 14
Edition: 2
ISBN (Electronic): 9783893180981, 9783893180998
State: Published - 11 Nov 2024
Event: 28th International Conference on Extending Database Technology, EDBT 2025 - Barcelona, Spain
Duration: 25 Mar 2025 to 28 Mar 2025

Publication series

Name: Advances in Database Technology - EDBT
Number: 2
Volume: 28

Conference

Conference: 28th International Conference on Extending Database Technology, EDBT 2025
Country/Territory: Spain
City: Barcelona
Period: 25/03/25 to 28/03/25

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
  • Computer Science Applications
