
Publication

Towards Executing Sloppy SQL Queries Over Tabular Data Lakes

Jan-Micha Bodensohn; Jakob Steinke; Carsten Binnig
In: 42nd IEEE International Conference on Data Engineering, ICDE 2026 - Workshops, Montreal, QC, Canada, May 4-8, 2026. IEEE International Conference on Data Engineering (ICDE), Pages 151-153, IEEE, 2026.

Abstract

Data lakes store raw tabular data without enforcing a consistent schema, making data ingestion easy but exploration and querying difficult. Users must often combine multiple tools to find the required tables and prepare them for each query. To address this gap, we tackle a new problem setting where users directly query data lakes by writing so-called sloppy SQL queries, which simply assume an idealized schema for each query. We further present SQLake, a multi-agent framework that executes such sloppy queries by semantically rewriting them for the data lake, thus eliminating any data preparation overhead. SQLake uses a set of Data Lake Actions that enable the agents to interact with the data lake, including specialized retrievers, a Join Path Finder, and a UDF Builder to handle noise. Our initial evaluation shows the promise of this approach, bringing us closer to our goal of making data lakes as easy to query as relational databases.
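
To make the idea of a sloppy query concrete, the following minimal sketch contrasts a query written against an idealized schema with one possible rewrite against the tables actually present. It is an illustration only and does not reproduce SQLake's agents, its Data Lake Actions, or its output. All table names, column names, sample rows, and the `parse_amount` UDF are hypothetical, and the rewrite is written by hand here rather than produced by any framework.

```python
import sqlite3

# Hypothetical data lake contents: every table name, column name, and row
# below is invented for illustration and does not come from the paper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust_export_v2 (cid INTEGER, full_nm TEXT);
    CREATE TABLE ord_dump (order_no INTEGER, cust_ref INTEGER, amt TEXT);
    INSERT INTO cust_export_v2 VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO ord_dump VALUES
        (10, 1, '12.50 EUR'), (11, 1, ' 7.50'), (12, 2, '3 EUR');
""")

# Sloppy query: assumes an idealized schema (customers, orders) that does
# not exist in the lake, so it cannot run as written.
SLOPPY_QUERY = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
"""


def parse_amount(raw):
    """Hypothetical UDF: strip currency noise from a text-encoded amount."""
    return float(raw.replace("EUR", "").strip())


# Register the UDF so the rewritten SQL can call it by name.
conn.create_function("parse_amount", 1, parse_amount)

# One possible hand-written rewrite: the idealized tables are mapped to the
# real ones, the join follows the real key columns, and the UDF cleans the
# noisy amount column.
REWRITTEN_QUERY = """
    SELECT c.full_nm AS name, SUM(parse_amount(o.amt)) AS total
    FROM cust_export_v2 c JOIN ord_dump o ON o.cust_ref = c.cid
    GROUP BY c.full_nm ORDER BY c.full_nm
"""


def sloppy_fails():
    """Return True if the sloppy query is rejected by the real schema."""
    try:
        conn.execute(SLOPPY_QUERY)
    except sqlite3.OperationalError:
        return True
    return False


def run_rewritten():
    """Execute the rewritten query and return its rows."""
    return conn.execute(REWRITTEN_QUERY).fetchall()
```

In this toy setting, the rewrite involves the three kinds of work the abstract names: locating the relevant tables, finding a join path between them, and handling noisy values through a UDF. How SQLake performs these steps is described in the paper itself, not here.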

Further Links