Publication
Towards Executing Sloppy SQL Queries Over Tabular Data Lakes
Jan-Micha Bodensohn; Jakob Steinke; Carsten Binnig
In: 42nd IEEE International Conference on Data Engineering, ICDE 2026 - Workshops, Montreal, QC, Canada, May 4-8, 2026. IEEE International Conference on Data Engineering (ICDE), Pages 151-153, IEEE, 2026.
Abstract
Data lakes store raw tabular data without enforcing
a consistent schema, making data ingestion easy but exploration
and querying difficult. Users must often combine multiple tools
to find the required tables and prepare them for each query. To
address this gap, we tackle a new problem setting where users
directly query data lakes by writing so-called sloppy SQL queries,
which simply assume an idealized schema for each query. We
further present SQLake, a multi-agent framework which executes
such sloppy queries by semantically rewriting them for the data
lake, thus eliminating any data preparation overhead. SQLake
uses a set of Data Lake Actions that enable the agents to interact
with the data lake, including specialized retrievers, a Join Path
Finder, and a UDF Builder to handle noise. Our initial evaluation
shows the promise of this approach, bringing us closer to our goal
of making data lakes as easy to query as relational databases.
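
To make the idea of a sloppy query concrete, the following toy sketch shows a query written against an idealized schema and one possible rewrite for the table that actually exists. It is not SQLake's implementation: the table `tbl_0042`, its columns, the sample values, the `parse_count` cleaning function, and the hand-written schema mapping are all hypothetical and chosen only for illustration. The rewrite here simply prepends a common table expression; the paper's agents, retrievers, and Join Path Finder are not modeled.

```python
import sqlite3

# Hypothetical data lake table: cryptic column names, counts stored as noisy text.
# All names and values below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_0042 (city_nm TEXT, ctry TEXT, pop TEXT)")
conn.executemany(
    "INSERT INTO tbl_0042 VALUES (?, ?, ?)",
    [
        ("Montreal", "Canada", "1,200,000"),
        ("Toronto", "Canada", "2,500,000"),
        ("Darmstadt", "Germany", "150,000"),
    ],
)


def parse_count(text):
    """Illustrative cleaning UDF: strip thousands separators from a count."""
    return int(text.replace(",", ""))


# Register the Python function so it can be called from SQL.
conn.create_function("parse_count", 1, parse_count)

# The sloppy query assumes an idealized table `cities` that does not exist.
sloppy_query = (
    "SELECT name, population FROM cities "
    "WHERE country = 'Canada' ORDER BY population DESC"
)

# Hand-written mapping from the idealized schema to the actual table.
# A real system would have to discover this mapping; here it is hard-coded.
idealized_view = (
    "WITH cities AS ("
    "SELECT city_nm AS name, ctry AS country, parse_count(pop) AS population "
    "FROM tbl_0042) "
)

# Rewrite: prepend the mapping so the sloppy query runs unchanged.
rewritten_query = idealized_view + sloppy_query
rows = conn.execute(rewritten_query).fetchall()
print(rows)  # [('Toronto', 2500000), ('Montreal', 1200000)]
```

In this sketch the user never sees `tbl_0042` or the text-encoded counts; the rewrite supplies both the schema mapping and the cleaning step. Finding the right tables, join paths, and cleaning functions automatically is the hard part that the abstract attributes to SQLake's Data Lake Actions.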
