Publication
Towards More Realistic Natural Language Queries in Text-to-SQL Benchmarks
Charlotte Dörschner; Frank Jäkel; Carsten Binnig
In: Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2026, Bengaluru, India, 31 May 2026 - 5 June 2026. Workshop on Human-In-the-Loop Data Analytics (HILDA), Pages 1-8, ACM, 2026.
Abstract
Advances in large language models have significantly improved
performance on Text-to-SQL benchmarks. However, existing benchmarks
collect natural language queries (NLQs) through methods
such as paraphrasing SQL queries, resulting in queries that remain
strongly SQL-like. While there have been recent trends towards
making Text-to-SQL benchmarks more realistic, it is still unclear
how users, who may not be proficient in SQL, naturally try to query
data. Therefore, in this paper, we present the results of a study with
80 participants, who formulated queries to answer broad questions
with the help of a database. The study produced a corpus of
887 NLQs, which we analyzed to identify systematic differences
between benchmark-style queries and those posed by our participants.
Based on our findings, we discuss implications for the design
of more realistic Text-to-SQL benchmarks.
