Publication

Towards More Realistic Natural Language Queries in Text-to-SQL Benchmarks

Charlotte Dörschner; Frank Jäkel; Carsten Binnig

In: Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA 2026, Bengaluru, India, 31 May 2026 - 5 June 2026. Workshop on Human-In-the-Loop Data Analytics (HILDA), Pages 1-8, ACM, 2026.

Abstract

Advances in large language models have significantly improved performance on Text-to-SQL benchmarks. However, existing benchmarks collect natural language queries (NLQs) through methods such as paraphrasing SQL queries, resulting in queries that remain strongly SQL-like. While there have been recent trends towards making Text-to-SQL benchmarks more realistic, it is still unclear how users, who may not be proficient in SQL, naturally try to query data. Therefore, in this paper, we present the results of a study with 80 participants, who formulated queries to answer broad questions with the help of a database. The study produced a corpus of 887 NLQs, which we analyzed to identify systematic differences between benchmark-style queries and those posed by our participants. Based on our findings, we discuss implications for the design of more realistic Text-to-SQL benchmarks.

Further Links

https://doi.org/10.1145/3814573.3814945

3814573.3814945.pdf (PDF, 567 KB)