Publication

The Stretto Execution Engine for LLM-Augmented Data Systems

Gabriele Sanmartino; Matthias Urban; Paolo Papotti; Carsten Binnig
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2602.04430, Pages 1-13, arXiv, 2026.

Abstract

LLM-augmented data systems enable semantic querying over structured and unstructured data, but executing queries with LLM-powered operators introduces a fundamental runtime–accuracy trade-off. In this paper, we present Stretto, a new execution engine that provides end-to-end query guarantees while efficiently navigating this trade-off in a holistic manner. For this, Stretto formulates query planning as a constrained optimization problem and uses a gradient-based optimizer to jointly select operator implementations and allocate error budgets across pipelines. Moreover, to enable fine-grained execution choices, Stretto introduces a novel idea on how KV-caching can be used to realize a spectrum of different physical operators that transform a sparse design space into a dense continuum of runtime–accuracy trade-offs. Experiments show that Stretto outperforms state-of-the-art systems while consistently meeting quality guarantees.
