
Publication

SLR: An Automated Synthesis Framework for Scalable Logical Reasoning

Lukas Helff; Ahmad Omar; Felix Friedrich; Wolfgang Stammer; Antonia Wüst; Tim Woydt; Rupert Mitchell; Patrick Schramowski; Kristian Kersting
In: Computing Research Repository (CoRR), Vol. abs/2506.15787, Pages 1-20, 2025.

Abstract

We introduce SLR, an end-to-end framework for systematic evaluation and training of Large Language Models (LLMs) via Scalable Logical Reasoning. Given a user's task specification, SLR automatically synthesizes (i) an instruction prompt for an inductive reasoning task, (ii) a validation program, executable on model outputs to provide verifiable rewards, and (iii) the latent ground-truth rule. This process is fully automated, scalable, requires no human annotations, and offers precise control over task difficulty. Using SLR, we create SLR-BENCH, a benchmark comprising 19k prompts organized into 20 curriculum levels that progressively increase in relational, arithmetic, and recursive complexity. Large-scale evaluation reveals that contemporary LLMs readily produce syntactically valid rules, yet often fail at correct logical inference. Recent reasoning LLMs demonstrate improved performance but incur very high test-time computation, with costs exceeding $300 for just 1,000 prompts. Finally, curriculum learning via SLR doubles Llama-3-8B accuracy on SLR-BENCH, achieving parity with Gemini-Flash-Thinking at a fraction of computational cost. Moreover, these reasoning capabilities generalize to a wide range of established benchmarks, underscoring the effectiveness of SLR for downstream reasoning.
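To make the three synthesized artifacts concrete, here is a minimal sketch of such a task-synthesis loop. It assumes a toy rule space of attribute conjunctions over colored shapes; all names (`SyntheticTask`, `synthesize_task`, the string-matching validator) are hypothetical illustrations, not the framework's actual API, and a real validation program would execute the induced rule against the labeled examples rather than inspect its text.

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class SyntheticTask:
    prompt: str                      # (i) instruction prompt for the LLM
    validate: Callable[[str], bool]  # (ii) validation program -> verifiable reward
    ground_truth: str                # (iii) latent ground-truth rule

COLORS = ["red", "blue", "green"]
SHAPES = ["circle", "square", "triangle"]

def synthesize_task(seed: int, n_examples: int = 6) -> SyntheticTask:
    rng = random.Random(seed)
    # Sample a latent ground-truth rule: a conjunction of two attribute literals.
    color, shape = rng.choice(COLORS), rng.choice(SHAPES)
    rule = lambda c, s: c == color and s == shape
    ground_truth = f"positive(X) :- color(X, {color}), shape(X, {shape})."

    # Generate labeled examples consistent with the latent rule.
    examples = []
    for _ in range(n_examples):
        c, s = rng.choice(COLORS), rng.choice(SHAPES)
        label = "positive" if rule(c, s) else "negative"
        examples.append(f"- a {c} {s}: {label}")

    prompt = (
        "Induce a single Prolog rule `positive(X) :- ...` consistent with "
        "all labeled examples below.\n" + "\n".join(examples)
    )

    def validate(model_rule: str) -> bool:
        # Stand-in validator: checks that the induced rule contains both
        # ground-truth literals. A faithful version would execute the rule.
        return (f"color(X, {color})" in model_rule
                and f"shape(X, {shape})" in model_rule)

    return SyntheticTask(prompt, validate, ground_truth)

task = synthesize_task(seed=0)
print(task.prompt)
print("reward:", int(task.validate(task.ground_truth)))
```

Because every component is derived from the sampled rule, the pipeline needs no human annotation, and task difficulty can be scaled by enlarging the rule space (more attributes, relational or recursive literals) in the spirit of the 20-level curriculum described above.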
