Publication
SLR: An Automated Synthesis Framework for Scalable Logical Reasoning
Lukas Helff; Ahmad Omar; Felix Friedrich; Wolfgang Stammer; Antonia Wüst; Tim Woydt; Rupert Mitchell; Patrick Schramowski; Kristian Kersting
In: Computing Research Repository (CoRR), Vol. abs/2506.15787, Pages 1-20, 2025.
Abstract
We introduce SLR, an end-to-end framework for systematic evaluation and training of Large Language Models (LLMs) via Scalable Logical Reasoning. Given a user's task specification, SLR automatically synthesizes (i) an instruction prompt for an inductive reasoning task, (ii) a validation program, executable on model outputs to provide verifiable rewards, and (iii) the latent ground-truth rule. This process is fully automated, scalable, requires no human annotations, and offers precise control over task difficulty. Using SLR, we create SLR-BENCH, a benchmark comprising 19k prompts organized into 20 curriculum levels that progressively increase in relational, arithmetic, and recursive complexity. Large-scale evaluation reveals that contemporary LLMs readily produce syntactically valid rules, yet often fail at correct logical inference. Recent reasoning LLMs demonstrate improved performance but incur very high test-time computation, with costs exceeding $300 for just 1,000 prompts. Finally, curriculum learning via SLR doubles Llama-3-8B accuracy on SLR-BENCH, achieving parity with Gemini-Flash-Thinking at a fraction of computational cost. Moreover, these reasoning capabilities generalize to a wide range of established benchmarks, underscoring the effectiveness of SLR for downstream reasoning.
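
The synthesis loop described in the abstract (task specification in; instruction prompt, executable validation program, and latent ground-truth rule out) can be illustrated with a minimal Python sketch. Everything below is a hypothetical illustration under assumed names: make_task, validate, and the color/size rule family are not the actual SLR implementation or SLR-BENCH task vocabulary.

    import random

    # Hypothetical sketch of an automated task-synthesis loop; names and
    # vocabulary are illustrative, not the SLR API.
    COLORS = ["red", "blue", "green"]
    SIZES = ["small", "large"]

    def sample_rule(rng):
        """Sample a latent ground-truth rule as a conjunction of attribute tests."""
        color = rng.choice(COLORS)
        size = rng.choice(SIZES)
        rule = lambda obj: obj["color"] == color and obj["size"] == size
        description = f"positive iff color == {color!r} and size == {size!r}"
        return rule, description

    def make_task(n_examples=8, seed=0):
        """Synthesize (i) an instruction prompt, (ii) a validation function
        usable as a verifiable reward, and (iii) the latent ground-truth rule,
        with no human annotation."""
        rng = random.Random(seed)
        rule, description = sample_rule(rng)
        examples = [
            ({"color": rng.choice(COLORS), "size": rng.choice(SIZES)}, None)
            for _ in range(n_examples)
        ]
        examples = [(obj, rule(obj)) for obj, _ in examples]

        prompt = (
            "Induce a rule that separates positive from negative examples.\n"
            + "\n".join(f"{obj} -> {'pos' if y else 'neg'}" for obj, y in examples)
        )

        def validate(candidate_rule):
            """Executable check: fraction of examples the candidate classifies correctly."""
            correct = sum(candidate_rule(obj) == y for obj, y in examples)
            return correct / len(examples)

        return prompt, validate, description

    if __name__ == "__main__":
        prompt, validate, truth = make_task(seed=42)
        print(prompt)
        # A model-proposed rule would be parsed and scored here; a trivial
        # guess is evaluated for illustration.
        guess = lambda obj: obj["color"] == "red"
        print("ground truth:", truth)
        print("reward for guess:", validate(guess))

In this sketch, difficulty could be controlled by widening the attribute vocabulary or deepening the rule family, loosely mirroring how the abstract describes curriculum levels of increasing relational, arithmetic, and recursive complexity.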
