Publication
Boosting deep Reinforcement Learning using pretraining with Logical Options
Zihan Ye; Phil Chau; Raban Emunds; Jannis Blüml; Cedric Derstroff; Quentin Delfosse; Oleg Arenz; Kristian Kersting
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2603.06565, Pages 1-23, arXiv, 2026.
Abstract
Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. Recently, several symbolic approaches have addressed these challenges by encoding sparse objectives along with aligned plans. However, purely symbolic architectures are hard to scale and difficult to apply to continuous settings. Hence, we propose a hybrid approach, inspired by humans’ ability to acquire new skills. We use a two-stage framework that injects symbolic structure into neural reinforcement learning agents without sacrificing the expressivity of deep policies. Our method, called Hybrid Hierarchical RL (H2RL), introduces a logical option-based pretraining strategy that steers the learned policy away from short-term reward loops and toward goal-directed behavior, while still allowing the final policy to be refined via standard environment interaction. Empirically, we show that this approach consistently improves long-horizon decision-making and yields agents that outperform strong neural, symbolic, and neuro-symbolic baselines.
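The two-stage idea, symbolic pretraining followed by standard fine-tuning, can be illustrated with a toy sketch. This is not the paper's H2RL implementation: the chain environment, the potential-based shaping term standing in for the logical plan, and all hyperparameters are invented for illustration. Stage one trains tabular Q-learning with a shaping bonus derived from a symbolic "reach the goal" plan, steering the agent away from a small exploitable reward loop; stage two fine-tunes on the environment reward alone.

```python
import random

N = 6                 # chain states 0..5; state 5 is the goal (hypothetical toy env)
GAMMA, ALPHA = 0.95, 0.5

def step(s, a):
    """Toy chain: a=0 stays at s (tiny loop reward), a=1 moves right."""
    if a == 0:
        return s, 0.01, False            # exploitable short-term reward loop
    s2 = s + 1
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

def train(q, episodes, shaped=False, eps=0.3):
    """Epsilon-greedy tabular Q-learning; `shaped` adds the symbolic-plan bonus."""
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 20:
            a = random.randrange(2) if random.random() < eps \
                else max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            if shaped:
                # Potential-based shaping from the plan "reach the goal":
                # phi(s) = -(distance to goal); preserves the optimal policy.
                r += GAMMA * (-(N - 1 - s2)) - (-(N - 1 - s))
            target = r + GAMMA * (0.0 if done else max(q[s2]))
            q[s][a] += ALPHA * (target - q[s][a])
            s, t = s2, t + 1
    return q

random.seed(0)
q = [[0.0, 0.0] for _ in range(N)]
train(q, 500, shaped=True)    # stage 1: option-guided (shaped) pretraining
train(q, 500, shaped=False)   # stage 2: fine-tune on environment reward only
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(N - 1)]
print(policy)                 # greedy action per non-terminal state
```

Because the shaping term is potential-based, it densifies the sparse goal reward during pretraining without changing which policy is optimal, so the fine-tuning stage can discard it and still recover goal-directed behavior (move right in every state) rather than the stay-in-place reward loop.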
