Publication
Boosting deep Reinforcement Learning using pretraining with Logical Options
Zihan Ye; Phil Chau; Raban Emunds; Jannis Blüml; Cedric Derstroff; Quentin Delfosse; Oleg Arenz; Kristian Kersting
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2603.06565, Pages 1-23, arXiv, 2026.
Abstract
Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. Recently, several symbolic approaches have addressed these challenges by encoding sparse objectives along with aligned plans. However, purely symbolic architectures are hard to scale and difficult to apply to continuous settings. Hence, we propose a hybrid approach, inspired by humans’ ability to acquire new skills. We use a two-stage framework that injects symbolic structure into neural reinforcement learning agents without sacrificing the expressivity of deep policies. Our method, called Hybrid Hierarchical RL (H2RL), introduces a logical option-based pretraining strategy that steers the learned policy away from short-term reward loops and toward goal-directed behavior, while still allowing the final policy to be refined via standard environment interaction. Empirically, we show that this approach consistently improves long-horizon decision-making and yields agents that outperform strong neural, symbolic, and neuro-symbolic baselines.
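The two-stage idea, symbolic pretraining followed by standard fine-tuning, can be illustrated with a toy sketch. This is not the paper's H2RL implementation: the chain environment, the potential-based shaping term standing in for the logical plan, and all hyperparameters are invented for illustration. Stage one trains tabular Q-learning with a shaping bonus derived from a symbolic "reach the goal" plan, steering the agent away from a small exploitable reward loop; stage two fine-tunes on the environment reward alone.

```python
import random

N = 6                 # chain states 0..5; state 5 is the goal (hypothetical toy env)
GAMMA, ALPHA = 0.95, 0.5

def step(s, a):
    """Toy chain: a=0 stays at s (tiny loop reward), a=1 moves right."""
    if a == 0:
        return s, 0.01, False            # exploitable short-term reward loop
    s2 = s + 1
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

def train(q, episodes, shaped=False, eps=0.3):
    """Epsilon-greedy tabular Q-learning; `shaped` adds the symbolic-plan bonus."""
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 20:
            a = random.randrange(2) if random.random() < eps \
                else max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            if shaped:
                # Potential-based shaping from the plan "reach the goal":
                # phi(s) = -(distance to goal); preserves the optimal policy.
                r += GAMMA * (-(N - 1 - s2)) - (-(N - 1 - s))
            target = r + GAMMA * (0.0 if done else max(q[s2]))
            q[s][a] += ALPHA * (target - q[s][a])
            s, t = s2, t + 1
    return q

random.seed(0)
q = [[0.0, 0.0] for _ in range(N)]
train(q, 500, shaped=True)    # stage 1: option-guided (shaped) pretraining
train(q, 500, shaped=False)   # stage 2: fine-tune on environment reward only
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(N - 1)]
print(policy)                 # greedy action per non-terminal state
```

Because the shaping term is potential-based, it densifies the sparse goal reward during pretraining without changing which policy is optimal, so the fine-tuning stage can discard it and still recover goal-directed behavior (move right in every state) rather than the stay-in-place reward loop.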
