Publication
Learning from Less: Guiding Deep Reinforcement Learning with Differentiable Symbolic Planning
Zihan Ye; Oleg Arenz; Kristian Kersting
In: Computing Research Repository (CoRR), Vol. abs/2505.11661, Pages 1-23, 2025.
Abstract
When tackling complex problems, humans naturally break them down into smaller,
manageable subtasks and adjust their initial plans based on observations. For
instance, if you want to make coffee at a friend’s place, you might initially plan
to grab coffee beans, go to the coffee machine, and pour them into the machine.
Upon noticing that the machine is full, you would skip the initial steps and proceed
directly to brewing. In stark contrast, state-of-the-art reinforcement learners, such
as Proximal Policy Optimization (PPO), lack such prior knowledge and therefore
require significantly more training steps to exhibit comparable adaptive behavior.
Thus, a central research question arises: How can we endow reinforcement learning
(RL) agents with similar “human priors” so that they can learn from fewer
training interactions? To address this challenge, we propose a differentiable
symbolic planner (Dylan), a novel framework that integrates symbolic planning
into reinforcement learning. Dylan serves as a reward model that dynamically
shapes rewards by leveraging human priors, guiding agents through intermediate
subtasks, thus enabling more efficient exploration. Beyond reward shaping, Dylan
can work as a high-level planner that composes primitive policies to generate
new behaviors while avoiding common pitfalls of symbolic planners, such as infinite
execution loops. Our experimental evaluations demonstrate that Dylan significantly
improves RL agents’ performance and facilitates generalization to unseen tasks.
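
The abstract does not spell out Dylan's mechanics, but the reward-shaping idea it describes can be illustrated with a small, hypothetical sketch: a symbolic plan supplies subtask predicates, and the agent receives a one-time bonus whenever a subtask is (or already was) satisfied, so steps that observation shows to be unnecessary are skipped rather than redone. The class name SymbolicRewardShaper, the predicate names, and the bonus value below are assumptions for illustration only, not the paper's implementation.

# Minimal, hypothetical sketch of the reward-shaping idea described above;
# names, predicates, and the bonus scale are illustrative assumptions.

from typing import Callable, Dict, List


class SymbolicRewardShaper:
    """Adds a one-time bonus each time a subtask from a symbolic plan is satisfied."""

    def __init__(self, subtasks: List[str],
                 predicates: Dict[str, Callable[[dict], bool]],
                 bonus: float = 0.5):
        self.subtasks = subtasks        # subtasks proposed by a symbolic planner
        self.predicates = predicates    # subtask name -> test on the observed state
        self.bonus = bonus              # assumed per-subtask shaping reward
        self.done: set = set()          # subtasks already credited

    def reset(self) -> None:
        self.done.clear()

    def shape(self, env_reward: float, state: dict) -> float:
        """Return the environment reward plus bonuses for newly satisfied subtasks.

        Subtasks that already hold (e.g. the coffee machine is already full)
        are credited immediately, so the agent is not pushed to redo them.
        """
        shaped = env_reward
        for name in self.subtasks:
            if name not in self.done and self.predicates[name](state):
                self.done.add(name)
                shaped += self.bonus
        return shaped


# Illustrative usage with the coffee example from the abstract.
subtasks = ["has_beans", "at_machine", "machine_filled", "coffee_brewed"]
predicates = {
    "has_beans": lambda s: s.get("holding_beans", False),
    "at_machine": lambda s: s.get("near_machine", False),
    "machine_filled": lambda s: s.get("machine_full", False),
    "coffee_brewed": lambda s: s.get("brewed", False),
}
shaper = SymbolicRewardShaper(subtasks, predicates)
shaper.reset()
print(shaper.shape(0.0, {"machine_full": True}))  # 0.5: the already-full machine is credited directly

In this sketch the shaped reward would simply replace the environment reward fed to an off-the-shelf learner such as PPO; how Dylan actually computes and differentiates through its symbolic plan is described in the paper itself.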
