Publication
DoublyAware: Dual Planning and Policy Awareness for Temporal Difference Learning in Humanoid Locomotion
Khang Nguyen; An T. Le; Jan Peters; Minh Nhat Vu
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2506.12095, Pages 1-8, arXiv, 2025.
Abstract
Achieving robust robot learning for humanoid locomotion is a fundamental
challenge in model-based reinforcement learning (MBRL), where environmental
stochasticity and randomness can hinder efficient exploration and learning
stability. The environmental, so-called aleatoric, uncertainty can be
amplified in high-dimensional action spaces with complex contact dynamics,
and further entangled with epistemic uncertainty in the models during
learning phases. In this work, we propose DoublyAware, an uncertainty-aware
extension of Temporal Difference Model Predictive Control (TD-MPC) that
explicitly decomposes uncertainty into two disjoint interpretable
components, i.e., planning and policy uncertainties. To handle the planning
uncertainty, DoublyAware employs conformal prediction to filter candidate
trajectories using quantile-calibrated risk bounds, ensuring statistical
consistency and robustness against stochastic dynamics. Meanwhile, policy
rollouts are leveraged as structured informative priors to support the
learning phase with Group-Relative Policy Constraint (GRPC) optimizers that
impose a group-based adaptive trust-region in the latent action space. This
principled combination enables the robot agent to prioritize
high-confidence, high-reward behavior while maintaining effective, targeted
exploration under uncertainty. Evaluated on the HumanoidBench locomotion
suite with the Unitree 26-DoF H1-2 humanoid, DoublyAware demonstrates
improved sample efficiency, accelerated convergence, and enhanced motion
feasibility compared to RL baselines. Our simulation results emphasize the
significance of structured uncertainty modeling for data-efficient and
reliable decision-making in TD-MPC-based humanoid locomotion learning.
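
The abstract's idea of filtering candidate trajectories with a quantile-calibrated risk bound can be illustrated with generic split conformal prediction. The sketch below is not the authors' implementation: the abstract does not define the risk score, so the code assumes each trajectory already has a scalar risk score (for example, a model prediction error), and the names `conformal_threshold` and `filter_trajectories` are hypothetical.

```python
import math

import numpy as np


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal risk bound at miscoverage level alpha.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or +inf when n is too small for the requested level.
    """
    scores = np.sort(np.asarray(calibration_scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return float("inf")
    return float(scores[k - 1])


def filter_trajectories(returns, risk_scores, threshold, top_k):
    """Keep trajectories whose risk is within the bound, ranked by return.

    Assumption: `returns` and `risk_scores` are aligned 1-D arrays, one
    entry per candidate trajectory. Returns indices of the kept candidates.
    """
    returns = np.asarray(returns, dtype=float)
    risk = np.asarray(risk_scores, dtype=float)
    keep = np.flatnonzero(risk <= threshold)
    if keep.size == 0:
        # Fallback (an assumption): keep the single least risky candidate.
        keep = np.array([int(np.argmin(risk))])
    # Sort the surviving candidates by estimated return, highest first.
    ranked = keep[np.argsort(returns[keep])[::-1]]
    return ranked[:top_k]


# Illustrative numbers only; they are not taken from the paper.
calibration = np.arange(1, 11) / 10.0          # risk scores 0.1, ..., 1.0
bound = conformal_threshold(calibration, alpha=0.2)
selected = filter_trajectories(
    returns=[5.0, 9.0, 7.0, 8.0],
    risk_scores=[0.3, 1.5, 0.8, 0.95],
    threshold=bound,
    top_k=2,
)
```

In this toy setting the highest-return candidates are discarded because their risk scores exceed the calibrated bound, so the planner would continue with lower-return but higher-confidence trajectories.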
