Publication
Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces
Hamish Flynn; Joe Watson; Ingmar Posner; Jan Peters
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2603.08287, Pages 1-37, arXiv, 2026.
Abstract
We analyze the Bayesian regret of the Gaussian process posterior sampling reinforcement learning (GP-PSRL) algorithm. Posterior sampling is an effective heuristic for decision-making under uncertainty that has been used to develop successful algorithms for a variety of continuous control problems. However, theoretical work on GP-PSRL is limited. All known regret bounds either fail to achieve a tight dependence on a kernel-dependent quantity called the maximum information gain, or fail to properly account for the fact that the set of possible system states is unbounded. Through a recursive application of the Borell-Tsirelson-Ibragimov-Sudakov inequality, we show that, with high probability, the states actually visited by the algorithm are contained within a ball of near-constant radius. To obtain a tight dependence on the maximum information gain, we use the chaining method to control the regret suffered by GP-PSRL. Our main result is a Bayesian regret bound of order Õ(H^(3/2) √(γ_(T/H) T)), where H is the horizon, T is the number of time steps, and γ_(T/H) is the maximum information gain. With this result, we resolve the limitations of prior theoretical work on PSRL, and provide the theoretical foundation and tools for analyzing PSRL in complex settings.
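To illustrate the posterior-sampling principle the abstract refers to, here is a minimal, hypothetical sketch of Thompson sampling on a two-armed Gaussian bandit with a conjugate normal prior. This is not the paper's GP-PSRL algorithm (which samples a full dynamics model from a GP posterior); all names, constants, and arm means below are illustrative assumptions.

```python
import random

# Hypothetical 2-armed Gaussian bandit; the true means are unknown to the learner.
TRUE_MEANS = [0.3, 0.7]
NOISE_STD = 0.5              # assumed known observation-noise level
PRIOR_MEAN, PRIOR_VAR = 0.0, 1.0

def posterior(n, total):
    """Conjugate normal posterior over an arm's mean after n pulls with reward sum `total`."""
    var = 1.0 / (1.0 / PRIOR_VAR + n / NOISE_STD ** 2)
    mean = var * (PRIOR_MEAN / PRIOR_VAR + total / NOISE_STD ** 2)
    return mean, var

def thompson_sampling(steps, rng):
    counts = [0, 0]          # pulls per arm
    sums = [0.0, 0.0]        # reward sum per arm
    for _ in range(steps):
        # Core of posterior sampling: draw one plausible mean per arm from the
        # posterior, then act greedily with respect to the sampled means.
        sampled = []
        for a in range(2):
            m, v = posterior(counts[a], sums[a])
            sampled.append(rng.gauss(m, v ** 0.5))
        arm = max(range(2), key=lambda a: sampled[a])
        reward = rng.gauss(TRUE_MEANS[arm], NOISE_STD)
        counts[arm] += 1
        sums[arm] += reward
    return counts

rng = random.Random(0)
pulls = thompson_sampling(2000, rng)
print(pulls)  # the better arm (index 1) should receive most pulls
```

The same act-greedily-on-a-posterior-sample idea underlies PSRL, except that the sample is an entire transition model and the greedy step is planning in that sampled model over a horizon H.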
