
Publication

Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces

Hamish Flynn; Joe Watson; Ingmar Posner; Jan Peters
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2603.08287, Pages 1-37, arXiv, 2026.

Abstract

We analyze the Bayesian regret of the Gaussian process posterior sampling reinforcement learning (GP-PSRL) algorithm. Posterior sampling is an effective heuristic for decision-making under uncertainty that has been used to develop successful algorithms for a variety of continuous control problems. However, theoretical work on GP-PSRL is limited. All known regret bounds either fail to achieve a tight dependence on a kernel-dependent quantity called the maximum information gain, or fail to properly account for the fact that the set of possible system states is unbounded. Through a recursive application of the Borell-Tsirelson-Ibragimov-Sudakov inequality, we show that, with high probability, the states actually visited by the algorithm are contained within a ball of near-constant radius. To obtain a tight dependence on the maximum information gain, we use the chaining method to control the regret suffered by GP-PSRL. Our main result is a Bayesian regret bound of order Õ(H^{3/2} √(γ_{T/H} T)), where H is the horizon, T is the number of time steps, and γ_{T/H} is the maximum information gain. With this result, we resolve the limitations of prior theoretical work on PSRL, and provide the theoretical foundation and tools for analyzing PSRL in complex settings.
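To make the posterior sampling idea behind GP-PSRL concrete, the following is a minimal illustrative sketch (not the paper's algorithm or analysis): a GP posterior is maintained over an unknown 1-D transition function, and each "episode" draws one plausible dynamics function from the posterior and acts under it. The kernel, dynamics `f_true`, grid, and greedy state choice are all assumptions for illustration only; the paper treats full continuous control with planning.

```python
import numpy as np

def rbf_kernel(A, B, ls=0.5):
    # Squared-exponential kernel matrix between 1-D point sets A and B.
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-2):
    # GP posterior mean and covariance at test points Xs given data (X, y).
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xs, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    cov = rbf_kernel(Xs, Xs) - V.T @ V
    return mean, cov

# True (unknown to the agent) 1-D dynamics: next state from current state.
f_true = lambda s: 0.8 * np.sin(2.0 * s)

rng = np.random.default_rng(0)
X = np.array([0.0])                    # states observed so far
y = f_true(X) + 0.05 * rng.standard_normal(len(X))
grid = np.linspace(-2.0, 2.0, 50)      # candidate states to evaluate

for episode in range(3):
    # Posterior sampling step: draw one plausible dynamics function
    # from the GP posterior and act as if it were the truth.
    mean, cov = gp_posterior(X, y, grid)
    f_sample = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(grid)))
    # Greedily visit the state the sampled model rates highest
    # (a crude stand-in for planning under the sampled dynamics).
    s_next = grid[np.argmax(f_sample)]
    X = np.append(X, s_next)
    y = np.append(y, f_true(np.array([s_next]))[0]
                  + 0.05 * rng.standard_normal())
```

As the GP accumulates transition data, its posterior concentrates, and the sampled models (and hence the induced behavior) approach the truth; the paper's contribution is bounding the Bayesian regret of this scheme without assuming the visited states stay in a bounded set.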
