Publication
Polynomial Regret Concentration of UCB for Non-Deterministic State Transitions
Can Cömer; Jannis Blüml; Cedric Derstroff; Kristian Kersting
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2502.06900, Pages 1-10, Computing Research Repository, 2025.
Abstract
Monte Carlo Tree Search (MCTS) has proven effective in
solving decision-making problems in perfect information set-
tings. However, its application to stochastic and imperfect in-
formation domains remains limited. This paper extends the
theoretical framework of MCTS to stochastic domains by
addressing non-deterministic state transitions, where actions
lead to probabilistic outcomes. Specifically, building on the
work of Shah, Xie, and Xu (2022), we derive polynomial re-
gret concentration bounds for the Upper Confidence Bound
algorithm in multi-armed bandit problems with stochastic
transitions, offering improved theoretical guarantees. Our pri-
mary contribution is proving that these bounds also apply
to non-deterministic environments, ensuring robust perfor-
mance in stochastic settings. This broadens the applicability
of MCTS to real-world decision-making problems with prob-
abilistic outcomes, such as in autonomous systems and finan-
cial decision-making.
