Skip to main content Skip to main navigation

Publikation

Polynomial Regret Concentration of UCB for Non-Deterministic State Transitions

Can Cömer; Jannis Blüml; Cedric Derstroff; Kristian Kersting
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2502.06900, Pages 1-10, Computing Research Repository, 2025.

Zusammenfassung

Monte Carlo Tree Search (MCTS) has proven effective in solving decision-making problems in perfect information set- tings. However, its application to stochastic and imperfect in- formation domains remains limited. This paper extends the theoretical framework of MCTS to stochastic domains by addressing non-deterministic state transitions, where actions lead to probabilistic outcomes. Specifically, building on the work of Shah, Xie, and Xu (2022), we derive polynomial re- gret concentration bounds for the Upper Confidence Bound algorithm in multi-armed bandit problems with stochastic transitions, offering improved theoretical guarantees. Our pri- mary contribution is proving that these bounds also apply to non-deterministic environments, ensuring robust perfor- mance in stochastic settings. This broadens the applicability of MCTS to real-world decision-making problems with prob- abilistic outcomes, such as in autonomous systems and finan- cial decision-making.

Weitere Links