Sat, 06 Jul 2019 15:24:09 +0000 Lee Cohen
Technically you're right, but since $\sum_{i=1}^{n} \Delta_i$ is a fixed term that does not depend on $T$, it doesn't change the order of the bound, which is already logarithmic in $T$ (i.e., $O(\log T + c) = O(\log T)$ for any constant $c \in \mathbb{R}$).
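As a rough sketch of why the constant is absorbed, assume the analysis gives a bound of the form $C \log T$ for some constant $C > 0$ that does not depend on $T$ ($C$ is a placeholder here, not a value taken from the analysis, and $\log$ is the natural logarithm). The gaps are nonnegative, and for $T \geq 3$ we have $\log T \geq 1$, so

$$\sum_{i=1}^{n} \Delta_i + C \log T \;\le\; \Big(C + \sum_{i=1}^{n} \Delta_i\Big) \log T \;=\; O(\log T).$$

The extra term only changes the constant in front of $\log T$ (which depends on the gaps but not on $T$), not the rate.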
Sat, 06 Jul 2019 08:24:30 +0000
In the analysis of the UCB bound there is an assumption that $T_i \geq 1$. It holds because the algorithm starts by pulling each arm once. Shouldn't we add this to the regret? The regret would then have an extra term: $\sum_{i=1}^{n} \Delta_i$.
