According to the comments, the following holds for mdp.P:
for every (state, action), mdp.P[state][action] is a list of (probability, next_state, reward) tuples.
Is it okay for this list to contain the same next state more than once? For example:
P[0][0] = [(0.3333333333333333, 0, 0.0), (0.3333333333333333, 0, 0.0), (0.3333333333333333, 4, 0.0)]
If so, what should we do with these duplicates?
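My guess is that the duplicates are harmless, since any sum over the list counts each tuple's probability separately, so state 0 would effectively get probability 2/3. If they do need to be merged, here is a minimal sketch of what I have in mind. It assumes tuples with the same (next_state, reward) can simply have their probabilities added; merge_duplicates is a hypothetical helper of mine, not something from the mdp code:

```python
from collections import defaultdict


def merge_duplicates(transitions):
    """Combine tuples that share (next_state, reward) by summing their probabilities.

    Assumption: duplicates are independent ways of reaching the same outcome,
    so their probabilities add up.
    """
    merged = defaultdict(float)
    for prob, next_state, reward in transitions:
        merged[(next_state, reward)] += prob
    # Rebuild the (probability, next_state, reward) tuple layout.
    return [(prob, s, r) for (s, r), prob in merged.items()]


# The list from P[0][0] above.
transitions = [(1 / 3, 0, 0.0), (1 / 3, 0, 0.0), (1 / 3, 4, 0.0)]
merged = merge_duplicates(transitions)
print(merged)  # state 0 now appears once, with probability of about 2/3
```

If this is right, the merged list and the original list should give the same result in any expectation over next states, and the merge step would be optional.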
Thanks