Sat, 06 Jul 2019 15:24:20 +0000 Lee Cohen
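In this question, room 5 loops back to itself with a reward of 100 (from room 5 you can also go to rooms 1 or 4), and the agent can start from room 5 as well. So Q(5,5) is updated whenever the agent is in room 5 and takes the action of staying there.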
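To make this concrete, here is a rough sketch of how Q(5,5) grows when the 5 -> 5 action is taken repeatedly. It assumes a discount factor of 0.8, a learning rate of 1 (the deterministic update Q(s,a) = r + gamma * max Q(s',a')), and that Q(5,1) and Q(5,4) are still zero. None of these values are stated in the thread, so adjust them to match the recitation.

```python
GAMMA = 0.8          # assumed discount factor (not stated in the thread)
REWARD_STAY = 100.0  # reward for the 5 -> 5 loop, as described above

# Q-values for the three actions available in room 5, keyed by destination room.
# Q(5,1) and Q(5,4) are left at zero here purely for illustration.
q_room5 = {1: 0.0, 4: 0.0, 5: 0.0}


def update_q55(q, gamma=GAMMA):
    """Apply one update to Q(5,5); the next state is room 5 again."""
    q[5] = REWARD_STAY + gamma * max(q.values())
    return q[5]


# Take the 5 -> 5 action five times and record Q(5,5) after each update.
history = [update_q55(q_room5) for _ in range(5)]
print(history)
```

The first update gives 100 and the second gives 180, so the entry exceeds 100 as soon as it has been updated twice. Under these assumptions it keeps increasing toward 100 / (1 - 0.8) = 500.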
Sat, 06 Jul 2019 08:37:44 +0000 rafi levy
Sorry, it is recitation 6.
Sat, 06 Jul 2019 08:36:38 +0000 rafi levy
Rec. 6, ex.1

When do we update the entry Q(5,5)? Since room 5 is the target, it seems this entry can be updated only when an episode starts in that state. If it had stayed at zero, we would never have reached Q-values greater than 100.