Hello,
can you please upload a solution for the exam?
Thanks
Hi,
Can the Moed A exam and its solution be uploaded?
Thanks
Controllability implies that we can reach any state, but does not guarantee finite cost.
Stability implies that the optimal control has finite cost, but does not guarantee that we can reach any state (for example, we may always move to x=0).
We would like to reach x=0, and you should think of it as "normalizing" the system so that the origin (x=0) is the desired operating point.
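To make the eigenvalue point concrete, here is a minimal sketch (the matrices A, B, K below are made up for illustration, not from the lecture): under the closed-loop dynamics x_{t+1} = (A - BK) x_t, if every eigenvalue of A - BK has magnitude below 1 the state decays to the origin from any starting point. To track a nonzero setpoint x*, you would work with the error e_t = x_t - x*, so "x = 0" really means "at the desired operating point".

```python
import numpy as np

# Toy closed-loop LQR-style system (A, B, K are illustrative, not the lecture's).
A = np.array([[1.1, 0.3],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
K = np.array([[1.4, 1.1]])   # feedback gain for u_t = -K x_t, chosen so |eig| < 1

A_cl = A - B @ K             # closed-loop dynamics matrix
print("closed-loop eigenvalue magnitudes:", np.abs(np.linalg.eigvals(A_cl)))

x = np.array([5.0, -3.0])    # arbitrary initial state
for t in range(50):
    x = A_cl @ x             # x_{t+1} = (A - B K) x_t
print("||x_50|| =", np.linalg.norm(x))  # tiny when all |eig| < 1, blows up otherwise
```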
In question 2 of Moed A you are asked about Q-learning, which is off-policy, so it does not matter which policy generated the trajectories.
In question 2 of Moed B you are told that the trajectories were generated using pi, and asked to run Monte Carlo to learn V of pi.
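A minimal sketch of the difference (the states, actions, and rewards below are made up, not the exam's MDP): Q-learning applies the same max-based update regardless of which behaviour policy produced the trace, while Monte Carlo for V of pi just averages the observed returns, which is only meaningful if the trace really follows pi.

```python
from collections import defaultdict

gamma, alpha = 0.9, 0.1
actions = ["a0", "a1"]

# One illustrative trajectory of (state, action, reward) triples, ending in a terminal state.
trace = [("s0", "a1", 0.0), ("s1", "a0", 1.0), ("s2", "a1", 5.0)]
terminal = "s3"
next_states = [s for s, _, _ in trace][1:] + [terminal]

# Q-learning (Moed A style): the target uses a max over actions, so it does not
# matter which policy generated the trace (off-policy).
Q = defaultdict(float)
for (s, a, r), s_next in zip(trace, next_states):
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Monte Carlo for V^pi (Moed B style): average observed returns; this estimates
# V^pi only if the trace was actually generated by pi.
G = 0.0
first_visit_return = {}
for s, a, r in reversed(trace):
    G = r + gamma * G
    first_visit_return[s] = G   # later writes in this reversed loop are earlier visits,
                                # so the first-visit return is what remains
V = dict(first_visit_return)

print(dict(Q))
print(V)
```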
Can someone please explain the answers for question 4d in last year's exams?
Thanks!
Hi all,
Question 1 in hw4 is (a slightly easier version of) question 5 here: https://ece.iisc.ac.in/~aditya/E1245_Online_Prediction_Learning_F2014/final_exam_full.pdf
Hi,
In the LQR lecture we defined controllability as a sufficient condition for solving the ARE.
Then we defined stability, which basically tells us whether our system will blow up or not, depending on the eigenvalues under the proposed optimal solution.
Can someone explain how the two are related?
Is it that we can reach every state but then cannot stay there? That we will try to reach it but the system will be very unstable?
Also, it says that a good system is one where the eigenvalues have magnitude smaller than 1, hence x_t goes to 0. Why is that good?
We want x_t to reach a specific state, not zero.
Thanks!
Can you post a solution from a student that received 100?
Hi,
In the exams you published there are questions that provide traces and ask us to compute the V or Q function via some method.
My question is: how do we know whether the traces were produced on-policy or off-policy?
This dramatically changes the computation of the estimated Q/V function.
Thanks
What does "s(1,1) - action is chosen at steps 4 & 5" mean on pages 5-6 of rec. 5?
But if alpha is positive but smaller than 1, wouldn't the integral diverge?
We don't publish solutions. If there's something which is not clear to you after your exercises have been graded feel free to ask (you can send me an email if you prefer).
'wait' is an action; the state is a function of the chemicals' concentrations.
In exercise 2 we handle the 'wait' action for the first time, so when we estimate the Q value we should use different weights from the ones that were used for the 'harvest' action.
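In case it helps, here is a minimal sketch of what "different weights per action" could look like with a linear Q function (the feature vector, learning rate, and numbers are made up, not the exercise's):

```python
import numpy as np

# Illustrative linear Q(s, a) = w_a . phi(s): one weight vector per action,
# so 'wait' and 'harvest' have separate parameters.
def phi(concentrations):
    # Toy state features built from the chemicals' concentrations.
    return np.asarray(concentrations, dtype=float)

weights = {"harvest": np.zeros(3), "wait": np.zeros(3)}

def q_value(concentrations, action):
    return weights[action] @ phi(concentrations)

def update(concentrations, action, target, lr=0.1):
    # Semi-gradient update: only the taken action's weight vector moves.
    td_error = target - q_value(concentrations, action)
    weights[action] += lr * td_error * phi(concentrations)

update([0.2, 0.5, 0.1], "wait", target=1.0)
print(q_value([0.2, 0.5, 0.1], "wait"), q_value([0.2, 0.5, 0.1], "harvest"))
```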
Thanks Dan.
Equation 22 - it's not required to assume that alpha > 1.
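In case it helps to see why (this assumes equation 22 bounds a finite sum of the form $\sum_{t=1}^{T} t^{-\alpha}$ by an integral, which may not be its exact form): for $0 < \alpha < 1$ we have $\sum_{t=1}^{T} t^{-\alpha} \le 1 + \int_{1}^{T} t^{-\alpha}\,dt = 1 + \frac{T^{1-\alpha}-1}{1-\alpha}$, which is finite for every finite $T$; the integral only diverges in the limit $T \to \infty$. For $\alpha = 1$ the same bound gives $1 + \log T$, and for $\alpha > 1$ it is bounded by a constant.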
In this question, room 5 loops back to itself with a reward of 100 (or you can go to rooms 1/4), and the agent can start from room 5 as well.
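A minimal sketch of how Q(5,5) ends up above 100 (gamma, the learning rate, and the action set below are illustrative, not necessarily the recitation's exact numbers): the self-loop action from room 5 earns r = 100 plus gamma times the best Q value at room 5, so the entry bootstraps on itself and grows toward 100 / (1 - gamma).

```python
# Toy version of the room-5 self-loop (gamma and the action set are illustrative).
gamma = 0.8
Q = {}                      # Q[(state, action)], implicitly zero-initialized
actions_from_5 = [1, 4, 5]  # from room 5: go to room 1, room 4, or stay in 5

def q(s, a):
    return Q.get((s, a), 0.0)

# Episodes that start in room 5 and take the self-loop action (learning rate 1).
for episode in range(20):
    s, a, r, s_next = 5, 5, 100.0, 5
    Q[(s, a)] = r + gamma * max(q(s_next, b) for b in actions_from_5)

print(Q[(5, 5)])  # approaches 100 / (1 - gamma) = 500, i.e. well above 100
```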
Technically you're right, but since $\sum_{i=1}^{n} \Delta_i$ is a fixed term it doesn't increase the bound, which is already logarithmic in T (i.e., $O(\log T + c) = O(\log T)$ for any $c \in \mathbb{R}$).
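For reference, the standard UCB1 analysis (Auer et al., 2002) gives a bound of the form $R(T) \le \sum_{i:\Delta_i>0} \frac{8\log T}{\Delta_i} + \left(1+\frac{\pi^2}{3}\right)\sum_{i=1}^{n}\Delta_i$ (the constants in the lecture's version may differ); the second term, which includes the cost of the initial pull of each arm, is exactly the fixed $\sum_{i=1}^{n}\Delta_i$ contribution that does not change the $O(\log T)$ order.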
Can we please get a solution to the homework?
Thanks
sorry, it is recitation 6
Rec. 6, ex.1
When do we update the entry Q(5,5)?
Since it is the target (room 5), it seems it can be updated only when the episode starts in that state?
If it had stayed zero, we would never have reached Q values greater than 100.
In the analysis of the UCB bound there is an assumption that $T_i$ is greater than 1.
It holds since in the first round we start by pulling each arm one time.
Shouldn't we add this to the regret?
Hence the regret should have an extra term: $\sum_{i=1}^{n} \Delta_i$.