Recent Forum Posts
Exam solution
guest (guest) 15 Jul 2019 15:34
in discussion Discussions / General » Exam solution

Hello,
can you please upload a solution for the exam?

Thanks

Exam solution by guest (guest), 15 Jul 2019 15:34
Exam Moed A Solution
Guest (guest) 09 Jul 2019 18:11
in discussion Discussions / General » Exam Moed A Solution

Hi,

Can the Moed A exam and its solution be uploaded?

Thanks

Exam Moed A Solution by Guest (guest), 09 Jul 2019 18:11

Controllability implies that we can reach any state, but does not guarantee finite cost.
Stability implies that the optimal control has finite cost, but does not guarantee that we can reach any state. (For example, we always move to x=0.)
We would like to reach x=0, and you should think of it as "normalizing" the system so that the origin (x=0) is the desired operating point.
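The relation can be seen concretely in a small numerical sketch (the system below is a made-up discretized double integrator, chosen only for illustration, not one from the lecture):

```python
import numpy as np

# Made-up 2-state discrete-time system x_{t+1} = A x_t + B u_t
# (a discretized double integrator, for illustration only).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])

# Controllability: rank([B, AB]) = n means every state is reachable.
ctrb = np.hstack([B, A @ B])
print(np.linalg.matrix_rank(ctrb))          # 2: the system is controllable

# Stability of the closed loop x_{t+1} = (A - B K) x_t: all eigenvalues
# strictly inside the unit circle means x_t -> 0, hence finite cost.
K = np.array([[1.0, 1.5]])                  # a hand-picked stabilizing gain
eigs = np.linalg.eigvals(A - B @ K)
print(np.max(np.abs(eigs)) < 1)             # True: the loop is stable
```

Controllability is about the rank condition (can we steer anywhere), stability is about the closed-loop eigenvalues (does x_t decay to the origin); the two checks are independent of each other.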

In question 2 of Moed A you are asked about Q-learning, which is off-policy, so it does not matter which policy generated the trajectories.
In question 2 of Moed B you are told that the trajectories were generated using pi, and you are asked to run Monte Carlo to learn V of pi.
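To make the distinction concrete, here is a toy sketch on made-up traces (the states, actions, and rewards are hypothetical, not the exam's):

```python
from collections import defaultdict

# Made-up traces of (state, action, reward, next_state) tuples.
traces = [[("s0", "a", 1, "s1"), ("s1", "b", 0, "s1")],
          [("s0", "b", 0, "s1"), ("s1", "a", 2, "s1")]]
gamma, alpha, actions = 0.9, 0.5, ("a", "b")

# Q-learning is off-policy: the target uses max over actions, so the
# behavior policy that generated the traces is irrelevant.
Q = defaultdict(float)
for trace in traces:
    for s, a, r, s2 in trace:
        target = r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

# Every-visit Monte Carlo is on-policy: it averages observed returns,
# so it estimates V of the policy pi that actually produced the traces.
returns = defaultdict(list)
for trace in traces:
    G = 0.0
    for s, a, r, s2 in reversed(trace):
        G = r + gamma * G
        returns[s].append(G)
V_pi = {s: sum(gs) / len(gs) for s, gs in returns.items()}
print(V_pi["s0"])   # average of the two observed returns, 1 and 1.8
```

The Q-learning update never looks at which action pi would have taken next; the Monte Carlo estimate is meaningless unless the traces came from pi itself.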

Q4d moed a + b
guest (guest) 08 Jul 2019 13:15
in discussion Discussions / Past Exams » Q4d moed a + b

Can someone please explain the answers for question 4d in last year's exams?
Thanks!

Q4d moed a + b by guest (guest), 08 Jul 2019 13:15
HW4 Q1 solution
Lee CohenLee Cohen 08 Jul 2019 11:29
in discussion News / Course News » HW4 Q1 solution

Hi all,

Question 1 in hw4 is (a slightly easier version of) question 5 here: https://ece.iisc.ac.in/~aditya/E1245_Online_Prediction_Learning_F2014/final_exam_full.pdf

HW4 Q1 solution by Lee CohenLee Cohen, 08 Jul 2019 11:29

Hi,

In the LQR lecture we defined controllability as a sufficient condition for solving the ARE equations.
Then we defined stability, which basically tells us whether the system will explode or not, depending on the eigenvalues under the proposed optimal solution.
Can someone explain how the two are related?
Is it that we can reach every state but then cannot stay there? Or that we will try to reach it but the system will be very unstable?
Also, it says that a good system is one where the eigenvalues are smaller than 1 (in absolute value), hence x_t goes to 0. Why is that good?
We want x_t to be a specific state, not zero.

Thanks!

Controllability and stability by asdf (guest), 08 Jul 2019 08:35
gsdaf (guest) 07 Jul 2019 14:00
in discussion Discussions / General » HW solutions

Can you post a solution from a student who received 100?

by gsdaf (guest), 07 Jul 2019 14:00

Hi,

In the exams you published there are questions that provide traces and ask us to compute the V or Q function via some method.
My question is: how do we know whether the traces were produced on-policy or off-policy?
This dramatically changes the computation of the estimated Q/V function.

Thanks

Off/on policy evaluation in exam by gsdaf (guest), 07 Jul 2019 13:59

What does "s(1,1) - action is chosen at step 4 & 5" on pages 5-6 of recitation 5 mean?

small question/clarification on recitation 5 by rafi levy (guest), 07 Jul 2019 07:38
rafi levy (guest) 06 Jul 2019 17:18
in discussion Discussions / General » recitation 10

But if alpha is positive but smaller than 1, wouldn't the integral diverge?

by rafi levy (guest), 06 Jul 2019 17:18
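For what it's worth, assuming the quantity in question is a sum of the form $\sum_{t=1}^T t^{-\alpha}$ (as is typical in these analyses), the integral does diverge as $T \to \infty$ when $0 < \alpha < 1$, but only at rate $T^{1-\alpha}$, i.e. sublinearly in $T$, so the resulting bound can still be useful. A quick numerical check:

```python
# Quick check (assuming the relevant quantity is S(T) = sum_{t=1}^T t^(-alpha)):
# for 0 < alpha < 1 the sum diverges, but only like T^(1-alpha)/(1-alpha).
alpha, T = 0.5, 10**6
S = sum(t ** (-alpha) for t in range(1, T + 1))
bound = 1 + T ** (1 - alpha) / (1 - alpha)   # 1 + integral_1^T t^(-alpha) dt, relaxed
print(S <= bound)    # True: the integral comparison still bounds the sum
print(S / T < 0.01)  # True: the growth is sublinear in T
```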
Re: HW solutions
Lee CohenLee Cohen 06 Jul 2019 15:27
in discussion Discussions / General » HW solutions

We don't publish solutions. If there's something which is not clear to you after your exercises have been graded feel free to ask (you can send me an email if you prefer).

Re: HW solutions by Lee CohenLee Cohen, 06 Jul 2019 15:27

'wait' is an action; the state is a function of the chemicals' concentrations.
In exercise 2 we handle the 'wait' action for the first time, so when we estimate its Q value we should use different weights from the ones that were used for the 'harvest' action.
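A minimal sketch of the idea (the feature size, names, and update rule below are illustrative assumptions, not the exercise's actual setup): linear Q approximation with a separate weight vector per action, so the 'wait' estimate never reuses the 'harvest' weights.

```python
import numpy as np

# Separate weight vector per action (illustrative sketch).
n_features = 3                       # e.g. the chemicals' concentrations
weights = {"harvest": np.zeros(n_features),
           "wait": np.zeros(n_features)}

def q_value(state, action):
    # state is the feature vector; each action has its own weights
    return float(weights[action] @ state)

def td_update(s, a, r, s2, alpha=0.1, gamma=0.9):
    # TD(0)-style update that touches only the acting action's weights
    target = r + gamma * max(q_value(s2, b) for b in weights)
    weights[a] += alpha * (target - q_value(s, a)) * s

s = np.array([0.2, 0.5, 0.3])        # made-up concentrations
td_update(s, "wait", 1.0, np.array([0.1, 0.6, 0.3]))
print(weights["harvest"])            # unchanged: [0. 0. 0.]
```

Updating on a 'wait' transition changes only the 'wait' weights; the 'harvest' weights are untouched.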

Re: recitation 10
Lee CohenLee Cohen 06 Jul 2019 15:24
in discussion Discussions / General » recitation 10

Thanks Dan.
Equation 22: it's not required to assume that alpha > 1.

Re: recitation 10 by Lee CohenLee Cohen, 06 Jul 2019 15:24
RE
Lee CohenLee Cohen 06 Jul 2019 15:24
in discussion Discussions / General » recitation 5, ex.1

In this question, room 5 loops back to itself with a reward of 100 (or you can move to rooms 1/4), and the agent can start from room 5 as well.

RE by Lee CohenLee Cohen, 06 Jul 2019 15:24
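A small Q-learning sketch of this example (the room layout beyond what the post states, and gamma = 0.8, are assumptions borrowed from the classic version of this exercise): Q(5,5) is updated on every 5 -> 5 transition, in whatever episode it occurs, and its value rises well above 100.

```python
import random
from collections import defaultdict

# Room 5 connects to itself and to rooms 1 and 4 (as stated in the post);
# the rest of the layout and gamma = 0.8 are assumptions from the classic
# version of this example. Entering (or staying in) room 5 pays 100.
neighbors = {0: [4], 1: [3, 5], 2: [3], 3: [1, 2, 4], 4: [0, 3, 5], 5: [1, 4, 5]}
gamma, alpha = 0.8, 0.5
Q = defaultdict(float)
random.seed(0)

for episode in range(2000):
    s = random.choice(list(neighbors))       # episodes can start in room 5 too
    for _ in range(20):
        s2 = random.choice(neighbors[s])     # actions = adjacent rooms
        reward = 100 if s2 == 5 else 0
        best = max(Q[(s2, a)] for a in neighbors[s2])
        Q[(s, s2)] += alpha * (reward + gamma * best - Q[(s, s2)])
        s = s2

# Q(5,5) is updated whenever the agent is in room 5 and chooses to stay,
# whichever room the episode started from, and it converges to the fixed
# point Q(5,5) = 100 + gamma * Q(5,5), i.e. 500.
print(round(Q[(5, 5)]))
```

So Q(5,5) does not stay zero: any episode that passes through room 5 and takes the self-loop updates it, and the bootstrapped values exceed 100.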

Technically you're right, but since sum_{i=1}^n delta_i is a fixed term it doesn't increase the bound, which is already logarithmic in T (i.e., $O(\log T + c) = O(\log T)$ for any constant $c \in \mathbb{R}$).
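Spelled out with UCB1's standard constants (which may differ slightly from the course's exact statement):

\[
R(T) \;\le\; \sum_{i:\Delta_i>0}\frac{8\ln T}{\Delta_i} \;+\; \Big(1+\frac{\pi^2}{3}\Big)\sum_{i=1}^{n}\Delta_i,
\]

so the $\sum_{i=1}^{n}\Delta_i$ contributed by the $n$ initialization pulls is absorbed into the $T$-independent constant term, and the bound remains $O(\log T)$.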

HW solutions
adsavf (guest) 06 Jul 2019 12:55
in discussion Discussions / General » HW solutions

Can we please get a solution to the homework?

Thanks

HW solutions by adsavf (guest), 06 Jul 2019 12:55
rafi levy (guest) 06 Jul 2019 08:37
in discussion Discussions / General » recitation 5, ex.1

sorry, it is recitation 6

by rafi levy (guest), 06 Jul 2019 08:37
recitation 5, ex.1
rafi levy (guest) 06 Jul 2019 08:36
in discussion Discussions / General » recitation 5, ex.1

Rec. 6, ex.1

When do we update the entry Q(5,5)?
Since room 5 is the target, it seems it can be updated only when the episode starts in that state?
If it had stayed zero, we would never have reached Q values greater than 100.

recitation 5, ex.1 by rafi levy (guest), 06 Jul 2019 08:36
UCB analysis exploration cost
recitation 10 (guest) 06 Jul 2019 08:24
in discussion Discussions / General » UCB analysis exploration cost

In the analysis of the UCB bound there is an assumption that T_i is greater than 1.
It holds since in the first round we start by pulling each arm once.
Shouldn't we add this to the regret?
That is, the regret should have an extra term: sum_{i=1}^n delta_i.

UCB analysis exploration cost by recitation 10 (guest), 06 Jul 2019 08:24
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License