Hello,

can you please upload a solution for the exam?

Thanks

- Instructor
- Prof. Yishay Mansour
- Teaching Assistant
- Lee Cohen

- Exam: July 9th, 2019
- Moed B: Sept. 1st, 2019

Recent Forum Posts


Hi,

Can the Moed A exam and its solution be uploaded?

Thanks

Stability implies that the optimal control has finite cost, but does not guarantee that we can reach any given state. (For example, the system may always move to x=0.)

We would like to reach x=0, and you should think of the system as "normalized" so that the origin (x=0) is the desired operating state.
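To illustrate, here is a minimal sketch (the matrix A and initial state below are made-up examples) of a stable discrete-time linear system whose state decays to the normalized operating point x=0:

```python
import numpy as np

# Hypothetical stable system x_{t+1} = A x_t: all eigenvalues of A
# have magnitude < 1, so the state decays to the origin x = 0.
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])  # eigenvalues 0.9 and 0.5, both inside the unit circle

x = np.array([10.0, -4.0])  # arbitrary initial state
for t in range(200):
    x = A @ x               # evolve the (uncontrolled) system

print(np.linalg.norm(x))    # close to 0 after many steps
```

The cost stays finite because the state converges, but nothing here lets us steer the system to an arbitrary target state.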

In question 2 of Moed B you are told that the trajectories were generated using pi, and you are asked to run Monte Carlo to learn V of pi.
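A minimal sketch of such a Monte Carlo evaluation of V^pi; the trajectories, state names, and gamma below are made-up illustrations, not the exam's data:

```python
import numpy as np
from collections import defaultdict

# Every-visit Monte Carlo evaluation of V^pi from given trajectories.
# Each trajectory is a list of (state, reward) pairs generated by pi,
# where the reward is observed upon leaving the state.
gamma = 0.5
trajectories = [
    [("s1", 0.0), ("s2", 1.0)],
    [("s1", 0.0), ("s2", 0.0)],
]

returns = defaultdict(list)
for traj in trajectories:
    G = 0.0
    # walk backwards, accumulating the discounted return from each state on
    for state, reward in reversed(traj):
        G = reward + gamma * G
        returns[state].append(G)

# V^pi(s) is estimated by the average observed return from s
V = {s: float(np.mean(gs)) for s, gs in returns.items()}
```

With these toy trajectories the estimates come out to V("s2") = 0.5 and V("s1") = 0.25.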

Can someone please explain the answer to question 4d in last year's exams?

Thanks!

Hi all,

Question 1 in hw4 is (a slightly easier version of) question 5 here: https://ece.iisc.ac.in/~aditya/E1245_Online_Prediction_Learning_F2014/final_exam_full.pdf

Hi,

In the LQR lecture we defined controllability as a sufficient condition for solving the ARE (algebraic Riccati equation).

Then we defined stability, which basically tells us whether our system will blow up or not, depending on the eigenvalues under the proposed optimal solution.

Can someone explain how the two are related?

Is it that we can reach every state but then cannot stay there? Or that we will try to reach it but the system will be very unstable?

Also, it says that a good system is one where the eigenvalues have magnitude lower than 1, hence x_t goes to 0. Why is that good?

We want x_t to be a specific state, not zero.

Thanks!
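The two notions can be checked side by side. A sketch for x_{t+1} = A x_t + B u_t, with hypothetical A, B, and feedback gain K chosen only for illustration: controllability asks whether the inputs can steer the state anywhere, while stability of the closed loop A - BK asks whether the state decays to 0.

```python
import numpy as np

# Hypothetical open-loop system x_{t+1} = A x_t + B u_t (A itself is
# unstable: it has an eigenvalue 1.1 outside the unit circle).
A = np.array([[1.1, 1.0],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])

# Controllability: the matrix [B, AB] must have full rank n.
C = np.hstack([B, A @ B])
controllable = np.linalg.matrix_rank(C) == A.shape[0]

# Stability of a feedback law u_t = -K x_t: the closed-loop matrix
# A - B K must have all eigenvalue magnitudes below 1.
K = np.array([[1.0, 2.0]])
eig = np.linalg.eigvals(A - B @ K)
stable = bool(np.max(np.abs(eig)) < 1)
```

Here the pair (A, B) is controllable, so even though A alone is unstable, the gain K makes the closed loop stable; controllability is what guarantees such a stabilizing K exists.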

Can you post a solution from a student that received 100?

gsdaf (guest) 07 Jul 2019 13:59

in discussion Discussions / General » Off/on policy evaluation in exam

Hi,

In the exams you published there are questions that provide traces and ask us to compute the V or Q function via some method.

My question is: how do we know whether the traces were produced on-policy or off-policy?

This dramatically changes the computation of the estimated Q/V function.

Thanks
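A toy sketch of why it matters, at a single state with one-step traces; all probabilities and rewards below are made-up illustrations. On-policy, plain averaging of returns estimates V^pi; off-policy, each return must be reweighted by the importance ratio pi(a)/b(a) of the behavior policy b:

```python
# Behavior policy b and target policy pi over two actions {0, 1}.
b  = {0: 0.5, 1: 0.5}
pi = {0: 0.9, 1: 0.1}

# Traces generated by b: (action, reward); action 0 yields reward 1.
traces = [(0, 1.0), (1, 0.0), (0, 1.0), (1, 0.0)]

# Naive on-policy-style average (WRONG here, since traces came from b):
naive = sum(r for _, r in traces) / len(traces)

# Off-policy estimate via importance sampling, reweighting by pi(a)/b(a):
is_est = sum((pi[a] / b[a]) * r for a, r in traces) / len(traces)

print(naive, is_est)  # 0.5 vs. 0.9 — only the second matches V^pi
```

Under pi the true value is 0.9 * 1.0 = 0.9, so the unweighted average is badly biased when the traces are off-policy.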

rafi levy (guest) 07 Jul 2019 07:38

in discussion Discussions / General » small question/clarification on recitation 5

What does "s(1,1) - action is chosen at steps 4 & 5" mean on pages 5-6 of rec. 5?

But if alpha is positive but smaller than 1, wouldn't the integral diverge?

Lee Cohen 06 Jul 2019 15:25

in discussion Discussions / General » Recitation 8 - new weights vector, why?

'wait' is an action; the state is a function of the chemicals' concentrations.

In exercise 2 we handle the 'wait' action for the first time, so when we estimate its Q value we should use different weights than the ones used for the 'harvest' action.

Thanks Dan.

Equation 22: it is not required to assume that alpha > 1; we get a **bound** that is already logarithmic in T (i.e., $O(\log T + c) = O(\log T)$ for any $c \in \mathbb{R}$).

Can we please get a solution to the homework?

Thanks

sorry, it is recitation 6

Rec. 6, ex.1

When do we update the entry Q(5,5)?

Since state 5 is the target (room 5), it seems it can be updated only when the episode starts in that state?

If it had stayed zero, we would never have reached Q values greater than 100.
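A sketch of the tabular Q-learning update in question; the reward of 100 for the goal room 5, the self-loop action, and gamma = 0.8 are illustrative values assumed from the classic rooms example, not necessarily the recitation's exact numbers:

```python
import numpy as np

gamma = 0.8
Q = np.zeros((6, 6))  # Q[state, action] for rooms 0..5

def update(s, a, r, s_next):
    # Q-learning target with learning rate 1 (deterministic transitions)
    Q[s, a] = r + gamma * Q[s_next].max()

# Q(5,5) is updated each time the self-loop action 5 is taken in state 5.
# Repeating that update drives Q(5,5) toward 100 / (1 - gamma) = 500,
# which is exactly how entries larger than 100 arise:
for _ in range(50):
    update(5, 5, 100, 5)

print(Q[5, 5])  # approaches 100 / (1 - gamma) = 500
```

So Q(5,5) need not wait for an episode to start at state 5: it grows whenever the agent loops at the goal, and its value above 100 then propagates back through the max in neighboring updates.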

recitation 10 (guest) 06 Jul 2019 08:24

in discussion Discussions / General » UCB analysis exploration cost

In the analysis of the UCB bound there is an assumption that $T_i$ is at least 1.

It holds since in the first round we start by pulling each arm once.

Shouldn't we add this to the regret?

Hence the regret should have an extra term: $\sum_{i=1}^{n} \Delta_i$.
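A numeric sketch of the point: the one initial pull per arm adds at most $\sum_i \Delta_i$ to the regret, a constant independent of T, so it is absorbed into the additive constant of the bound. The gaps below are made-up illustrative values:

```python
import math

# Delta_i = mu* - mu_i; arm 0 is optimal (gap 0).
deltas = [0.0, 0.1, 0.3, 0.5]
T = 10_000

# Cost of the initial round: one pull of each arm.
init_cost = sum(deltas)

# Main term of the UCB bound, O(sum_i log(T)/Delta_i) over suboptimal arms.
main_term = sum(math.log(T) / d for d in deltas if d > 0)

print(init_cost, main_term)  # the constant is dwarfed by the log(T) term
```

So yes, strictly the regret picks up the extra $\sum_i \Delta_i$ term, but it does not grow with T and disappears into the constant of the $O(\log T)$ bound.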

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License