Hello,
can you please upload a solution for the exam?
Thanks
Hi,
Can the Moed A exam and its solution be uploaded?
Thanks
Controllability implies that we can reach any state, but does not guarantee finite cost.
Stability implies that the optimal control has finite cost, but does not guarantee that we can reach any state (for example, we may always move to x=0).
We would like to reach x=0, and you should think of it as "normalizing" the system so that the origin (x=0) is the desired operating point.
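To make the eigenvalue point concrete, here is a minimal sketch (the matrices A, B, K below are made up for illustration, not from the lecture): under the closed-loop dynamics x_{t+1} = (A - BK) x_t, if every eigenvalue of A - BK has magnitude below 1 the state decays to the origin from any starting point. To track a nonzero setpoint x*, you would work with the error e_t = x_t - x*, so "x = 0" really means "at the desired operating point".

```python
import numpy as np

# Toy closed-loop LQR-style system (A, B, K are illustrative, not the lecture's).
A = np.array([[1.1, 0.3],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
K = np.array([[1.4, 1.1]])   # feedback gain for u_t = -K x_t, chosen so |eig| < 1

A_cl = A - B @ K             # closed-loop dynamics matrix
print("closed-loop eigenvalue magnitudes:", np.abs(np.linalg.eigvals(A_cl)))

x = np.array([5.0, -3.0])    # arbitrary initial state
for t in range(50):
    x = A_cl @ x             # x_{t+1} = (A - B K) x_t
print("||x_50|| =", np.linalg.norm(x))  # tiny when all |eig| < 1, blows up otherwise
```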
In question 2 of Moed A you are asked about Q-learning, which is off-policy, so it does not matter which policy generated the trajectories.
In question 2 of Moed B you are told that the trajectories were generated using pi, and asked to run Monte Carlo to learn V of pi.
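A minimal sketch of the difference (the states, actions, and rewards below are made up, not the exam's MDP): Q-learning applies the same max-based update regardless of which behaviour policy produced the trace, while Monte Carlo for V of pi just averages the observed returns, which is only meaningful if the trace really follows pi.

```python
from collections import defaultdict

gamma, alpha = 0.9, 0.1
actions = ["a0", "a1"]

# One illustrative trajectory of (state, action, reward) triples, ending in a terminal state.
trace = [("s0", "a1", 0.0), ("s1", "a0", 1.0), ("s2", "a1", 5.0)]
terminal = "s3"
next_states = [s for s, _, _ in trace][1:] + [terminal]

# Q-learning (Moed A style): the target uses a max over actions, so it does not
# matter which policy generated the trace (off-policy).
Q = defaultdict(float)
for (s, a, r), s_next in zip(trace, next_states):
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Monte Carlo for V^pi (Moed B style): average observed returns; this estimates
# V^pi only if the trace was actually generated by pi.
G = 0.0
first_visit_return = {}
for s, a, r in reversed(trace):
    G = r + gamma * G
    first_visit_return[s] = G   # later writes in this reversed loop are earlier visits,
                                # so the first-visit return is what remains
V = dict(first_visit_return)

print(dict(Q))
print(V)
```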
Can someone please explain the answers for question 4d in last year's exams?
Thanks!
Hi all,
Question 1 in hw4 is (a slightly easier version of) question 5 here: https://ece.iisc.ac.in/~aditya/E1245_Online_Prediction_Learning_F2014/final_exam_full.pdf
Hi,
In the LQR lecture we defined controllability as a sufficient condition for solving the ARE.
Then we defined stability, which basically tells us whether our system will blow up or not, depending on the eigenvalues under the proposed optimal solution.
Can someone explain how the two are related?
Is it that we can reach every state but then cannot stay there? That we will try to reach it but the system will be very unstable?
Also, it says that a good system is one where the eigenvalues have magnitude smaller than 1, hence x_t goes to 0. Why is that good?
We want x_t to reach a specific state, not zero.
Thanks!
Can you post a solution from a student that received 100?
Hi,
In the exams you published there are questions that provide traces and ask us to compute the V or Q function via some method.
My question is: how do we know whether the traces were produced on-policy or off-policy?
This dramatically changes the computation of the estimated Q/V function.
Thanks
What does "s(1,1) - action is chosen at steps 4 & 5" mean on pages 5-6 of rec. 5?
But if alpha is positive but smaller than 1, wouldn't the integral diverge?
We don't publish solutions. If there's something which is not clear to you after your exercises have been graded feel free to ask (you can send me an email if you prefer).
'wait' is an action; the state is a function of the chemicals' concentrations.
In exercise 2 we handle the 'wait' action for the first time, so when we estimate the Q value we should use different weights from the ones that were used for the 'harvest' action.
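In case it helps, here is a minimal sketch of what "different weights per action" could look like with a linear Q function (the feature vector, learning rate, and numbers are made up, not the exercise's):

```python
import numpy as np

# Illustrative linear Q(s, a) = w_a . phi(s): one weight vector per action,
# so 'wait' and 'harvest' have separate parameters.
def phi(concentrations):
    # Toy state features built from the chemicals' concentrations.
    return np.asarray(concentrations, dtype=float)

weights = {"harvest": np.zeros(3), "wait": np.zeros(3)}

def q_value(concentrations, action):
    return weights[action] @ phi(concentrations)

def update(concentrations, action, target, lr=0.1):
    # Semi-gradient update: only the taken action's weight vector moves.
    td_error = target - q_value(concentrations, action)
    weights[action] += lr * td_error * phi(concentrations)

update([0.2, 0.5, 0.1], "wait", target=1.0)
print(q_value([0.2, 0.5, 0.1], "wait"), q_value([0.2, 0.5, 0.1], "harvest"))
```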
Thanks Dan.
Equation 22 - it's not required to assume that alpha > 1.
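In case it helps to see why (this assumes equation 22 bounds a finite sum of the form $\sum_{t=1}^{T} t^{-\alpha}$ by an integral, which may not be its exact form): for $0 < \alpha < 1$ we have $\sum_{t=1}^{T} t^{-\alpha} \le 1 + \int_{1}^{T} t^{-\alpha}\,dt = 1 + \frac{T^{1-\alpha}-1}{1-\alpha}$, which is finite for every finite $T$; the integral only diverges in the limit $T \to \infty$. For $\alpha = 1$ the same bound gives $1 + \log T$, and for $\alpha > 1$ it is bounded by a constant.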
In this question, room 5 loops back to itself with a reward of 100 (or you can go to rooms 1/4), and the agent can start from room 5 as well.
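A minimal sketch of how Q(5,5) ends up above 100 (gamma, the learning rate, and the action set below are illustrative, not necessarily the recitation's exact numbers): the self-loop action from room 5 earns r = 100 plus gamma times the best Q value at room 5, so the entry bootstraps on itself and grows toward 100 / (1 - gamma).

```python
# Toy version of the room-5 self-loop (gamma and the action set are illustrative).
gamma = 0.8
Q = {}                      # Q[(state, action)], implicitly zero-initialized
actions_from_5 = [1, 4, 5]  # from room 5: go to room 1, room 4, or stay in 5

def q(s, a):
    return Q.get((s, a), 0.0)

# Episodes that start in room 5 and take the self-loop action (learning rate 1).
for episode in range(20):
    s, a, r, s_next = 5, 5, 100.0, 5
    Q[(s, a)] = r + gamma * max(q(s_next, b) for b in actions_from_5)

print(Q[(5, 5)])  # approaches 100 / (1 - gamma) = 500, i.e. well above 100
```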
Technically you're right, but since $\sum_{i=1}^{n} \Delta_i$ is a fixed term it doesn't increase the bound, which is already logarithmic in T (i.e., $O(\log T + c) = O(\log T)$ for any $c \in \mathbb{R}$).
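For reference, the standard UCB1 analysis (Auer et al., 2002) gives a bound of the form $R(T) \le \sum_{i:\Delta_i>0} \frac{8\log T}{\Delta_i} + \left(1+\frac{\pi^2}{3}\right)\sum_{i=1}^{n}\Delta_i$ (the constants in the lecture's version may differ); the second term, which includes the cost of the initial pull of each arm, is exactly the fixed $\sum_{i=1}^{n}\Delta_i$ contribution that does not change the $O(\log T)$ order.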
Can we please get a solution to the homework?
Thanks
sorry, it is recitation 6
Rec. 6, ex.1
When do we update the entry Q(5,5)?
Since it is the target (room 5), it seems it can be updated only when the episode starts in that state?
If it had stayed zero, we would never have reached Q values greater than 100.
In the analysis of the UCB bound there is an assumption that $T_i$ is greater than 1.
It holds since in the first round we start by pulling each arm one time.
Shouldn't we add this to the regret?
Hence the regret should have an extra term: $\sum_{i=1}^{n} \Delta_i$.