Can you please upload a solution for the exam?

Thanks

Can the Moed A exam and its solution be uploaded?

Thanks

In the LQR lecture we defined controllability as a sufficient condition for solving the ARE.

Then we defined stability, which basically tells us whether our system will explode or not, depending on the eigenvalues under the proposed optimal solution.

Can someone explain how the two are related?

Is it that we can reach every state but then cannot stay there? That we will try to reach it but the system will be very unstable?

Also, it says that a good system is one where the eigenvalue magnitudes are below 1, hence x_t goes to 0. Why is that good?

We want x_t to reach a specific state, not zero.

Thanks!
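In case it helps while waiting for an official answer: for a discrete-time system x_{t+1} = A x_t + B u_t with feedback u_t = -K x_t, "stable" means every eigenvalue of the closed-loop matrix A - BK has magnitude below 1, so x_t decays to 0. Regulating x_t to 0 is just the standard problem statement; tracking a nonzero target x* reduces to it via the change of variables e_t = x_t - x*. A minimal numeric check, with A, B, K made up for illustration:

```python
import numpy as np

# Hypothetical discrete-time system x_{t+1} = A x_t + B u_t
# with a feedback law u_t = -K x_t (A, B, K are made up).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
K = np.array([[0.5, 1.2]])

# Closed loop: x_{t+1} = (A - B K) x_t. Stable iff every
# eigenvalue of (A - B K) has magnitude < 1.
eigvals = np.linalg.eigvals(A - B @ K)
print(np.abs(eigvals))                 # both ~0.548 here
print(np.all(np.abs(eigvals) < 1))     # True -> x_t decays to 0
```

If some |eigenvalue| were >= 1, the corresponding mode of x_t would persist or grow; that is the "explosion" stability rules out.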

In the exams you published there are questions that provide traces and ask us to compute the V or Q function via some method.

My question is, how do we know if the traces were produced via on-policy or by off-policy?

This dramatically changes the computation of the estimated Q/V function.

Thanks
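To make the difference concrete, here is a small sketch (trace, labels, and numbers are all made up): on-policy SARSA bootstraps on the next action actually recorded in the trace, while off-policy Q-learning bootstraps on the greedy action, so the same trace can yield different estimates.

```python
from collections import defaultdict

# A made-up trace of (state, action, reward, next_state, next_action).
trace = [("s0", "a0", 1.0, "s1", "a1"),
         ("s1", "a1", 0.0, "s0", "a0"),
         ("s0", "a0", 1.0, "s1", "a0")]

alpha, gamma = 0.5, 0.9
actions = ["a0", "a1"]
Q_sarsa = defaultdict(float)
Q_qlearn = defaultdict(float)

for s, a, r, s_next, a_next in trace:
    # SARSA (on-policy): use the action the trace actually took next.
    target = r + gamma * Q_sarsa[(s_next, a_next)]
    Q_sarsa[(s, a)] += alpha * (target - Q_sarsa[(s, a)])

    # Q-learning (off-policy): use the greedy action at s_next.
    best = max(Q_qlearn[(s_next, b)] for b in actions)
    target = r + gamma * best
    Q_qlearn[(s, a)] += alpha * (target - Q_qlearn[(s, a)])

# The third transition bootstraps differently, so the estimates split:
print(Q_sarsa[("s0", "a0")])   # 0.75
print(Q_qlearn[("s0", "a0")])  # 0.85125
```

So before computing anything from a trace, the question (or its setup) has to pin down which estimator is intended.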

Thanks

When do we update the entry Q(5,5)?

Since it is the target (room 5), it seems it can be updated only when an episode starts in that state?

If it had stayed zero, we would never have reached Q values greater than 100.
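For intuition, here is a minimal tabular sketch (a made-up chain, not the exact rooms example from class): the entry for the goal state only changes when an update is performed from that state, and once it does, its value climbs past 100.

```python
import random

# Made-up chain: states 0..5, with a rewarded self-loop at the
# goal state 5. Note that Q[(s, a)] is only ever written by an
# update performed *from* s, so Q[(5, .)] moves only when state 5
# is the current state of an update.
random.seed(0)
alpha, gamma = 0.5, 0.8

def step(s, a):
    if s == 5:
        return 5, 100.0               # goal self-loop, reward 100
    return (min(s + 1, 5) if a == 0 else max(s - 1, 0)), 0.0

Q = {}
for _ in range(300):
    s = random.randint(0, 5)          # episodes may start anywhere
    for _ in range(10):
        a = random.randint(0, 1)
        s_next, r = step(s, a)
        best = max(Q.get((s_next, b), 0.0) for b in (0, 1))
        q = Q.get((s, a), 0.0)
        Q[(s, a)] = q + alpha * (r + gamma * best - q)
        s = s_next

# Once state 5 gets updated, its value grows past 100 toward the
# update's fixed point 100 / (1 - gamma) = 500.
print(max(Q.get((5, a), 0.0) for a in (0, 1)))
```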

It holds since in the first round we start by pulling each arm one time.

Shouldn't we add this to the regret?

Hence the regret should have an extra term: sum_{i=1}^{n} delta_i.

In equation 8, shouldn't we write t instead of t^2? Indeed, in the last expression, the t^2 is replaced by t.

In the explanation before (16), delta(i) = mu1 - mu2.

In equation 22, should we assume alpha is greater than 1?

Thanks,

Rafi

The entire point of function approximation was to have a compact representation of the state space, so if we initialize a weight vector for each state in the space, then we are doomed, no?

Why do we need a new weight vector? Why is it reasonable?
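For contrast, what "compact" usually means here is a single weight vector shared across all states through a feature map, so the number of parameters is the feature dimension, not |S|. A sketch with a made-up feature map and semi-gradient TD(0):

```python
import numpy as np

# v_hat(s) = phi(s) . w with ONE shared weight vector w;
# phi is a hypothetical feature map over a scalar state.
def phi(s):
    return np.array([1.0, s, s * s])

w = np.zeros(3)          # 3 parameters, regardless of |S|
alpha, gamma = 0.1, 0.9

def td0_update(s, r, s_next):
    # Semi-gradient TD(0): only the shared w is updated,
    # and the update generalizes to every state with similar features.
    global w
    td_error = r + gamma * phi(s_next) @ w - phi(s) @ w
    w = w + alpha * td_error * phi(s)

td0_update(1.0, 1.0, 2.0)
print(w)                 # [0.1, 0.1, 0.1]
```

A separate weight vector per state would indeed collapse back to a lookup table, which matches the concern in the question.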

It seems that we add the gradient to the weights instead of subtracting it.

Thanks
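One possible resolution, assuming the update in question comes from minimizing a squared error: the minus sign of gradient descent cancels the minus produced by differentiating through the error term, so the "+" form is still a descent step. A tiny check with arbitrary numbers:

```python
import numpy as np

# Loss L(w) = 0.5 * (target - phi @ w)**2
# grad_w L  = -(target - phi @ w) * phi
# Gradient descent w <- w - alpha * grad_w L therefore expands to
# w <- w + alpha * (target - phi @ w) * phi: the "+" hides the
# cancelled minus sign. phi, target, alpha are arbitrary here.
phi = np.array([1.0, 2.0])
w = np.zeros(2)
target, alpha = 3.0, 0.1

grad = -(target - phi @ w) * phi
w_descent = w - alpha * grad
w_plus = w + alpha * (target - phi @ w) * phi
print(np.allclose(w_descent, w_plus))  # True
```

(If the slide in question is instead doing gradient *ascent* on an objective to be maximized, adding the gradient is correct as-is.)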

Recitation 13, 13-1 part a: there is no 'return' action from state 1 to itself; the index should start from 2.

Recitation 11, page 11-4: the decreasing curve is open to the left and the increasing curve is open to the right; the x-axis is the state b(s(l)) (tiger behind the left door).

Page 11-5, the second equation: 0.15 + 0.7 * b(s(l)).

Thanks,

Rafi

For those of us who did not attend the last class/recitation, are there any special remarks we need to know?

Can we bring formula sheets?

Any information would be great,

Thanks!

On pages 1-1, 1-2, 1-5 (mainly a matter of notation and consistency with the class): the index of R/r should start from t and not from t + 1.

On page 1-3, in the last equation at the bottom, I believe the second expression is missing a gamma.

On 1-4 and 1-5, perhaps it's worth mentioning in a footnote that V(lambda, t) is the same as G(lambda, t) (as presented on the first page) and V(n, t) is the same as G(n, t).

thanks

Rafi

In 5-6, mu is a function of theta and not of sigma; in the second gradient, it should say standard deviation (sigma), not mean.

thanks

Rafi

In the part about finite-difference methods (scribe 9, page 4), the last paragraph says that we do not have the value of J(theta).

Can you please explain why it is possible to obtain the values z_i = J(theta + delta * u_i) but not the value of J(theta)?

Thanks
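Not an official answer, but one standard way around this: instead of forming differences J(theta + delta * u_i) - J(theta), fit the affine model z_i ~ J(theta) + delta * u_i . g by least squares over the perturbations. That recovers estimates of both the gradient g and the baseline J(theta) without ever evaluating J(theta) directly. A sketch with a made-up quadratic objective so the answer can be checked:

```python
import numpy as np

# Finite-difference gradient estimation without evaluating J(theta):
# fit z_i ~ J(theta) + delta * u_i . g by least squares. J is a
# hypothetical quadratic, so the true gradient is known.
rng = np.random.default_rng(0)

def J(theta):                      # made-up objective
    return -np.sum((theta - 1.0) ** 2)

theta = np.array([0.0, 0.5])
delta = 1e-3
n = 50

U = rng.standard_normal((n, len(theta)))          # perturbation dirs
z = np.array([J(theta + delta * u) for u in U])   # perturbed values

# Design matrix [delta*U | 1]: solves for (g, J(theta)) jointly.
X = np.hstack([delta * U, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X, z, rcond=None)
g_hat, J0_hat = coef[:-1], coef[-1]
print(g_hat)   # close to the analytic gradient -2*(theta - 1) = [2, 1]
```

The constant column plays the role of the unknown J(theta), which is why the method never needs it as an input.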

In lecture 4, slide 57, it is said that policy iteration takes O(|A|*|S|^2 + |S|^3).

Can you please explain why that is the case? Which step takes |A|*|S|^2, and which part takes |S|^3?

Thanks!
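A possible reading of the two terms (a sketch, not an official answer): each iteration does (1) policy evaluation by solving the |S| x |S| linear system (I - gamma * P_pi) v = r_pi, which costs O(|S|^3), and (2) greedy improvement, which sums over |S| next states for each of the |A|*|S| state-action pairs, costing O(|A|*|S|^2). Illustrated on a made-up MDP:

```python
import numpy as np

# One iteration of policy iteration, with the cost of each step
# annotated. P has shape (|A|, |S|, |S|), R has shape (|S|, |A|);
# the MDP below is randomly generated for illustration.
rng = np.random.default_rng(1)
nS, nA, gamma = 4, 2, 0.9
P = rng.random((nA, nS, nS))
P /= P.sum(axis=2, keepdims=True)     # rows are distributions
R = rng.random((nS, nA))
pi = np.zeros(nS, dtype=int)          # some initial policy

# Policy evaluation: solve (I - gamma * P_pi) v = r_pi,
# an |S| x |S| linear system  ->  O(|S|^3).
P_pi = P[pi, np.arange(nS), :]        # (|S|, |S|)
r_pi = R[np.arange(nS), pi]
v = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)

# Policy improvement: Q(s,a) = R(s,a) + gamma * sum_s' P[a,s,s'] v[s'],
# a sum over |S| states for each of |A|*|S| pairs  ->  O(|A||S|^2).
Q = R + gamma * np.einsum("asn,n->sa", P, v)
pi_new = Q.argmax(axis=1)
print(pi_new)
```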

I have two suggested ways for you:

1. Using Google Colab

You can copy my Colab notebook: https://colab.research.google.com/drive/1JRS6xTvBKGL74mJP6wyj1RlBhU7-4A9O and transfer your code to the new notebook.

It requires adjusting the code and some babysitting (a session only lasts 12 hours), but it's a freely available GPU to get you started quickly.

2. Using a virtual environment on another server

Some of you may have access to more storage on the university's machines; this option is for you (the environment takes about 1.5 GB, and the regular disk quota is 1 GB).

Just follow the instructions below:

- ssh savant/rack-gamir-g04/5/6 (from nova)
- bash
- cd <directory for your env>
- # create the virtual environment once
- virtualenv -p /usr/local/lib/anaconda3-5.1.0/bin/python dqn_env
- # NOTE: you need to activate venv each time before running
- source dqn_env/bin/activate

- # install all of the requirements
- pip install "gym[atari]"==0.9.5
- pip install opencv-python
- pip install torch
- pip install matplotlib

- # run project from within the environment
- python main.py

- # sanity check from within the environment
- python
- import gym
- print(gym.__version__) # make sure it prints 0.9.5
- gym.make('PongNoFrameskip-v4') # make sure this command succeeds

You wrote in the project that to install ffmpeg we can use Homebrew or apt-get.

The problem is that I use Windows, and both of those options (Homebrew and apt-get) are for macOS/Linux only (if I understood correctly).

I tried unsuccessfully to install ffmpeg from some other places I found online.

Is there an option not to use ffmpeg? What would you recommend in my case, on Windows?

Thanks
