Because of the way the value is calculated, it never provides a zero gradient to the policy. An optimal state value does correspond to the expected discounted future reward from a given state.
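To make "expected future discounted reward" concrete, here is a minimal sketch. It assumes a discount factor `gamma` and a set of reward sequences that were all sampled starting from the same state; the function names are hypothetical and not taken from any particular library. Averaging the discounted returns gives a Monte Carlo estimate of that state's value under whatever policy produced the samples, and it would approximate the optimal state value only if that policy is itself optimal.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted sum of rewards along one sampled trajectory."""
    g = 0.0
    # Work backwards so each step applies g = r_t + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_state_value(trajectories, gamma=0.99):
    """Average the discounted returns of trajectories that start in the same state."""
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)
```

For example, `estimate_state_value([[1, 1, 1], [0, 0, 4]], gamma=0.5)` averages the two discounted returns from the shared starting state.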
PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.