Mar 14, 2017
Given the way the value is calculated, it never provides a zero gradient to the policy. An optimal state value does correspond to the expected discounted future reward from a specific state.
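As a minimal sketch (not the author's code), the expected discounted future reward from a state can be estimated by averaging the discounted returns of episodes that start in that state. The function names, reward sequences, and `gamma` below are hypothetical. The last lines also assume the value is used as a baseline, as in an advantage actor-critic, where the policy is weighted by `G - V(s)`. Under that assumption, each individual advantage is generally nonzero even when `V(s)` equals the average return, which is one way to read why the policy still receives a gradient.

```python
from statistics import mean


def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated backwards from the last reward."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def monte_carlo_value(episodes, gamma):
    """Estimate V(s) as the mean discounted return over episodes that all start in s."""
    return mean(discounted_return(ep, gamma) for ep in episodes)


# Hypothetical reward sequences observed after visiting the same state s.
episodes = [[0.0, 0.0, 1.0], [0.0, 1.0], [1.0]]
gamma = 0.5

returns = [discounted_return(ep, gamma) for ep in episodes]  # [0.25, 0.5, 1.0]
v = monte_carlo_value(episodes, gamma)                       # mean of the returns

# Assumption: V(s) serves as a baseline, so each sample is weighted by G - V(s).
# These advantages average to zero, but no single one of them is zero.
advantages = [g - v for g in returns]
print(returns, v, advantages)
```

Even with a perfectly estimated `V(s)`, only the average advantage vanishes, not the per-episode values that scale the policy update.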
Interested in artificial intelligence, neuroscience, philosophy, psychedelics, and meditation. http://arthurjuliani.com/