Arthur Juliani
1 min read · Mar 14, 2017

Given the way the value is calculated, it never provides a zero gradient to the policy. The optimal state value does correspond to the expected discounted future reward from a given state.
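To make "expected discounted future reward" concrete, here is a minimal sketch in plain Python. It is illustrative only and not taken from any particular implementation: the function names `discounted_return` and `estimate_state_value`, the discount factor `gamma`, and the idea of averaging over sampled episodes (a simple Monte Carlo estimate) are all assumptions made for the example.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted sum of the rewards observed after visiting a state.

    `rewards` is the sequence r_0, r_1, ... collected from that state onward;
    `gamma` is an assumed discount factor in [0, 1].
    """
    total = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_state_value(episodes, gamma=0.99):
    """Monte Carlo estimate of a state's value.

    `episodes` is a hypothetical list of reward sequences, each one sampled
    by starting from the same state. The estimate is the average discounted
    return, which approximates the expectation over future rewards.
    """
    returns = [discounted_return(rewards, gamma) for rewards in episodes]
    return sum(returns) / len(returns)
```

For instance, `estimate_state_value([[1, 1, 1], [0, 0, 4]], gamma=0.5)` averages the two discounted returns from that state. With more sampled episodes, the average approaches the expectation that the state value stands for.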
