Given the way that the value is calculated, it never provides a 0-gradient to the policy.

Kevin

1 min read · Mar 14, 2017

Given the way the value is calculated, it never provides a zero gradient to the policy. An optimal state value does correspond to the expected discounted future reward from a given state.
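To make "expected future discounted reward" concrete, here is a minimal Python sketch. The function names, the discount factor, and the reward sequences are all made up for illustration and are not taken from the original article. Averaging sampled returns like this estimates the value under whatever policy produced the episodes, so it matches the optimal value only if that policy is optimal.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over one episode's reward sequence."""
    total = 0.0
    # Work backwards so each step needs one multiply and one add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def estimate_state_value(episodes, gamma=0.99):
    """Monte Carlo estimate: mean discounted return over sampled episodes."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


# Two hypothetical reward sequences observed from the same start state.
episodes = [[1.0, 0.0, 1.0], [0.0, 0.0, 2.0]]
print(estimate_state_value(episodes, gamma=0.5))  # prints 0.875
```

With gamma = 0.5 the two episodes have discounted returns of 1.25 and 0.5, so the estimate is their mean.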

Written by Arthur Juliani

