Hi Yuanpu,

Estimating the expected discounted return is actually the same as estimating the value! The value function V(s) is defined as the expected discounted return from state s when following the current policy. That value estimate is then used to compute an advantage for the policy, which updates the policy toward maximizing the expected discounted return. We use the value function because it gives a lower-variance, more stable and more generalizable update than using the sampled returns directly, as a purely policy-based approach does.
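To make the connection concrete, here is a minimal sketch in plain Python. It assumes a single finished episode and a list of critic estimates `values` that are made up for illustration; the function names are hypothetical, not from any library. The discounted return G_t is the target the value function is trained to predict, and subtracting V(s_t) from it (or using the one-step TD error instead) gives the advantage that drives the policy update.

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, working backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mc_advantages(returns, values):
    """Monte Carlo advantage: A_t = G_t - V(s_t), with V(s_t) as a baseline."""
    return [g - v for g, v in zip(returns, values)]


def td_advantages(rewards, values, next_values, dones, gamma):
    """One-step TD advantage: A_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)."""
    return [
        r + gamma * nv * (1.0 - d) - v
        for r, v, nv, d in zip(rewards, values, next_values, dones)
    ]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0]
    values = [1.0, 0.5, 0.5]       # hypothetical critic estimates V(s_t)
    next_values = [0.5, 0.5, 0.0]  # hypothetical V(s_{t+1}); last state is terminal
    dones = [0.0, 0.0, 1.0]
    gamma = 0.5

    returns = discounted_returns(rewards, gamma)
    print(returns)                                # [1.25, 0.5, 1.0]
    print(mc_advantages(returns, values))         # [0.25, 0.0, 0.5]
    print(td_advantages(rewards, values, next_values, dones, gamma))  # [0.25, -0.25, 0.5]
```

Subtracting V(s_t) does not change the expected policy gradient, but it reduces its variance. The TD version goes further by replacing the sampled return with the critic's own bootstrapped estimate, which lowers variance again at the cost of some bias.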

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.
