Hi Fabio,

If the you are using an optimal policy, and it is deterministic, and the environment is also deterministic, then you will have zero variance and bias (unless you are using a function approximator with limited representation capacity).

For the actual learning process however, this is almost never the case, since some amount of exploration is required in order to improve the policy.

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store