Hi Berkmeister,

The gradBuffer isn't only there for RMSProp. It also collects the gradients from several runs before they are applied to the policy, which reduces the variance of the update. Gradients from a single run are noisy, and applying a high-variance set of gradients directly can destabilize the policy. Collecting gradients from multiple runs first mitigates that instability.
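To make the idea concrete, here is a minimal sketch in plain NumPy rather than the tutorial's actual code. The helper names (`make_grad_buffer`, `accumulate`, `apply_and_reset`) and the choice to average the buffer over the number of episodes are my own assumptions for illustration; the original may simply sum the gradients and let the optimizer handle the scale.

```python
import numpy as np

def make_grad_buffer(params):
    # One zero-filled array per parameter array (hypothetical helper).
    return [np.zeros_like(p) for p in params]

def accumulate(grad_buffer, grads):
    # Add one episode's gradients into the buffer, in place.
    for buf, g in zip(grad_buffer, grads):
        buf += g

def apply_and_reset(params, grad_buffer, learning_rate, n_episodes):
    # Assumption: average over episodes, then take a plain gradient-ascent step.
    for p, buf in zip(params, grad_buffer):
        p += learning_rate * buf / n_episodes
        buf[...] = 0.0  # clear the buffer for the next batch of episodes

# Tiny deterministic example: two "episodes" with different gradients.
params = [np.zeros(3)]
grad_buffer = make_grad_buffer(params)
accumulate(grad_buffer, [np.full(3, 1.0)])
accumulate(grad_buffer, [np.full(3, 3.0)])
apply_and_reset(params, grad_buffer, learning_rate=0.1, n_episodes=2)

# Why this helps: averaging several noisy estimates lowers the variance.
rng = np.random.default_rng(0)
single_var = rng.normal(1.0, 2.0, size=10_000).var()
batched_var = rng.normal(1.0, 2.0, size=(10_000, 5)).mean(axis=1).var()
print(single_var, batched_var)
```

In the toy example the update uses the mean of the two episode gradients, and the last lines show that an average over five noisy samples has a noticeably smaller variance than a single sample.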

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.
