Hi Berkmeister,

The gradBuffer isn't used only for RMSProp. It also accumulates the gradients from several runs before they are applied to the policy. This is done to reduce the variance of the update: applying a single high-variance set of gradients could destabilize the policy, whereas gradients collected over multiple runs give a steadier update.
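
To make the idea concrete, here is a minimal NumPy sketch of gradient accumulation. It is not the tutorial's code: the helper names (`make_grad_buffer`, `accumulate`, `apply_and_reset`), the `update_frequency` value, the random stand-in gradients, and the choice to average rather than sum the buffer are all assumptions made for illustration.

```python
import numpy as np

def make_grad_buffer(params):
    # One zero-filled array per parameter, with matching shape.
    return [np.zeros_like(p) for p in params]

def accumulate(grad_buffer, grads):
    # Add this episode's gradients into the buffer in place.
    for buf, g in zip(grad_buffer, grads):
        buf += g

def apply_and_reset(params, grad_buffer, learning_rate, n_episodes):
    # Plain gradient-descent step on the averaged gradients (assumption:
    # averaging; a real setup might sum and let the optimizer scale it),
    # then clear the buffer for the next batch of episodes.
    for p, buf in zip(params, grad_buffer):
        p -= learning_rate * buf / n_episodes
        buf[...] = 0.0

rng = np.random.default_rng(0)
params = [np.zeros((4, 2)), np.zeros(2)]  # hypothetical policy weights
grad_buffer = make_grad_buffer(params)
update_frequency = 5  # hypothetical: episodes per policy update

for episode in range(1, 11):
    # Stand-in for the noisy per-episode policy gradients.
    grads = [rng.normal(size=p.shape) for p in params]
    accumulate(grad_buffer, grads)
    if episode % update_frequency == 0:
        apply_and_reset(params, grad_buffer, 0.01, update_frequency)
```

The policy only changes every `update_frequency` episodes, and each change is based on several episodes' worth of gradients rather than one noisy sample.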


