Hi Gabriel,

The important part of the reverse accumulation is that the reward discounting is computed backwards through the episode, from the last step to the first (which the discount function takes care of). The gradients are all applied at the same time in a single update, so it doesn’t matter what order the examples are processed in.
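To make the reverse pass concrete, here is a minimal sketch of what such a discount function might look like. The name `discount_rewards`, the `gamma` parameter, and its default value are assumptions for illustration, not necessarily the exact code from the article.

```python
def discount_rewards(rewards, gamma=0.99):
    """Return the discounted return G_t = r_t + gamma * G_{t+1} for each step.

    Hypothetical helper: the name and the default gamma are illustrative.
    """
    discounted = [0.0] * len(rewards)
    running_total = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running_total = rewards[t] + gamma * running_total
        discounted[t] = running_total
    return discounted


# Three steps with reward 1 each and gamma = 0.5.
returns = discount_rewards([1.0, 1.0, 1.0], gamma=0.5)
print(returns)  # [1.75, 1.5, 1.0]
```

Only the loop inside the function depends on going backwards. Once every step has its discounted return, the per-example gradients are summed into one update, and a sum gives the same result in any order.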

Hope that clears it up for you.

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.
