Hi Sung,

For training to work, a gradient needs to be computed for each layer. It is worth remembering that each layer’s gradient depends on the gradient of the layer “above” it (this is backpropagation), so the gradients aren’t computed for each layer in isolation.
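Concretely, for a two-layer network with hidden activations h, output y, and loss L, the chain rule gives (generic notation, not the article’s):

∂L/∂W1 = ∂L/∂y · ∂y/∂h · ∂h/∂W1

so the gradient for the first layer’s weights W1 reuses the gradient already computed for the layer above it.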

Luckily, TensorFlow abstracts most of this away. By specifying a loss on the output of the “top” layer and letting TensorFlow compute the gradients of that loss with respect to all of your variables, you never have to keep track of per-layer gradients yourself.
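Here is a rough TF1-style sketch of that idea (the layer sizes and names like states, action_probs, and advantages are illustrative, not taken from the article’s code):

```python
import tensorflow as tf

# Placeholders for one batch of experience (shapes are illustrative).
states = tf.placeholder(tf.float32, [None, 4])
actions = tf.placeholder(tf.int32, [None])
advantages = tf.placeholder(tf.float32, [None])

# A small two-layer policy network.
hidden = tf.layers.dense(states, 8, activation=tf.nn.relu)
action_probs = tf.layers.dense(hidden, 2, activation=tf.nn.softmax)

# Probability assigned to the action actually taken at each step.
indices = tf.range(tf.shape(action_probs)[0]) * 2 + actions
chosen_probs = tf.gather(tf.reshape(action_probs, [-1]), indices)

# Policy-gradient loss defined only on the top layer's output.
loss = -tf.reduce_mean(tf.log(chosen_probs) * advantages)

trainer = tf.train.AdamOptimizer(learning_rate=1e-2)
# One call builds both the gradient computation for every trainable variable
# and the op that applies those gradients; no per-layer bookkeeping needed.
update_op = trainer.minimize(loss)
```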

The reason I don’t call trainer.minimize on the loss directly in this example is that, to keep the policy updates stable, we need to sum the gradients over multiple episodes. To do this, we compute the gradients for each episode, accumulate them, and then apply them all at once every fixed number of episodes, as in the sketch below.
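Continuing the sketch above (hedged: grad_buffers, gradient_holders, and update_frequency are illustrative names, not necessarily the article’s), the accumulation looks roughly like this:

```python
import numpy as np

# Instead of running update_op (trainer.minimize), build the pieces separately:
# per-episode gradients, placeholders to feed summed gradients back in, and an
# op that applies those summed gradients.
tvars = tf.trainable_variables()
gradients = tf.gradients(loss, tvars)               # gradients for one episode

gradient_holders = [tf.placeholder(tf.float32, name='grad_holder_%d' % i)
                    for i in range(len(tvars))]
apply_grads = trainer.apply_gradients(list(zip(gradient_holders, tvars)))

update_frequency = 5
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # One buffer per variable, used to sum gradients across episodes.
    grad_buffers = [np.zeros_like(v) for v in sess.run(tvars)]

    for episode in range(1, 101):
        # Stand-in episode data; in practice this comes from rolling out the policy.
        feed = {states: np.random.randn(10, 4),
                actions: np.random.randint(0, 2, size=10),
                advantages: np.random.randn(10)}
        ep_grads = sess.run(gradients, feed_dict=feed)
        for buf, g in zip(grad_buffers, ep_grads):
            buf += g                                # sum gradients across episodes

        if episode % update_frequency == 0:
            # Apply the summed gradients in one step, then reset the buffers.
            sess.run(apply_grads,
                     feed_dict=dict(zip(gradient_holders, grad_buffers)))
            for buf in grad_buffers:
                buf *= 0
```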

Hope this helps!

