For training to work, the gradient needs to be computed for each layer. Keep in mind that each layer’s gradient depends on the gradient of the layer “above” it, so the gradients aren’t computed independently of one another.
Luckily, TensorFlow abstracts most of this away from you. By defining a loss on the output of the “top” layer, and computing the gradients of that loss with respect to all of your variables, you don’t need to keep track of the layer-by-layer gradients yourself.
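To make the layer-dependence concrete, here is a minimal framework-free sketch of backpropagation through a toy two-layer network (the network, loss, and all names here are illustrative, not part of the original example). Notice how the bottom layer’s gradient reuses the gradient flowing down from the layer above, which is exactly the bookkeeping TensorFlow automates:

```python
import numpy as np

# Toy 2-layer network: x -> tanh(x @ W1) -> (h @ W2) -> loss
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))         # batch of 4 inputs
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 1))

# Forward pass
h = np.tanh(x @ W1)                 # hidden ("bottom") layer
y = h @ W2                          # output ("top") layer
loss = 0.5 * np.sum(y ** 2)         # toy loss

# Backward pass: each gradient uses the gradient from the layer above it
dy = y                              # dLoss/dy
dW2 = h.T @ dy                      # top layer's gradient
dh = dy @ W2.T                      # gradient flowing down to h
dW1 = x.T @ (dh * (1 - h ** 2))     # bottom layer needs dh from above
```

With TensorFlow you would only define `loss` and ask for the gradients with respect to `W1` and `W2`; the chain of intermediate gradients (`dy`, `dh`) is handled for you.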
The reason I don’t use trainer.minimize with the loss directly in this example is that, to update our policy in a stable fashion, we need to sum the gradients over multiple episodes. To do this, we compute the gradient for each episode separately, accumulate them, and then apply them all at once after a fixed number of episodes.
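The accumulate-then-apply scheme can be sketched as follows. This is a simplified illustration, not the original code: the “policy” is a single weight matrix, `episode_gradient` is a hypothetical stand-in for one episode’s computed policy gradient, and the apply step is plain gradient descent rather than a TensorFlow optimizer:

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=(4, 2))   # toy stand-in for the policy variables
learning_rate = 0.01
update_every = 5                    # apply the summed gradients every 5 episodes

def episode_gradient(weights):
    """Hypothetical placeholder for one episode's policy gradient."""
    return rng.normal(size=weights.shape)

grad_buffer = np.zeros_like(weights)  # one buffer per trainable variable
for episode in range(1, 21):
    grad_buffer += episode_gradient(weights)    # sum; don't apply yet
    if episode % update_every == 0:
        weights -= learning_rate * grad_buffer  # apply the summed gradient
        grad_buffer[:] = 0.0                    # reset for the next batch
```

In the TensorFlow version, the same idea corresponds to computing per-episode gradients, adding them into buffer variables, and calling the optimizer’s apply step on the buffers every few episodes.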
Hope this helps!