Hi Lucas,

Getting ‘nan’ for the loss means the values in the network have grown too large or too small to represent numerically (they have overflowed or underflowed). In that case, reducing the learning rate usually helps.

To get proper gradients for the loss, you’ll need something a little more involved than a matrix multiply that zeroes out the parts of the output layer we don’t want: you need to select the values by their index.

# Convert each (row, action) pair into an index into the flattened output layer
indexes = tf.range(0, tf.shape(output_layer)[0]) * tf.shape(output_layer)[1] + action_holder
# Gather the network's output for the action actually taken in each state
responsible_outputs = tf.gather(tf.reshape(output_layer, [-1]), indexes)
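To see concretely what those two lines compute, here is a small NumPy sketch of the same indexing arithmetic. The batch values, shapes, and actions below are made up for illustration:

```python
import numpy as np

# Hypothetical batch: 3 states, 4 possible actions per state.
output_layer = np.array([[0.1, 0.2, 0.3, 0.4],
                         [0.5, 0.6, 0.7, 0.8],
                         [0.9, 1.0, 1.1, 1.2]])
action_holder = np.array([2, 0, 3])  # action taken in each state

# Same arithmetic as the TensorFlow snippet: row_index * num_actions + action
indexes = np.arange(output_layer.shape[0]) * output_layer.shape[1] + action_holder

# Flatten and pick out one output per row — the value for the chosen action
responsible_outputs = output_layer.reshape(-1)[indexes]

print(responsible_outputs)  # [0.3 0.5 1.2]
```

Because this is pure indexing rather than a multiply by a mask matrix, TensorFlow can trace the gradient of the loss straight back through the selected entries.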

This should preserve the gradient you want when you make the update.

