Hi Markel,

The W is run for the purpose of obtaining the Q-table at every step. While it isn’t explicitly used, you can look at these weights and determine the current Q values for each of the states and actions that the agent has available. Feel free to remove it if you like though.

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store