Hi Marc-Philippe,

The equation given in the body of the article isn't an update equation; rather, it states a property of Q-functions: the value of a specific state-action pair equals the immediate reward plus the discounted maximum expected return in the following state.
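
In symbols, that property looks like the following (a sketch in my own notation, with γ the discount factor; the article's symbols may differ slightly):

```latex
Q^{*}(s, a) = \mathbb{E}\!\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \,\right]
```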

The update rule used in the code is indeed a TD (temporal-difference) update rule, which uses the TD error to adjust the current Q estimate toward a more accurate value for the given environment.
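
Concretely, the tabular TD update has the familiar form below, where α is the learning rate and the bracketed term is the TD error (again, my shorthand, not necessarily the article's):

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\right]
```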

All of this is indeed applicable to look-up-table (tabular) implementations, so long as the state and action spaces are small and finite.
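
As a minimal sketch of what one such tabular step can look like (hypothetical names and default values for alpha and gamma; this is an illustration, not the article's code):

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q[state, action] toward the TD target."""
    td_target = reward + gamma * np.max(Q[next_state])  # r + gamma * max_a' Q(s', a')
    td_error = td_target - Q[state, action]             # how far off the current estimate is
    Q[state, action] += alpha * td_error                # nudge the estimate by the TD error
    return Q

# Toy example with a small finite space: 5 states, 2 actions
Q = np.zeros((5, 2))
Q = q_learning_update(Q, state=0, action=1, reward=1.0, next_state=2)
```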

Hope that answers your questions.

