Hi Marc-Philippe,

The equation given in the body of the article isn't an update equation. It simply states a property of Q-functions: the value of a given state-action pair equals the immediate reward plus the discounted maximum expected return from the following state.
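In symbols, that property can be sketched roughly as follows. The notation here is my own assumption and may not match the article's exactly: s and a are the current state and action, r is the immediate reward, s' is the following state, and \gamma is the discount factor. Strictly speaking, it holds for the optimal Q-function, and in expectation when the environment is stochastic.

```latex
Q(s, a) = r + \gamma \max_{a'} Q(s', a')
```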

The update rule used in the code is indeed a TD update rule: it uses the TD error to move the current Q estimate toward a more accurate Q-value for the given environment.

All of this also applies to look-up table implementations, as long as the state and action spaces are small and finite.
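To make the TD update and the look-up table case a bit more concrete, here is a minimal sketch in plain Python. It is not the code from the article: the function name `td_update`, the parameters `alpha` (learning rate) and `gamma` (discount factor), and the table sizes are all assumptions chosen for illustration.

```python
def td_update(q_table, state, action, reward, next_state, done,
              alpha=0.1, gamma=0.99):
    """Apply one Q-learning TD update in place and return the TD error.

    q_table is a list of lists indexed as q_table[state][action].
    All names and default values here are illustrative assumptions.
    """
    # Target: immediate reward plus the discounted best value of the next
    # state. A terminal transition has no next state to bootstrap from.
    if done:
        target = reward
    else:
        target = reward + gamma * max(q_table[next_state])

    # TD error: gap between the target and the current estimate.
    td_error = target - q_table[state][action]

    # Move the current estimate a fraction alpha of the way toward the target.
    q_table[state][action] += alpha * td_error
    return td_error


# A small finite problem: 4 states, 2 actions, every estimate starts at zero.
n_states, n_actions = 4, 2
q_table = [[0.0] * n_actions for _ in range(n_states)]

# One hypothetical transition: in state 0, take action 1, get reward 1.0,
# and land in state 2.
td_error = td_update(q_table, state=0, action=1, reward=1.0,
                     next_state=2, done=False)
```

With every entry initialised to zero, this single update only changes `q_table[0][1]`. Repeating it over many transitions is what gradually pulls the table toward values that satisfy the property described above.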

Hope that answers your questions.


