Hi Nick,

np.identity(16)[s:s+1] is just a way of creating a one-hot encoding of the state. The environment represents the position in the grid-world as an integer between 0 and 15. Since each state is independent of the others, the raw integer would suggest an ordering between states that does not exist, so we use a one-hot version instead, and that is what the line of code does. The [s:s+1] slice selects row s of the identity matrix and keeps it as a 1x16 array rather than a flat vector of length 16.
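Here is a small sketch of what that expression produces, in case it helps to see the shapes. The names n_states, one_hot, and flat, and the example state s = 5, are just placeholders I picked for illustration.

```python
import numpy as np

n_states = 16  # states are numbered 0..15
s = 5          # hypothetical example state, chosen only for illustration

# Slicing with s:s+1 keeps the leading axis, so the result has shape (1, 16)
one_hot = np.identity(n_states)[s:s+1]
print(one_hot.shape)     # (1, 16)
print(one_hot.argmax())  # 5, the position of the single 1.0

# Plain integer indexing drops that axis and returns a flat vector
flat = np.identity(n_states)[s]
print(flat.shape)        # (16,)
```

The slice form is handy when whatever consumes the state expects a batch of inputs with one row per example, which is presumably why it is written that way here.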

Hope that clarifies things.

