Hi Arun,

You are correct that a simple Q-learning algorithm will fail on CartPole. Because CartPole's state space is continuous, it is very difficult for a basic Q-learning algorithm to solve it. In fact, the Q-learning algorithm described here is almost never used for large or continuous state/action spaces. Instead, DQN, with its augmentations to improve robustness, is used, or a policy gradient method, as you mentioned.
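To make the state-space problem concrete, here is a minimal sketch, assuming a table-based Q-function and a uniform binning of CartPole's four observations. The bounds `LOW` and `HIGH`, the helper names `discretize` and `q_update`, and the hyperparameters are my own illustrative choices, not taken from the article.

```python
import numpy as np

# Assumed, illustrative bounds for the four CartPole observations:
# cart position, cart velocity, pole angle, pole angular velocity.
LOW = np.array([-2.4, -3.0, -0.21, -3.5])
HIGH = np.array([2.4, 3.0, 0.21, 3.5])
N_ACTIONS = 2  # push left, push right


def discretize(obs, n_bins):
    """Map a continuous observation to a tuple of bin indices (hypothetical helper)."""
    clipped = np.clip(obs, LOW, HIGH)
    ratios = (clipped - LOW) / (HIGH - LOW)
    idx = (ratios * n_bins).astype(int)
    # The upper bound itself would land in bin n_bins, so cap it at n_bins - 1.
    return tuple(int(i) for i in np.minimum(idx, n_bins - 1))


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place."""
    td_target = r + gamma * np.max(q[s_next])
    q[s + (a,)] += alpha * (td_target - q[s + (a,)])


n_bins = 10
# One table entry per (binned state, action) pair.
q_table = np.zeros((n_bins,) * 4 + (N_ACTIONS,))
print(q_table.size)  # 10**4 * 2 = 20000 entries

s = discretize(np.array([0.0, 0.0, 0.0, 0.0]), n_bins)
q_update(q_table, s, a=1, r=1.0, s_next=s)
```

With 10 bins per dimension the table already has 20,000 entries, and each one has to be visited many times before its value is reliable. Finer bins grow the table with the fourth power of the bin count, while coarser bins lump together states that need different actions. A function approximator such as the network in DQN sidesteps this by generalizing across nearby states.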

