Hi Akshay,
Thanks for reading the post! In the case of MountainCar, the simple algorithms I provide here probably won’t help. The issue is that its state space is continuous (position and velocity), which makes it much more complex than a grid-world.
You will likely need to use a policy-gradient method, or a more complex Q-learning algorithm such as DQN. I would suggest starting with a policy-gradient method, however, as it is simpler and works well on similar control problems like CartPole.
I would recommend checking out the next few articles in my series, and then trying to apply https://github.com/awjuliani/DeepRL-Agents/blob/master/Vanilla-Policy.ipynb to the problem. You may need to adjust a few parameters first, though, to tune it to MountainCar. I haven’t actually worked with the environment myself, so I am unsure of what its particularities may be.
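For reference, here is a rough sketch of the core update in a vanilla policy-gradient (REINFORCE) method. It is written in plain Python and is not taken from the notebook linked above. The function names, the linear softmax policy, and the default learning rate and discount factor are all assumptions chosen for illustration, and none of it has been tuned for MountainCar.

```python
import math


def discount_rewards(rewards, gamma=0.99):
    """Return the discounted return G_t for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(weights, state):
    """Hypothetical linear policy: one weight row per action."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(logits)


def reinforce_update(weights, episode, lr=0.01, gamma=0.99):
    """Apply one REINFORCE step in place.

    episode is a list of (state, action, reward) tuples from one rollout.
    """
    returns = discount_rewards([r for _, _, r in episode], gamma)
    grads = [[0.0] * len(row) for row in weights]
    for (state, action, _), g in zip(episode, returns):
        probs = action_probs(weights, state)
        for a in range(len(weights)):
            # d log pi(action | state) / d logit_a = 1[a == action] - pi(a | state)
            grad_logit = (1.0 if a == action else 0.0) - probs[a]
            for i, s in enumerate(state):
                grads[a][i] += g * grad_logit * s
    # Gradient ascent on the expected return.
    for a, row in enumerate(weights):
        for i in range(len(row)):
            row[i] += lr * grads[a][i]
```

To try it on MountainCar-v0, you would assume 2 state values (position, velocity) and 3 discrete actions, so `weights` would be a 3x2 list of floats. Because that environment gives a reward of -1 on every step until the goal is reached, a plain update like this may learn very slowly without further changes.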