Hi Parth,

There are a number of hyperparameters I would suggest adjusting to get it to learn Pong. In their DQN paper, DeepMind use a learning rate of 0.00025, an experience buffer size of 1 million, 50,000 random actions before network training begins, and 1 million annealing steps. I would also set tau closer to 0.001 than to 0.1. As you have it now, the agent doesn’t gather a diverse enough set of experiences to begin learning a robust policy. It also updates itself too strongly.
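
To make those settings concrete, here is a minimal sketch of how they might look in Python. The variable and function names are hypothetical and may not match your code. The start and end epsilon values (1.0 and 0.1), and the choice to begin annealing only after the random steps finish, are my assumptions. The soft update assumes tau is the rate at which the target network moves toward the primary network.

```python
# Hypothetical names; the values are the ones suggested above.
LEARNING_RATE = 0.00025
BUFFER_SIZE = 1_000_000      # experience buffer capacity
PRE_TRAIN_STEPS = 50_000     # random actions before training begins
ANNEALING_STEPS = 1_000_000  # steps over which epsilon is reduced
TAU = 0.001                  # target network update rate

START_EPSILON = 1.0  # assumption: start fully random
END_EPSILON = 0.1    # assumption: final exploration rate


def epsilon_at(step):
    """Chance of a random action at a given environment step.

    Stays at START_EPSILON during the random pre-training phase,
    then decreases linearly to END_EPSILON over ANNEALING_STEPS.
    """
    if step < PRE_TRAIN_STEPS:
        return START_EPSILON
    progress = min(1.0, (step - PRE_TRAIN_STEPS) / ANNEALING_STEPS)
    return START_EPSILON + progress * (END_EPSILON - START_EPSILON)


def soft_update(target_weights, primary_weights, tau=TAU):
    """Move each target weight a fraction tau toward the primary weight."""
    return [
        tau * p + (1.0 - tau) * t
        for t, p in zip(target_weights, primary_weights)
    ]
```

With tau at 0.1 the target weights move 10% of the way toward the primary network on every update, while 0.001 moves them only 0.1% of the way, which should keep the training targets much more stable.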

Hopefully those changes help you find more success!
