Hi Parth,

There are a number of hyperparameters I would suggest adjusting to get it to learn Pong. In their DQN paper, DeepMind use a learning rate of 0.00025, an experience buffer size of 1 million, 50,000 random actions before network training begins, and 1 million annealing steps. I would also set tau closer to 0.001 than to 0.1. As you have it now, the agent doesn't collect a diverse enough set of experiences to begin learning a robust policy, and its updates are too aggressive.
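In case it helps, here is a rough sketch in plain Python of how those settings might fit together. The constant and function names are my own placeholders rather than anything from your code, the start and end epsilon values (1.0 and 0.1) are an assumption on my part, and I am also assuming that annealing only starts once the random-action phase is over.

```python
# Hypothetical names; the values are the ones suggested above.
LEARNING_RATE = 0.00025      # optimizer step size
BUFFER_SIZE = 1_000_000      # experience replay capacity
PRE_TRAIN_STEPS = 50_000     # random actions taken before training begins
ANNEALING_STEPS = 1_000_000  # steps over which epsilon is reduced
TAU = 0.001                  # target network update rate

# Assumed exploration endpoints; adjust to match your own setup.
START_EPSILON = 1.0
END_EPSILON = 0.1


def epsilon_at(step):
    """Exploration rate after `step` environment steps (linear annealing)."""
    if step < PRE_TRAIN_STEPS:
        # Act purely at random while the buffer fills.
        return START_EPSILON
    progress = min(1.0, (step - PRE_TRAIN_STEPS) / ANNEALING_STEPS)
    return START_EPSILON + progress * (END_EPSILON - START_EPSILON)


def soft_update(target_weights, primary_weights, tau=TAU):
    """Move each target weight a fraction `tau` toward the primary weight."""
    return [(1.0 - tau) * t + tau * p
            for t, p in zip(target_weights, primary_weights)]
```

With tau at 0.1 the target network moves 10% of the way toward the primary network on every update, whereas 0.001 moves it only 0.1% of the way, which keeps the learning targets much more stable.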

Hopefully those changes bring you more success!

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.

