Hi Andy,

I think the point of confusion is that I am actually using the update procedure from this paper: https://arxiv.org/pdf/1509.02971.pdf, not the one from the original paper by Mnih et al. In my implementation, the target network is moved slowly toward the values of the primary network at every update step of the primary network. In the original paper, by contrast, the target network is updated infrequently but all at once. The authors of the newer paper found that this improved training stability, and I found the same in my own experiments.
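To make the difference concrete, here is a minimal sketch that assumes the network parameters are flattened into a plain list of floats. The helper names `soft_update` and `hard_update` are hypothetical and do not come from either paper's code. The soft update blends a small fraction `tau` of the primary weights into the target after every step. The hard update copies everything over at once, typically only every C steps.

```python
from typing import List

# Hypothetical helpers: parameters are assumed to be flattened into lists of floats.


def soft_update(target: List[float], primary: List[float], tau: float = 0.001) -> List[float]:
    """Soft update, applied after every primary-network update:
    theta_target <- tau * theta_primary + (1 - tau) * theta_target."""
    return [tau * p + (1.0 - tau) * t for t, p in zip(target, primary)]


def hard_update(primary: List[float]) -> List[float]:
    """Original-style update: copy the primary weights into the target
    all at once, typically only every C steps."""
    return list(primary)


if __name__ == "__main__":
    target, primary = [0.0], [1.0]
    for _ in range(10):
        target = soft_update(target, primary, tau=0.1)
    print(target)  # roughly [0.651]: the remaining gap shrinks by (1 - tau) each step
```

With `tau = 0.1`, ten soft updates move a target parameter that starts at 0 about 65% of the way toward a fixed primary value of 1, because the remaining gap is multiplied by (1 - tau) at each step. A small `tau` therefore makes the target a slowly moving average of the primary network, rather than a copy that jumps every C steps.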

I hope that clears things up.

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.
