Hi Abhishek,

I am actually using a modified version of the update where the target network is updated at every step, but only a small amount (determined by a variable tau. This variable is typically set to something like 0.001, such that 1000 updates is roughly the equivalent of updating the network fully every 1000 steps. By smoothly updating the target network it prevents potential instability that can arise from a sudden large change in the target network.

Hope that clarifies things.

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store