Hi Slava,

Which to use depends on the nature of the problem. One isn’t necessarily “worse” or “better.” In fact, one of Andrej’s colleagues at OpenAI recently showed that the two methods have a strong theoretical connection, and in many cases are computing the same thing: https://arxiv.org/abs/1704.06440

From a practical perspective, DQN is more likely to work in cases with more separable state and action spaces, whereas PG works better with more continuous state and action spaces. PG also updates faster because it takes advantage of full Monte Carlo backups, but it is less capable of exploring the state space fully, since it relies on on-policy updates. On the flip side, DQN uses one-step backups, but its off-policy nature allows for greater exploration.
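The backup distinction can be sketched in a few lines. This is a toy illustration, not code from either algorithm's canonical implementation; the function names are my own:

```python
import numpy as np

def monte_carlo_returns(rewards, gamma=0.99):
    """Full Monte Carlo backup (PG-style): each step's target is the
    complete discounted return observed until the episode ends."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def one_step_td_target(reward, next_q_values, gamma=0.99):
    """One-step backup (DQN-style): bootstrap from the estimated max
    Q-value of the next state instead of waiting for the episode to end."""
    return reward + gamma * np.max(next_q_values)

# Toy episode: three rewards, then termination.
print(monte_carlo_returns([1.0, 0.0, 2.0], gamma=1.0))           # [3. 2. 2.]
print(one_step_td_target(1.0, np.array([0.5, 2.0]), gamma=1.0))  # 3.0
```

The Monte Carlo target needs the whole trajectory (hence the on-policy constraint), while the one-step target only needs a single transition, which is what lets DQN reuse old transitions from a replay buffer.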

I hope that gives some additional intuition.

