Hi Gaurav,

A3C can be used with an LSTM cell (as described here), which allows it to handle partial observability in the environment. The advantage of A3C over DRQN is that it is more resource-efficient: it can be run on multiple cores of a single machine and doesn’t require a large amount of RAM to store an experience replay buffer. The actor-critic formulation also tends to provide more accurate updates to the policy than a DQN-style update might.
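To make the recurrent actor-critic idea concrete, here is a minimal sketch (assuming PyTorch; the names `RecurrentActorCritic`, `obs_dim`, `hidden_dim`, and `num_actions` are illustrative, not from the original post) of a network where an LSTM cell carries a hidden state across timesteps so the agent can integrate information under partial observability:

```python
# Minimal sketch of an actor-critic network with an LSTM cell (assumed PyTorch).
# The recurrent hidden state summarises past observations, which is what lets
# the agent cope with partial observability.
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    def __init__(self, obs_dim, hidden_dim, num_actions):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, num_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden_dim, 1)              # critic: state value

    def forward(self, obs, hidden):
        x = torch.relu(self.encoder(obs))
        h, c = self.lstm(x, hidden)        # update recurrent state with new observation
        logits = self.policy_head(h)       # parameters of the action distribution
        value = self.value_head(h)         # value estimate used for the advantage
        return logits, value, (h, c)
```

In an A3C-style setup, each worker thread would hold its own copy of such a network and its own hidden state, stepping through its environment and periodically pushing gradients to the shared parameters.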

That being said, because A3C is an on-policy algorithm, it cannot explore the state space as efficiently as an off-policy method like DQN, so there are trade-offs between the two algorithms.
