A3C can be used with an LSTM cell (as described here), which helps it handle partial observability in the environment. Compared with DRQN, A3C is more resource-efficient: it can run on multiple CPU cores of a single machine, and it doesn't need a large amount of RAM to store a replay buffer. The actor-critic aspect also means the policy is updated directly, guided by the critic's value estimate, which can give more accurate policy updates than a DQN update might.
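
To put a rough number on the RAM point, here is a back-of-the-envelope estimate of a replay buffer's size. The capacity, frame size, and frame stacking below are assumptions (typical Atari-style DQN preprocessing), not values from any particular setup, and the two storage layouts are only illustrative sketches.

```python
# Rough replay-buffer memory estimate. All sizes are illustrative assumptions
# (Atari-style DQN preprocessing), not values taken from a specific setup.
CAPACITY = 1_000_000        # assumed number of stored transitions
FRAME_H, FRAME_W = 84, 84   # assumed grayscale frame size in pixels
STACK = 4                   # assumed number of frames stacked per observation
BYTES_PER_PIXEL = 1         # uint8 pixels


def naive_buffer_bytes(capacity=CAPACITY):
    """Store the full stacked state and next_state for every transition."""
    obs_bytes = FRAME_H * FRAME_W * STACK * BYTES_PER_PIXEL
    return capacity * 2 * obs_bytes


def deduplicated_buffer_bytes(capacity=CAPACITY):
    """Store each frame once and rebuild the stacks by indexing."""
    frame_bytes = FRAME_H * FRAME_W * BYTES_PER_PIXEL
    return capacity * frame_bytes


if __name__ == "__main__":
    gib = 1024 ** 3
    print(f"naive:        {naive_buffer_bytes() / gib:.1f} GiB")
    print(f"deduplicated: {deduplicated_buffer_bytes() / gib:.1f} GiB")
```

Under these assumptions the naive layout needs about 52.6 GiB and the deduplicated one about 6.6 GiB, while an A3C worker only has to hold the handful of steps in its current rollout.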

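That being said, because A3C is an on-policy algorithm, it has to discard each batch of experience after using it once, whereas DQN can keep reusing old transitions from its replay buffer. As a result A3C typically makes less efficient use of the experience it collects and cannot explore the state space as efficiently as DQN, so there are trade-offs between the two algorithms.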
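
One concrete side of this trade-off is data reuse. The toy sketch below counts how many times each transition ends up in an update under the two storage schemes. `ReplayBuffer`, `RolloutBuffer`, and `average_uses` are hypothetical names made up for this illustration, the transitions are plain integers standing in for real experience tuples, and the update schedule (one sampled batch per step versus one update per short rollout) is an assumption, not the exact schedule of either algorithm.

```python
import random
from collections import deque


class ReplayBuffer:
    """Off-policy storage: old transitions stay available for many updates."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.data), batch_size)


class RolloutBuffer:
    """On-policy storage: a short rollout feeds one update, then is dropped."""

    def __init__(self):
        self.data = []

    def add(self, transition):
        self.data.append(transition)

    def drain(self):
        batch, self.data = self.data, []
        return batch


def average_uses(num_steps=100, rollout_len=5, batch_size=5, seed=0):
    """Return (replay, rollout) average number of updates per transition."""
    rng = random.Random(seed)
    replay = ReplayBuffer(capacity=num_steps)
    rollout = RolloutBuffer()
    replay_uses = 0
    rollout_uses = 0
    for step in range(num_steps):
        transition = step  # stand-in for (state, action, reward, next_state)
        replay.add(transition)
        rollout.add(transition)
        # Off-policy: sample a batch every step once enough data is stored.
        if len(replay.data) >= batch_size:
            replay_uses += len(replay.sample(batch_size, rng))
        # On-policy: update once per completed rollout, then discard it.
        if len(rollout.data) == rollout_len:
            rollout_uses += len(rollout.drain())
    return replay_uses / num_steps, rollout_uses / num_steps


if __name__ == "__main__":
    print(average_uses())
```

With these toy settings each transition is used 4.8 times on average through the replay buffer and exactly once through the rollout buffer. A3C compensates by collecting fresh experience from many parallel workers rather than by reusing old data.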

