Hi Richa,

There is no reason that optimizing for entropy should be any more likely to lead to catastrophic forgetting than optimizing for reward. One could even argue the opposite: across a series of learned tasks, optimizing only for the reward in each task would be more likely to cause catastrophic forgetting, since the policy would then overfit to each sub-task in turn.
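To make the intuition concrete, here is a minimal sketch (all names and values are hypothetical, and `alpha` is just an assumed entropy temperature) of how an entropy bonus in a maximum-entropy-style objective rewards keeping probability mass spread over actions, whereas a reward-only objective favors collapsing onto the single greedy action for the current sub-task:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(probs):
    # Shannon entropy of the action distribution (nats).
    return -np.sum(probs * np.log(probs + 1e-12))

def soft_objective(q_values, logits, alpha):
    # Per-state objective: expected return plus an entropy bonus.
    # alpha = 0 recovers the reward-only objective.
    pi = softmax(logits)
    return np.dot(pi, q_values) + alpha * entropy(pi)

q = np.array([1.0, 0.2, 0.1])                # action values in the current sub-task
greedy_logits = np.array([10.0, 0.0, 0.0])   # near-deterministic, "overfit" policy
uniform_logits = np.zeros(3)                 # maximally stochastic policy

# Reward-only (alpha = 0): the collapsed greedy policy scores higher.
# With an entropy bonus (alpha = 1): the stochastic policy scores higher,
# so the objective itself resists collapse onto one sub-task's optimum.
```

With `alpha = 0` the greedy policy wins, while a nonzero `alpha` penalizes a policy that has thrown away all its alternatives, which is exactly the kind of collapse that makes forgetting across sub-tasks worse.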
