Hi Hady,

Thanks for the question. This is definitely a very common practice in Reinforcement Learning. The idea is that the agent needs to explore in order to get a good understanding of the reward dynamics of the environment it is in. By sometimes acting randomly, it can end up in states it may know less about, and can learn from them to obtain a more optimal policy.

Research Scientist. Interested in Artificial Intelligence, Neuroscience, Philosophy, and Literature.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store