Hi Adam,

The truth is that RL training is somewhat different from general deep learning (or general ML) training. The key difference is that we judge performance against the reward measure directly, rather than a proxy loss function, so there are fewer guarantees attached to the typical early-stopping criteria. We also aren't dealing with the usual bias/variance trade-off between training and test datasets, since in RL we typically care about task performance and, so to speak, train on the test data.

What is typically done instead is to decrease the learning rate on a fixed schedule, so that it eventually approaches 0.
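To make that concrete, here is a minimal sketch of one such fixed schedule, a linear anneal of the learning rate toward 0 over the length of training. The function name and parameters are hypothetical, just for illustration; in practice you would use whatever scheduler your framework provides.

```python
def linear_lr(step, total_steps, lr_start=1e-3):
    """Linearly anneal the learning rate from lr_start toward 0.

    step: current training step (0-based)
    total_steps: total number of training steps planned
    lr_start: initial learning rate (hypothetical default)
    """
    # Fraction of training remaining; clamped so the rate never goes negative
    # if training runs past total_steps.
    frac = max(0.0, 1.0 - step / total_steps)
    return lr_start * frac
```

Halfway through training this gives half the initial rate, and at `total_steps` it reaches exactly 0, which matches the "eventually approach 0" behavior described above.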

