Hi Akhan,

Generally speaking, RL methods see little improvement from bias terms. Bias terms are typically used in situations where generalization is beneficial, but in control settings, specifically deterministic ones, we don’t need to “generalize” to anything but the training environment.

In this case, with a simple one-layer network, a bias term would actually cause problems: our weights stand in as a direct measure of the Q values, and a bias would shift those values away from what the weights have learned.
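
To make that concrete, here is a minimal sketch in plain Python. It assumes the states are fed in as one-hot vectors, and the sizes, weights, and function names (`one_hot`, `linear_q`) are made up for illustration rather than taken from the tutorial. Under that assumption, the network's output for a state is just that state's row of the weight matrix, and a bias adds the same offset to every state's values.

```python
# Hypothetical sizes for illustration: 4 states, 2 actions.
N_STATES, N_ACTIONS = 4, 2

def one_hot(state, n=N_STATES):
    """Return a one-hot vector for the given state index."""
    return [1.0 if i == state else 0.0 for i in range(n)]

def linear_q(x, weights, bias=None):
    """One-layer network: q[a] = sum_s x[s] * weights[s][a] (+ bias[a])."""
    q = [sum(x[s] * weights[s][a] for s in range(len(x)))
         for a in range(len(weights[0]))]
    if bias is not None:
        q = [qa + ba for qa, ba in zip(q, bias)]
    return q

# Made-up weights playing the role of a Q-table (rows = states, cols = actions).
W = [[0.0, 1.0],
     [0.5, 0.25],
     [2.0, -1.0],
     [0.0, 0.0]]

q_no_bias = linear_q(one_hot(2), W)
q_with_bias = linear_q(one_hot(2), W, bias=[0.5, 0.5])

print(q_no_bias)    # row 2 of W, read straight out of the weights
print(q_with_bias)  # the same row, shifted by the bias
```

Without the bias, reading Q values for state 2 is the same as looking up row 2 of `W`. With it, every state's estimate for an action moves by the same amount, so the weights alone no longer tell you the Q values.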

Hope that helps!

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.
