Hi Gabriel,

The traditional role of biases is to encourage generalization (and prevent overfitting) from the training set to the overall dataset. In the case of RL, generalization isn’t actually something we are interested in. Instead, we want our policy and value outputs to be as accurate as possible, which means conditioning them entirely on the state input. Biases would introduce, well, a bias: an offset that doesn’t depend on the state, which isn’t something we want when selecting actions.
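To make the point about state-independence concrete, here is a minimal sketch in plain Python. It is not the code from the article: the `linear` and `softmax` helpers, the weights, and the bias values are all made up for illustration. It shows that a bias-free linear policy head depends only on the state, while a bias shifts the action logits by the same amount whatever the state is.

```python
import math

def linear(state, weights, bias=None):
    """Compute logits = weights @ state, plus an optional bias vector."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    if bias is not None:
        logits = [l + b for l, b in zip(logits, bias)]
    return logits

def softmax(logits):
    """Turn logits into action probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-action policy head over a 3-dimensional state.
weights = [[0.5, -0.2, 0.1],
           [-0.3, 0.4, 0.2]]
bias = [1.0, 0.0]  # hypothetical bias that favours action 0

# With an all-zero state, the bias-free head has no preference,
# while the biased head already prefers action 0.
zero_state = [0.0, 0.0, 0.0]
probs_no_bias = softmax(linear(zero_state, weights))
probs_with_bias = softmax(linear(zero_state, weights, bias))

# For any other state, the bias shifts the logits by the same fixed amount.
other_state = [1.0, 2.0, -1.0]
shift = [b - a for a, b in zip(linear(other_state, weights),
                               linear(other_state, weights, bias))]

print(probs_no_bias)    # uniform over the two actions
print(probs_with_bias)  # skewed toward action 0
print(shift)            # approximately equal to `bias`
```

In a real network the same choice is usually a constructor flag on the layer rather than hand-written arithmetic; the sketch only isolates what the extra term does to the outputs.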


