Hi T SHAO,
This is the policy gradient loss calculation. It is related to cross-entropy in that both are built on the log-likelihood of the chosen actions: we want to increase the log-likelihood of rewarding actions and decrease it for unrewarding ones.
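To make the connection concrete, here is a minimal sketch in NumPy. It is not the code you were asking about, only an illustration under my own assumptions: a discrete action space, and the hypothetical names `logits`, `actions`, and `advantages` (the advantage being any signed measure of how rewarding each action turned out to be).

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the action dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def policy_gradient_loss(logits, actions, advantages):
    # Log-probability the policy assigned to each action actually taken.
    log_probs = log_softmax(logits)
    chosen = log_probs[np.arange(len(actions)), actions]
    # Minimizing this raises log pi(a|s) where the advantage is positive
    # and lowers it where the advantage is negative.
    return -np.mean(chosen * advantages)

def cross_entropy(logits, actions):
    # Ordinary cross-entropy is the special case where every weight is 1,
    # i.e. every taken action is treated as the "correct" label.
    return policy_gradient_loss(logits, actions, np.ones(len(actions)))

# Illustrative values only (assumed, not from any real run).
logits = np.array([[2.0, 0.5], [0.1, 1.2], [0.3, 0.3]])
actions = np.array([0, 1, 0])
advantages = np.array([1.0, -0.5, 0.2])
loss = policy_gradient_loss(logits, actions, advantages)
```

So the only difference from a standard cross-entropy loss is the per-sample weight: the taken action plays the role of the label, and the advantage decides how strongly, and in which direction, to push its log-likelihood.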
PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.