Hi Madhavun,

The fake label is used to mask the gradient of the action we didn’t take. In this case, since we have only binary action possibilities, it will simply be the inverse of the actual action. In settings with multiple possible actions, we would mask multiple gradient paths using this fake label. Hopefully that is helpful.
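As a rough illustration of the masking idea, here is a minimal sketch. It assumes the policy outputs a single probability for action 1; the function names and the one-hot mask in the multi-action case are hypothetical choices for this example, not code taken from the article.

```python
import math


def fake_label(action):
    # Binary case: the fake label is the inverse of the action taken.
    return 1 - action


def taken_action_log_prob(p_action1, action):
    """Log-probability of the action that was actually taken.

    p_action1 is assumed to be the policy's output P(action == 1).
    """
    y = fake_label(action)
    # Exactly one of the two terms is non-zero, so the probability of the
    # action we did not take drops out and contributes no gradient.
    likelihood = y * (y - p_action1) + (1 - y) * (y + p_action1)
    return math.log(likelihood)


def taken_action_log_prob_multi(probs, action):
    """Multi-action analogue (an assumption): a one-hot mask over actions."""
    mask = [1.0 if i == action else 0.0 for i in range(len(probs))]
    # Every entry except the taken action is multiplied by zero.
    return math.log(sum(m * p for m, p in zip(mask, probs)))
```

With `p_action1 = 0.8`, taking action 1 gives `log(0.8)` and taking action 0 gives `log(0.2)`; in both cases only the taken action's probability reaches the loss.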

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.
