Hi Hanyoung,

We subtract the entropy here because the previous line, where it is calculated, already applies a negative sign to make it positive. That way, higher values mean a more spread-out distribution and lower values mean a less spread-out one. The original paper doesn’t reverse the sign when calculating entropy, so there is no need to un-reverse it in their overall loss equation.
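
In case it helps, here is a minimal sketch of the sign bookkeeping in plain Python. The names `entropy`, `signed_sum`, `beta`, and `policy_loss`, and the example numbers, are my own illustrative assumptions and are not taken from the article's code or the paper. It only shows that subtracting a positive entropy and adding the un-negated sum give the same loss.

```python
import math

def entropy(probs):
    """Entropy with the leading minus sign, so the result is non-negative."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def signed_sum(probs):
    """The same sum without the minus sign, so the result is never positive."""
    return sum(p * math.log(p) for p in probs if p > 0)

beta = 0.01        # hypothetical entropy weight
policy_loss = 1.0  # placeholder for the rest of the loss

uniform = [0.25, 0.25, 0.25, 0.25]  # maximally spread over 4 actions
peaked = [0.97, 0.01, 0.01, 0.01]   # almost deterministic

# Convention A: entropy is made positive first, so it is subtracted.
loss_a = policy_loss - beta * entropy(uniform)

# Convention B: the sign is not flipped, so the term is added.
loss_b = policy_loss + beta * signed_sum(uniform)

print(entropy(uniform), entropy(peaked))  # the spread distribution scores higher
print(loss_a, loss_b)                     # both conventions agree
```

Either way, a more spread-out distribution lowers the loss, so minimizing it nudges the policy toward exploration.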

