Maximum Entropy Policies in Reinforcement Learning & Everyday Life

Entropy in Reinforcement Learning

Categorical (left) and Gaussian (right) distributions. Orange shows low-entropy distributions; blue shows high-entropy distributions.
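The Gaussian case on the right has a closed-form entropy. It depends only on the standard deviation σ: H = ½ log(2πeσ²) nats. The short Python sketch below shows that a narrow Gaussian has lower entropy than a wide one. The function name `gaussian_entropy` and the two σ values are illustrative choices of ours.

```python
import math

def gaussian_entropy(sigma: float) -> float:
    """Differential entropy (in nats) of a 1-D Gaussian with standard deviation sigma.

    H = 0.5 * log(2 * pi * e * sigma^2); the mean does not affect it.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

# A narrow (low-entropy) Gaussian versus a wide (high-entropy) one.
narrow = gaussian_entropy(0.1)
wide = gaussian_entropy(2.0)
print(f"sigma=0.1: {narrow:.3f} nats, sigma=2.0: {wide:.3f} nats")
# prints "sigma=0.1: -0.884 nats, sigma=2.0: 2.112 nats"
```

Unlike the entropy of a discrete distribution, this differential entropy can be negative, as it is for σ = 0.1. It is therefore most useful for comparing distributions with one another.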
Equation for the entropy of a discrete probability distribution p:
$$H(p) = -\sum_{x} p(x) \log p(x)$$
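The discrete entropy formula, H(p) = −Σ p(x) log p(x), translates directly into Python. The sketch below is minimal and makes two assumptions of ours: the function name `entropy`, and the use of the natural log, which gives entropy in nats (log base 2 would give bits).

```python
import math
from typing import Sequence

def entropy(p: Sequence[float]) -> float:
    """Shannon entropy H(p) = -sum_x p(x) * log p(x), in nats.

    Zero-probability outcomes contribute nothing (the convention 0 * log 0 = 0).
    """
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution")
    return -sum(x * math.log(x) for x in p if x > 0)

peaked = [0.97, 0.01, 0.01, 0.01]   # low entropy: nearly always the same outcome
uniform = [0.25, 0.25, 0.25, 0.25]  # highest possible entropy for 4 outcomes, log(4)
print(entropy(peaked), entropy(uniform))
```

The uniform distribution attains the maximum, log(n) for n outcomes. A distribution that puts all its mass on one outcome has zero entropy.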

Encouraging Entropy

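Update equation for A3C. The entropy bonus is the H(π) term, weighted by β:
$$\nabla_{\theta'} \log \pi(a_t \mid s_t; \theta')\,\big(R_t - V(s_t; \theta_v)\big) + \beta\, \nabla_{\theta'} H\big(\pi(s_t; \theta')\big)$$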
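To make the H(π) term concrete, here is a minimal NumPy sketch of A3C's actor objective, log π(a_t|s_t)(R_t − V(s_t)) + β H(π(s_t)), for a single step with discrete actions. Several simplifications apply:

- It returns a scalar loss to minimize rather than computing the gradient itself. In practice an autodiff framework would differentiate this loss.
- It omits A3C's value-function loss.
- The names `a3c_policy_loss` and `log_softmax` and the default β = 0.01 are illustrative assumptions.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log of the softmax probabilities."""
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def a3c_policy_loss(logits, action: int, advantage: float, beta: float = 0.01):
    """Single-step actor loss with an entropy bonus, to be minimized.

    loss = -log pi(a|s) * advantage - beta * H(pi(.|s))
    Minimizing it ascends log pi(a|s) * advantage + beta * H(pi(.|s)),
    where in A3C the advantage is R_t - V(s_t). Returns (loss, entropy).
    """
    log_probs = log_softmax(np.asarray(logits, dtype=float))
    probs = np.exp(log_probs)
    entropy = float(-np.sum(probs * log_probs))
    loss = float(-log_probs[action] * advantage - beta * entropy)
    return loss, entropy

# The uniform policy has maximal entropy and the peaked one very little,
# so the -beta * H term reduces the loss more for the uniform policy.
for name, logits in [("uniform", np.zeros(4)), ("peaked", np.array([5.0, 0.0, 0.0, 0.0]))]:
    loss, h = a3c_policy_loss(logits, action=0, advantage=1.0)
    print(f"{name}: entropy={h:.3f}, loss={loss:.3f}")
```

Raising β strengthens the push toward more uniform action distributions. The bonus only looks at the entropy of the current step's policy.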

Maximizing for Long-term Entropy

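Equation for Maximum Entropy Reinforcement Learning. The optimal policy π* maximizes the expected sum of discounted rewards and entropy:
$$\pi^* = \arg\max_{\pi} \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t} \big(r_t + \alpha\, H(\pi(\cdot \mid s_t))\big)\Big]$$
where the temperature α weights entropy against reward.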
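For a single sampled trajectory, the maximum-entropy objective Σ_t γ^t (r_t + α H(π(·|s_t))) can be computed directly. The sketch below assumes each π(·|s_t) is given as a list of action probabilities. The name `soft_return` and the defaults γ = 0.99 and α = 0.1 are hypothetical choices for illustration.

```python
import math
from typing import Sequence

def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats, with 0 * log 0 taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def soft_return(rewards: Sequence[float], policies: Sequence[Sequence[float]],
                gamma: float = 0.99, alpha: float = 0.1) -> float:
    """Entropy-augmented discounted return of one trajectory:
    sum_t gamma**t * (r_t + alpha * H(pi(.|s_t))).
    policies[t] is the action distribution the policy used at step t.
    """
    if len(rewards) != len(policies):
        raise ValueError("need one action distribution per reward")
    return sum(gamma ** t * (r + alpha * entropy(p))
               for t, (r, p) in enumerate(zip(rewards, policies)))

# Same rewards under two policies, for illustration only.
rewards = [1.0, 0.0, 1.0]
uniform = [[0.5, 0.5]] * 3   # maximally uncertain over 2 actions
greedy = [[1.0, 0.0]] * 3    # deterministic policy: zero entropy
print(soft_return(rewards, uniform), soft_return(rewards, greedy))
```

Because the entropy bonus sits inside the discounted sum, the objective credits entropy earned at future steps as well as at the current one.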
Results from experiments comparing the one-step entropy bonus (red) with long-term optimization of entropy (blue). In all six tasks compared, long-term entropy optimization performs as well as or better than the naive one-step entropy bonus. Taken from https://arxiv.org/abs/1704.06440.
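In a small, fully known MDP, one standard way to optimize the long-term entropy objective is soft value iteration. It replaces the max in the Bellman backup with a temperature-scaled log-sum-exp, and the optimal policy is then a Boltzmann distribution over soft Q-values. The NumPy sketch below illustrates that technique under these assumptions:

- States are tabular, with known transitions `P` and rewards `R`.
- The 3-state example is hypothetical.
- This is not the method used to produce the figure.

```python
import numpy as np

def soft_backup(Q: np.ndarray, alpha: float) -> np.ndarray:
    """V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed stably."""
    m = Q.max(axis=1)
    return m + alpha * np.log(np.exp((Q - m[:, None]) / alpha).sum(axis=1))

def soft_value_iteration(P, R, gamma=0.9, alpha=1.0, iters=500):
    """Tabular soft (maximum-entropy) value iteration.

    P: transition probabilities, shape (S, A, S); R: rewards, shape (S, A).
    Repeats Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V(s') and V = soft_backup(Q).
    Returns Q, V and the policy pi(a|s) = exp((Q(s, a) - V(s)) / alpha).
    """
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = soft_backup(R + gamma * (P @ V), alpha)   # (S, A, S) @ (S,) -> (S, A)
    Q = R + gamma * (P @ V)
    V = soft_backup(Q, alpha)
    pi = np.exp((Q - V[:, None]) / alpha)             # each row sums to 1
    return Q, V, pi

# Hypothetical example: from state 0, action 0 leads to an "open" state 1 where both
# actions are equally good; action 1 leads to a "corridor" state 2 where action 1
# costs -10, so only one action is worth taking. States 1 and 2 are absorbing.
S, A = 3, 2
P = np.zeros((S, A, S))
P[0, 0, 1] = 1.0
P[0, 1, 2] = 1.0
P[1, :, 1] = 1.0
P[2, :, 2] = 1.0
R = np.zeros((S, A))
R[2, 1] = -10.0

Q, V, pi = soft_value_iteration(P, R, gamma=0.9, alpha=1.0)
print(np.round(pi, 3))
```

Both actions at state 0 earn the same immediate reward, so an agent that only rewards entropy at the current step has no reason to prefer either one. The soft values, however, also count the entropy available later. State 1 keeps two equally good actions open indefinitely, while state 2 effectively offers one, so the policy at state 0 strongly favors action 0.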

Maximum Entropy Policies in Everyday Life
