Maximum Entropy Policies in Reinforcement Learning & Everyday Life

Entropy in Reinforcement Learning

Categorical (left) and Gaussian (right) distributions. Orange shows low-entropy distributions, while blue shows high-entropy distributions.
Equation for entropy of a discrete probability distribution (p).
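For reference, the standard Shannon entropy of a discrete distribution p is H(p) = -Σ_i p_i log p_i. The snippet below is a minimal sketch of that formula in plain Python. The function name `entropy` and the two example distributions are illustrative assumptions, not taken from the article.

```python
import math

def entropy(probs):
    """Shannon entropy in nats: H(p) = -sum_i p_i * log(p_i).

    Zero-probability outcomes are skipped, following the usual
    convention that 0 * log(0) is treated as 0.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical example distributions over four actions.
uniform = [0.25, 0.25, 0.25, 0.25]  # spread out: high entropy
peaked = [0.97, 0.01, 0.01, 0.01]   # concentrated: low entropy

print(entropy(uniform), entropy(peaked))
```

The uniform distribution attains the maximum possible entropy for four outcomes, log(4), while the peaked one is much closer to zero.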

Encouraging Entropy

Update equation for A3C. The entropy bonus is the H(π) term.
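As a rough illustration of how an entropy bonus enters a policy-gradient update, here is a minimal sketch of a per-step policy loss of the commonly used form -(log π(a|s) · A + β · H(π)). The function names, the softmax parameterization, and the default β = 0.01 are assumptions made for this example. This is not the article's or any library's exact implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def policy_loss(logits, action, advantage, beta=0.01):
    """Policy loss with a one-step entropy bonus (hypothetical helper).

    Minimizing this loss raises the log-probability of actions with a
    positive advantage and, through the beta term, also raises the
    entropy of the action distribution. The advantage is treated as a
    fixed number here, as it would be in the policy-gradient term.
    """
    probs = softmax(logits)
    log_prob = math.log(probs[action])
    return -(log_prob * advantage + beta * entropy(probs))

# With zero advantage, only the entropy term matters:
# a peaked policy incurs a higher loss than a uniform one.
print(policy_loss([5.0, 0.0, 0.0], 0, 0.0), policy_loss([0.0, 0.0, 0.0], 0, 0.0))
```

The coefficient β controls the trade-off: with β = 0 the loss reduces to the plain policy-gradient term, and larger values push the policy toward more uniform action probabilities.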

Maximizing for Long-term Entropy

Equation for Maximum Entropy Reinforcement Learning. The optimal policy π maximizes the discounted sum of both rewards and entropy.
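One common way to write this kind of objective is π* = argmax_π E[Σ_t γ^t (r_t + α H(π(·|s_t)))], where the discount γ and the entropy weight α are notation assumed here rather than taken from the article. The sketch below computes the corresponding entropy-augmented return-to-go for a single trajectory. The function name and the example numbers are hypothetical.

```python
def entropy_augmented_returns(rewards, entropies, gamma=0.99, alpha=1.0):
    """Discounted return-to-go where each step's reward is r_t + alpha * H_t.

    rewards[t]   : reward received at step t
    entropies[t] : entropy of the policy's action distribution at step t
    Unlike a one-step entropy bonus, entropy at future steps is folded
    into the return, so earlier actions are credited for leading to
    states in which the policy can remain high-entropy.
    """
    returns = []
    running = 0.0
    # Accumulate backwards: G_t = (r_t + alpha * H_t) + gamma * G_{t+1}
    for r, h in zip(reversed(rewards), reversed(entropies)):
        running = (r + alpha * h) + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

# Hypothetical two-step trajectory.
print(entropy_augmented_returns([1.0, 1.0], [1.0, 1.0], gamma=0.5, alpha=0.5))
```

Setting alpha to 0 recovers the ordinary discounted return, which makes it easy to compare the two objectives on the same trajectory.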
Results from experiments comparing a one-step entropy bonus (red) to long-term optimization of entropy (blue). In all six tasks compared, long-term entropy optimization performs as well as or better than the naive one-step entropy optimization. Taken from https://arxiv.org/abs/1704.06440.

Maximum Entropy Policies in Everyday Life

Arthur Juliani

Research Scientist. Interested in Artificial Intelligence, Neuroscience, Philosophy, and Literature.