# The present in terms of the future: Successor representations in Reinforcement Learning

## Background — Value estimates

Almost all reinforcement learning algorithms are concerned, in one way or another, with the task of estimating value. At the simplest level this means learning how good it is to be in the current state — i.e. V(s) — or how good it is to take each of the actions available in the current state — i.e. Q(s, a). It is this latter piece of information that allows an agent to make informed decisions that optimize future reward. These two functions, V and Q, serve as the backbone of the majority of contemporary algorithms, from DQN and SAC to PPO and many more.
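To make this concrete, here is a minimal sketch of how a Q-value estimate is updated and then used to pick an action. The toy MDP, reward, and learning-rate values are all illustrative, not from any particular algorithm in the post:

```python
import numpy as np

# A minimal sketch: one Q-learning update on a toy 2-state, 2-action MDP.
# All names and numbers (states, reward, alpha, gamma) are illustrative.
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))

alpha, gamma = 0.5, 0.9
s, a, r, s_next = 0, 1, 1.0, 1  # one observed transition

# TD update: move Q(s, a) toward the target r + gamma * max_a' Q(s', a')
td_target = r + gamma * Q[s_next].max()
Q[s, a] += alpha * (td_target - Q[s, a])

# The agent then acts by choosing the action with the highest Q-value.
greedy_action = Q[s].argmax()
```

The point is simply that once Q(s, a) is estimated, decision-making reduces to comparing the values of the available actions.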

## Basics of the Successor Representation

The value estimate for a given state corresponds to the temporally-discounted return an agent can expect to receive over time starting from that state.
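The successor representation factors this value estimate into two parts: expected future state occupancy, M(s, s'), and per-state reward, R(s'). As a hedged sketch on a toy two-state chain (the transition matrix and reward vector below are illustrative), M can be computed in closed form under a fixed policy, and value recovered as V = M R:

```python
import numpy as np

# Illustrative sketch: under a fixed policy with transition matrix P,
# the successor representation matrix is M = (I - gamma * P)^{-1},
# where M[s, s'] is the expected discounted number of future visits
# to s' starting from s. Value then factors as V = M @ R.
gamma = 0.9
P = np.array([[0.0, 1.0],   # toy 2-state chain: state 0 -> state 1
              [0.0, 1.0]])  # state 1 is absorbing
R = np.array([0.0, 1.0])    # reward of 1 in state 1

M = np.linalg.inv(np.eye(2) - gamma * P)
V = M @ R  # V[1] = 1 / (1 - gamma); V[0] = gamma * V[1]
```

The appeal of this factorization is that if the reward function R changes but the environment dynamics do not, M can be reused and the new values recovered with a single matrix product.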

## The Successor Representation and Hierarchical RL

Once we have learned a successor representation, it can be analyzed to discover things about the nature of the environment. Going back to the four-room environment example, we can perform dimensionality reduction on the learned representation of each state. This allows us to visualize the representations in a human-interpretable way, even though they are inherently high-dimensional. If we plot them on a 2D plane (see figure below), we can make a couple of observations. The first is that all of the states in each of the four rooms are represented as being very similar to the other states in that room. If we think about what the successor representation captures, this is to be expected, as each of the states in a given room is most likely to lead to another state in the same room under any given policy.
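This within-room similarity can be checked directly on a miniature version of the setup. The sketch below shrinks the idea down to two "rooms" of two states each, joined by a small crossing probability (all numbers are illustrative); the SR rows of states in the same room come out far more similar than those of states in different rooms:

```python
import numpy as np

# Hedged sketch of the four-room observation, shrunk to two "rooms" of
# two states each. States {0, 1} form room A, states {2, 3} form room B,
# with a small probability `leak` of crossing between rooms.
gamma = 0.95
leak = 0.05

P = np.array([
    [0.0, 1.0 - leak, leak, 0.0],
    [1.0 - leak, 0.0, 0.0, leak],
    [leak, 0.0, 0.0, 1.0 - leak],
    [0.0, leak, 1.0 - leak, 0.0],
])

# Successor representation under this random-walk policy.
M = np.linalg.inv(np.eye(4) - gamma * P)

# Compare SR rows: distance within a room vs. across rooms.
within = np.linalg.norm(M[0] - M[1])
across = np.linalg.norm(M[0] - M[2])
```

Because `within` is much smaller than `across`, any reasonable dimensionality reduction of the SR rows will place same-room states close together, which is exactly the clustering visible in the figure.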

## The Successor Representation and Deep Learning

So far, I have only described learning successor representations in the tabular domain. There is in fact plenty of work extending the successor framework to the world of deep learning and neural networks as well. The key insight is that instead of R(s) and M(s, s’) being a vector and a matrix, they can become the high-dimensional outputs of neural networks. Likewise, rather than being a simple one-hot index, the state can be represented as any vector of real numbers. Because of this property, successors in the deep learning framework are referred to as successor features, a phrase proposed by André Barreto and colleagues.
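The learning rule for successor features mirrors the tabular TD update. As a hedged sketch — using a simple table of feature vectors in place of a deep network, purely to show the update — the successor features ψ(s) are regressed toward the bootstrapped target φ(s) + γψ(s'). The features φ and the transition below are illustrative stand-ins for what a network would learn:

```python
import numpy as np

# Hedged sketch: psi(s) approximates E[sum_t gamma^t * phi(s_t)],
# learned with a TD-style update
#   psi(s) <- psi(s) + alpha * (phi(s) + gamma * psi(s') - psi(s)).
# psi is stored as a table here rather than a neural network, to keep
# the update itself visible; phi is a random illustrative feature map.
rng = np.random.default_rng(0)
n_states, d = 3, 4
phi = rng.normal(size=(n_states, d))   # state features phi(s)
psi = np.zeros((n_states, d))          # successor features psi(s)
alpha, gamma = 0.1, 0.9

def sf_update(s, s_next):
    # TD update toward the bootstrapped target phi(s) + gamma * psi(s')
    target = phi[s] + gamma * psi[s_next]
    psi[s] += alpha * (target - psi[s])

# One illustrative transition s=0 -> s'=1.
sf_update(0, 1)
```

In the deep version, the table lookups become network forward passes and the update becomes a gradient step on the same TD error, but the structure of the target is unchanged.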

## The Successor Representation in Psychology and Neuroscience

It isn’t just in the domain of machine learning that the successor representation has gained traction as a paradigm of choice; it has also caught the eye of a number of psychologists and neuroscientists.

## Epilogue

All of this research is still in the early phases, but it is encouraging to see that a useful and insightful computational principle from over twenty years ago is being so fruitfully applied in both deep learning and neuroscience. As you may have guessed by this point, it is also something that has been at the center of my own research this past year, and I hope to be able to share the fruits of some of that work in the not-too-distant future.


## Arthur Juliani

Research Scientist. Interested in Artificial Intelligence, Neuroscience, Philosophy, and Literature.