The present in terms of the future: Successor representations in reinforcement learning

Background — Value estimates

Almost all reinforcement learning algorithms are concerned, in one way or another, with the task of estimating value. At the simplest level this means learning how good it is to be in the current state — i.e. V(s), or how good it is to take any of the actions available in the current state — i.e. Q(s, a). It is with this latter piece of information that an agent can make informed decisions so as to optimize future reward. These two functions, V and Q, serve as the backbone of the majority of contemporary algorithms, ranging from DQN and SAC to PPO and many more.
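For reference, these two quantities have standard definitions as expected discounted sums of future reward under a policy π. The formulation below is the conventional one (with γ the discount factor and r_t the reward at step t), written out here for completeness rather than reproduced from the post itself:

```latex
% State-value and action-value functions under a policy \pi.
V^{\pi}(s)    = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s \right]
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0} = s,\ a_{0} = a \right]
```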

Basics of the Successor Representation

The value estimate for a given state corresponds to the temporally discounted return an agent expects to receive over time, starting from that state.
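The successor representation factors that estimate into two parts: a matrix M of expected discounted future state occupancies, and a reward function R. The equations below are the standard tabular formulation (following Dayan's original framing), using the same R(s) and M(s, s') referred to later in this post; they are a reference sketch rather than equations reproduced from the article:

```latex
% M(s, s') is the expected discounted number of future visits to s'
% when starting from s and following policy \pi.
M(s, s') = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{1}\!\left[ s_{t} = s' \right] \,\middle|\, s_{0} = s \right]

% The value of a state is then future occupancy weighted by reward.
V^{\pi}(s) = \sum_{s'} M(s, s')\, R(s')
```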

An Interactive Example

Simple four-room gridworld environment. Red corresponds to the agent position. Blue corresponds to the walls. Green corresponds to the goal position.
Successor representations of nine states from the four-room environment. Each state is represented in terms of expected future states.
Agent state occupancy at test time after being exposed to both reward locations. The agent learns a separate path to each of the two goals.
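The successor matrix behind these figures can be learned with a simple temporal-difference update as the agent wanders the gridworld. Below is a minimal tabular sketch of that idea; the environment interface (reset and step returning integer state indices) and the hyperparameter values are illustrative assumptions, not code from the interactive example above.

```python
import numpy as np

def learn_successor_matrix(env, n_states, episodes=500, alpha=0.1, gamma=0.95):
    """Tabular TD learning of the successor matrix M under the agent's
    behavior policy. M[s, s'] estimates the expected discounted number
    of future visits to s' when starting from s."""
    M = np.zeros((n_states, n_states))
    for _ in range(episodes):
        s = env.reset()                 # assumed: returns an integer state index
        done = False
        while not done:
            s_next, done = env.step()   # assumed: one step of the behavior policy
            onehot = np.zeros(n_states)
            onehot[s] = 1.0
            # TD target: count the current state now, then inherit the
            # discounted occupancies predicted from the next state.
            target = onehot + gamma * M[s_next]
            M[s] += alpha * (target - M[s])
            s = s_next
    return M
```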

The Successor Representation and Hierarchical RL

Once we have learned a successor representation, it can be analyzed to discover things about the structure of the environment. Going back to the four-room environment, once we have learned a successor representation we can perform dimensionality reduction on the learned representation of each state. This lets us visualize the representations in a human-interpretable way, even though they are inherently high-dimensional. If we plot them on a 2D plane (see the figure and the sketch below), we can make a couple of observations. The first is that all of the states in each of the four rooms are represented as being very similar to the other states in that room. If we think about what the successor representation captures, this is to be expected, as each of the states in a given room is most likely to lead to another state in the same room under any given policy. The second is that the bottleneck states connecting the rooms (marked in orange in the figure below) stand apart from the four room clusters. It is precisely these states that make natural sub-goals for a hierarchical agent, since passing through them is what moves the agent from one room to another.

All states’ successor representations from the four-room environment plotted on a 2D grid. Bottleneck states are marked in orange; all other states are marked in purple.
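As a rough illustration of the dimensionality-reduction step, the sketch below projects each state's successor representation (each row of the tabular matrix M) onto its first two principal components. The use of PCA here is an illustrative choice, not necessarily the method behind the figure above.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_successor_states(M: np.ndarray) -> np.ndarray:
    """Project each state's successor representation (a row of M)
    onto a 2D plane for plotting."""
    return PCA(n_components=2).fit_transform(M)   # shape: (n_states, 2)

# States in the same room should land close together in this plane, e.g.
# coords = project_successor_states(M); plt.scatter(coords[:, 0], coords[:, 1])
```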

The Successor Representation and Deep Learning

So far, I have only described learning successor representations in the tabular domain. Plenty of work has been done to extend the successor framework to the world of deep learning and neural networks as well. The key insight to make this happen is that instead of R(s) and M(s, s’) being a fixed vector and matrix, they can become the high-dimensional outputs of neural networks. Likewise, rather than being a simple one-hot index, the state can be represented as any vector of real numbers. Because of this property, successors in the deep learning setting are referred to as successor features, a phrase proposed by Andre Barreto and colleagues.

Example of a neural network architecture utilizing successor features. Specifically, this is the Universal Successor Feature Approximator agent.
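To give a flavor of how this looks in code, here is a minimal PyTorch-style sketch of a successor feature head. It is not the USFA architecture from the figure; the layer sizes, names, and the simple action-conditioned head are all assumptions made for illustration. A state is mapped to features φ, a second head predicts the successor features ψ for each action, and Q-values are recovered as a dot product with a learned reward-weight vector w.

```python
import torch
import torch.nn as nn

class SuccessorFeatureNet(nn.Module):
    """Minimal successor-feature sketch (illustrative, not the USFA model).

    phi(s):    state features, chosen so that reward is roughly phi(s) . w
    psi(s, a): successor features, the expected discounted sum of future phi
    Q(s, a) =  psi(s, a) . w
    """
    def __init__(self, obs_dim, n_actions, feat_dim=64):
        super().__init__()
        self.n_actions, self.feat_dim = n_actions, feat_dim
        self.phi = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))
        # One vector of successor features per action.
        self.psi = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim * n_actions))
        # Reward weights: reward is modeled as a linear function of phi.
        self.w = nn.Parameter(torch.zeros(feat_dim))

    def forward(self, obs):
        phi = self.phi(obs)                                           # (B, F)
        psi = self.psi(phi).view(-1, self.n_actions, self.feat_dim)  # (B, A, F)
        q = torch.einsum('baf,f->ba', psi, self.w)                   # (B, A)
        return phi, psi, q
```

In practice ψ would be trained with a TD loss toward φ(s') + γψ(s', a'), and w with a regression loss toward observed rewards; those training loops are omitted here.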

The Successor Representation in Psychology and Neuroscience

It isn’t just in the domain of machine learning that the successor representation has gained traction; it has also caught the eye of a number of psychologists and neuroscientists.

Firing activity of a place cell in a rodent as it navigates around a circular space. B and C show effects of moving an indicator on the wall of the space. The place cell firing is anchored by the position of the indicator. Reproduced from Muller & Kubie 1987.
Grid cell activity of rodent as it navigates environments of different shapes. Reproduced from Stachenfeld et al., 2017.
Eigendecomposition of successor representation of agent that navigated environments of different shapes. Reproduced from Stachenfeld et al., 2017.
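For readers who want to reproduce the flavor of that last analysis, the sketch below computes the leading eigenvectors of a tabular successor matrix M and reshapes them to the spatial layout of the environment for plotting. Symmetrizing M and the particular grid reshaping are assumptions made for illustration, not code from Stachenfeld et al.

```python
import numpy as np

def successor_eigenvectors(M, grid_shape, n_components=8):
    """Return the leading eigenvectors of the successor matrix, each
    reshaped to the environment's spatial layout for plotting."""
    M_sym = (M + M.T) / 2.0                    # symmetrize so eigenvectors are real
    eigvals, eigvecs = np.linalg.eigh(M_sym)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    top = eigvecs[:, order[:n_components]]     # (n_states, n_components)
    return [top[:, i].reshape(grid_shape) for i in range(n_components)]
```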

Epilogue

All of this research is still in its early phases, but it is encouraging to see that a useful and insightful computational principle from over twenty years ago is being so fruitfully applied in both deep learning and neuroscience. As you may have guessed by this point, it is also something that has been at the center of my own research this past year, and I hope to be able to share the fruits of some of that work in the not-too-distant future.
