The Beginner’s RL Playground: A Simple Interactive Website for Grokking Reinforcement Learning
Almost a decade ago, I spent a year writing a series of articles teaching the basics of Reinforcement Learning (RL). What was exciting to me then (and still is now) is that RL enables agents to learn from experience interacting with an environment, not unlike how humans and other animals learn. At the time, the field of RL was still relatively new, or at least very few people had any hands-on experience with it. However, technical advances such as Deep Q-Networks and AlphaGo brought a lot of attention to the field. With these newer methods, it seemed that RL could finally be applied at scale, allowing artificial agents to learn to play real games that even humans find challenging, such as Space Invaders or Go.
In the years since then, a tremendous amount of research has been dedicated to RL, with thousands of papers published in that timespan. RL currently serves as the basis for many of the techniques by which Large Language Models (LLMs) are fine-tuned to become better aligned and more intelligent assistants. Throughout this time, however, I have remained interested in how best to teach the basics of RL to people who may be completely new to the field. Back in 2016, I taught people using Python, Jupyter Notebooks, and TensorFlow. Although TensorFlow is now hopelessly outdated, Python remains a strong programming language for anyone looking to learn the fundamentals of any AI sub-field. That said, the complexity of setting up a Python environment can still make it hard for people with little or no programming background to get started.
For a long time, I have wanted to create an even easier way for people to see and understand the basic concepts behind RL first-hand. Toward that end, I spent the past week vibe-coding an interactive website with all the features I wish I had had access to when learning the basics years ago. I call this new tool “The Beginner’s RL Playground,” and it lets people run simple RL simulations directly in their browser using JavaScript.
Try the Live Demo Here!
The core is an interactive grid world representing a simple Markov Decision Process (MDP). You can configure the grid size (from 2x2 up to 10x10) and click to place elements like Gems (💎 rewards), Hazards (☠️ negative rewards), and Walls (🚧 impassable barriers). You can even change the agent’s (🤖) starting position.
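To make that concrete, here is a minimal sketch of how such a grid world might be represented in code. All of the type names, field names, and reward magnitudes below are my own illustration for explanation, not the playground’s actual source:

```typescript
// Illustrative types for a grid-world MDP. Names and reward values are
// my own invention; the playground's actual code may differ.
type CellType = "empty" | "gem" | "hazard" | "wall";

interface GridWorld {
  size: number;                             // grid is size x size (2 to 10)
  cells: CellType[][];                      // contents of each cell
  agentStart: { row: number; col: number }; // 🤖 starting position
}

// Reward received on entering a cell: gems positive, hazards negative.
function reward(world: GridWorld, row: number, col: number): number {
  switch (world.cells[row][col]) {
    case "gem":    return 1.0;  // placeholder magnitude
    case "hazard": return -1.0; // placeholder magnitude
    default:       return 0.0;
  }
}

// Walls and the grid boundary are impassable; a blocked move leaves
// the agent where it was.
function isBlocked(world: GridWorld, row: number, col: number): boolean {
  return row < 0 || col < 0 || row >= world.size || col >= world.size ||
         world.cells[row][col] === "wall";
}
```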
The playground includes implementations of six basic tabular RL algorithms (a sketch of the core Q-learning update follows the list):
- Q-Learning (Off-Policy TD Learning)
- SARSA (On-Policy TD Learning)
- Expected SARSA (On-Policy TD Learning)
- Monte Carlo Control (Learning from full episodes)
- Actor-Critic (Advantage Actor-Critic)
- Successor Representation (SR)
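To give a flavor of what these algorithms do under the hood, here is a minimal sketch of the tabular Q-learning update, the first entry on the list. Encoding states as integers is my own simplification, not the playground’s actual code:

```typescript
// Minimal tabular Q-learning (off-policy TD) sketch.
// States are assumed to be encoded as integers 0..numStates-1
// (my simplification for illustration).
const NUM_ACTIONS = 4; // up, down, left, right

// Build a Q-table with one row of action values per state, all zero.
function makeQTable(numStates: number): number[][] {
  return Array.from({ length: numStates }, () => new Array(NUM_ACTIONS).fill(0));
}

// One Q-learning step: Q(s,a) += α [ r + γ max_a' Q(s',a') − Q(s,a) ].
// SARSA would instead bootstrap from Q[sNext][aNext], the action
// actually taken, which is what makes it on-policy.
function qLearningUpdate(
  Q: number[][],
  s: number, a: number, r: number, sNext: number,
  alpha: number, gamma: number
): void {
  const tdTarget = r + gamma * Math.max(...Q[sNext]); // bootstrapped TD target
  Q[s][a] += alpha * (tdTarget - Q[s][a]);            // move Q(s,a) toward it
}
```

The difference between the first three entries on the list comes down to that TD target: Q-learning bootstraps from the best next action, SARSA from the action actually taken, and Expected SARSA from the expectation over the policy’s action probabilities.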
It also demonstrates different Exploration Strategies, like ε-Greedy and Softmax (Boltzmann), illustrating the crucial Exploration vs. Exploitation trade-off. Users can adjust key hyperparameters like the Learning Rate (α), Discount Factor (γ), and exploration parameters (ε or the Softmax temperature β) and see their effects in real time during the simulation.
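Here is roughly what those two selection rules look like in code. This is a sketch under the common convention that β acts as an inverse temperature (higher β means greedier behavior), which may differ from the playground’s exact parameterization:

```typescript
// ε-greedy: with probability ε pick a uniformly random action,
// otherwise pick the greedy (highest-value) one.
function epsilonGreedy(qValues: number[], epsilon: number): number {
  if (Math.random() < epsilon) {
    return Math.floor(Math.random() * qValues.length);
  }
  return qValues.indexOf(Math.max(...qValues));
}

// Softmax (Boltzmann): sample an action with probability proportional
// to exp(β · Q(s,a)). Subtracting the max keeps the exponentials stable.
function softmaxAction(qValues: number[], beta: number): number {
  const maxQ = Math.max(...qValues);
  const weights = qValues.map(q => Math.exp(beta * (q - maxQ)));
  const total = weights.reduce((sum, w) => sum + w, 0);
  let r = Math.random() * total;
  for (let i = 0; i < weights.length; i++) {
    r -= weights[i];
    if (r <= 0) return i;
  }
  return weights.length - 1; // guard against floating-point round-off
}
```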
Most importantly, I did my best to make the core concepts visually apparent. You can watch the agent learn in real time within the grid world you design. The tool visualizes the learned Value Function (V(s)) and Policy (π(a|s)) directly on the grid, shows the Action Values (Q(s,a)) for the agent’s current state, and even plots the agent’s Episodic Reward over time. You can move obstacles and rewards around in real time and see how the agent does (or doesn’t) adapt. I think this can help people build intuition about how these algorithms work, seeing concepts like Temporal Difference (TD) learning or components of the Bellman equation in action, and understand their limitations more easily than simply running a Python script and watching numbers scroll down a terminal. The bottom of the web page also contains a plain-English explanation of what each algorithm is doing, for those who want something a little more formal.
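For readers who want the formulas behind those visuals, here are the textbook TD(0) update and the Bellman expectation equation it approximates. This is standard notation, not anything lifted from the site’s code:

```latex
% TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s')
V(s) \leftarrow V(s) + \alpha \left[ r + \gamma V(s') - V(s) \right]

% Bellman expectation equation for the value of policy pi
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]
```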
I want this tool to be as widely accessible as possible, so alongside the hosted live demo, I am also providing the complete open-source code for you to run locally and modify as desired:
The Beginner’s RL Playground (GitHub Repository)
If you want to request a feature or you discover a bug, please feel free to open an issue on GitHub, and I’d be happy to take a look. As I said, this was vibe-coded in less than two weeks (thanks to Google Gemini 2.5 Pro and the Cursor IDE!), so any modifications or additions should likewise be relatively straightforward. For example, adding model-based or replay-based algorithms would be a natural next step. If anyone has an idea for how to make this an even more useful educational tool, I would love to discuss it with you. Let’s work together to make RL as accessible as possible.