That is a cool challenge to take on! One thing you want to be sure of in model-based approaches is that the model is actually learning the dynamics of the environment accurately. If the model doesn’t accurately predict future states or termination conditions, then planning with it won’t be successful. The model-based approach I used for CartPole was pretty tailored to that task, so you may need to tweak a number of things to ensure the model is learning well. Good luck!
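
One way to check this is to hold out some real transitions and compare the model's predictions against them. Below is a minimal sketch of that idea, not the CartPole code itself. The environment (`toy_step`), the linear least-squares model, and the names `collect`, `learned_model`, and `evaluate_model` are all hypothetical stand-ins chosen so the example runs with only NumPy. The termination rule is reused directly rather than learned, which is a simplification.

```python
import numpy as np

def toy_step(state, action):
    """Hypothetical stand-in dynamics (not CartPole): a small linear system."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([0.0, 0.1])
    next_state = A @ state + B * action
    done = bool(abs(next_state[0]) > 1.0)  # episode ends when position leaves [-1, 1]
    return next_state, done

def collect(n, rng):
    """Roll out a random policy and record (state, action) -> (next_state, done)."""
    X, Y, D = [], [], []
    state = rng.uniform(-0.5, 0.5, size=2)
    for _ in range(n):
        action = rng.choice([-1.0, 1.0])
        next_state, done = toy_step(state, action)
        X.append(np.append(state, action))
        Y.append(next_state)
        D.append(done)
        # Reset after termination, otherwise continue the episode.
        state = rng.uniform(-0.5, 0.5, size=2) if done else next_state
    return np.array(X), np.array(Y), np.array(D)

def evaluate_model(predict_fn, X, Y, D):
    """Return held-out next-state MSE and termination-prediction accuracy."""
    pred_states, pred_done = predict_fn(X)
    mse = float(np.mean((pred_states - Y) ** 2))
    done_acc = float(np.mean(pred_done == D))
    return mse, done_acc

rng = np.random.default_rng(0)
X_train, Y_train, _ = collect(2000, rng)
X_test, Y_test, D_test = collect(500, rng)

# Fit a linear dynamics model by least squares (assumed model class for this toy).
W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

def learned_model(X):
    states = X @ W
    done = np.abs(states[:, 0]) > 1.0  # simplification: known termination rule
    return states, done

mse, done_acc = evaluate_model(learned_model, X_test, Y_test, D_test)
print(f"held-out next-state MSE: {mse:.3e}, termination accuracy: {done_acc:.3f}")
```

If either number looks poor on held-out data, planning with the model is unlikely to work, and it is probably worth revisiting the model or the data collection before tuning the planner.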