Hi Sean,

Good question. I haven’t tried it myself, but I would assume that it takes exponentially longer the more arms available. If you are using rewards of 1.0 for a single arm, and 0.0 for all other arms, the reward becomes increasingly sparse, and less meaningful signal is being provided to the network per episode. It could be that having independent arms means a more rich signal, since more than one arm could provide a reward each episode. Thus allowing for faster learning in that experiment.

Just some thoughts.

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store