Good question. Each batch is drawn as a continuous stream of consecutive experiences rather than as a random sample. We therefore want the RNN to process it as a single example of length batch_size rather than as batch_size separate examples of length 1. That way the RNN unrolls over the whole trace, carries its hidden state from one step to the next, and can learn the temporal dependencies in the data.
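To make the difference concrete, here is a minimal sketch in plain NumPy. It is not the code from the original discussion: the toy tanh RNN and the names run_rnn, w_x and w_h are made up for illustration, and the sizes are arbitrary. It only shows how the two reshapes change what the recurrent state gets to see.

```python
import numpy as np

def run_rnn(inputs, w_x, w_h):
    """Toy tanh RNN (hypothetical helper, for illustration only).

    inputs has shape (num_sequences, seq_len, features). Every sequence
    starts from a zero hidden state; the final hidden state of each
    sequence is returned.
    """
    num_sequences, seq_len, _ = inputs.shape
    hidden = np.zeros((num_sequences, w_h.shape[0]))
    for t in range(seq_len):
        # The hidden state from step t-1 feeds into step t.
        hidden = np.tanh(inputs[:, t, :] @ w_x + hidden @ w_h)
    return hidden

rng = np.random.default_rng(0)
batch_size, features, hidden_size = 4, 3, 5  # arbitrary illustrative sizes

# One batch of consecutive experiences, one row per time step.
batch = rng.normal(size=(batch_size, features))
w_x = rng.normal(size=(features, hidden_size))
w_h = rng.normal(size=(hidden_size, hidden_size))

# A single example of length batch_size: state flows across all steps.
as_one_sequence = batch.reshape(1, batch_size, features)
# batch_size separate examples of length 1: every step starts from zero state.
as_separate = batch.reshape(batch_size, 1, features)

final_sequential = run_rnn(as_one_sequence, w_x, w_h)   # shape (1, hidden_size)
final_independent = run_rnn(as_separate, w_x, w_h)      # shape (batch_size, hidden_size)
```

In the second layout each row is processed with a fresh zero state, so the recurrent weights w_h never contribute and nothing about the ordering of the experiences can be learned. In the first layout the final state depends on every earlier step in the batch.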
Hope that clears things up.