You are right. This method requires that all episodes are of at least length 8. In my example, the episodes are always 50 steps long, so this won’t happen. Indeed, in most RL situations, episode length would be at least this long. If you are using a situation in which episodes are shorter, it may not make sense to use an RNN layer at all, since there likely aren’t the long term temporal dependencies that this layer is meant to capture in your environment to begin with, and doing something like stacking observations as input may be more appropriate.
Hope that clarifies things.