To answer question 1:
The two checks actually correspond to two different things. The first is a time-out that fires when the episode has gone on for too long, and the second, len(episode_buffer) == 30, fires when the rollout buffer for a given episode is full. They are triggered at different times, and we want to capture both.
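To make the difference concrete, here is a minimal, environment-free sketch of how the two checks can fire at different steps. It is not the code from the article: simulate_episode, natural_end, max_episode_length and episode_step_count are names I am assuming for illustration, and the "experience" stored in the buffer is just the step index.

```python
def simulate_episode(natural_end, max_episode_length=300, buffer_size=30):
    """Count mid-episode updates and report whether the episode timed out.

    All names and default values here are assumptions for illustration only.
    """
    episode_buffer = []
    updates = 0
    episode_step_count = 0
    timed_out = False
    done = False
    while not done:
        episode_step_count += 1
        # The (pretend) environment ends the episode on its own at natural_end.
        done = episode_step_count >= natural_end
        episode_buffer.append(episode_step_count)

        # Check 1: the rollout buffer is full, so train on it and start a new one.
        if len(episode_buffer) == buffer_size and not done:
            updates += 1
            episode_buffer = []

        # Check 2: the episode has run too long, so cut it off.
        if episode_step_count >= max_episode_length and not done:
            timed_out = True
            break
    # Whatever is left in the buffer would be used for the end-of-episode update.
    return updates, timed_out, len(episode_buffer)
```

For example, simulate_episode(1000, max_episode_length=100) performs three buffer-full updates (at steps 30, 60 and 90) before the time-out hits at step 100 with 10 steps still in the buffer, while simulate_episode(45) never times out and ends with one update and 15 leftover steps.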
and question 2:
The specific discounting used in the code comes from Generalized Advantage Estimation, and I took the implementation from OpenAI's universe-starter-agent. It is simple enough to replace it with the more basic advantage estimation that I describe in the article, if you want to try.
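As a rough sketch of what the swap looks like, here are both estimators side by side in plain NumPy. This is not the exact code from the article or from universe-starter-agent: the function names, the lam parameter and the default values are my assumptions, and bootstrap_value stands for the value estimate of the state that follows the rollout (0 if the episode ended).

```python
import numpy as np

def discount(x, gamma):
    """Discounted cumulative sum of x, accumulated from the last step backwards."""
    out = np.zeros(len(x), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(x))):
        running = x[t] + gamma * running
        out[t] = running
    return out

def gae_advantages(rewards, values, bootstrap_value, gamma=0.99, lam=1.0):
    """Generalized Advantage Estimation: discounted sum of one-step TD errors."""
    rewards = np.asarray(rewards, dtype=np.float64)
    values_plus = np.append(np.asarray(values, dtype=np.float64), bootstrap_value)
    # TD error at each step: r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values_plus[1:] - values_plus[:-1]
    return discount(deltas, gamma * lam)

def basic_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Basic estimate: bootstrapped discounted return minus the value estimate."""
    rewards_plus = np.append(np.asarray(rewards, dtype=np.float64), bootstrap_value)
    returns = discount(rewards_plus, gamma)[:-1]
    return returns - np.asarray(values, dtype=np.float64)
```

With lam=1.0 the two functions give the same numbers, because the TD errors telescope into the discounted return minus the value. Lowering lam towards 0 moves the GAE estimate towards the plain one-step TD error.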
Hope that helps.