Hi Ibrahim,

Good point! I think I tried it both ways, and splitting the layer activation seemed to work well enough for me, so I went with that. I would be interested to see whether there is a major performance difference between the two approaches. It may end up depending on the complexity of the problem.

If it turns out that not splitting the layer activation works better, then I would be happy to edit this article to reflect that.

PhD. Interests include Deep (Reinforcement) Learning, Computational Neuroscience, and Phenomenology.

