Hi Javier,

Thanks for taking the time to read the article! To answer your questions:

  1. Yep. m corresponds to each individual example, not each pixel.
  2. You're right that the first equation (the discriminator's) would be negated if we were minimizing it, but the second equation (the generator's) is left as-is when minimizing. In the code I use a modified version, -log(D(G(z))), which is equivalent but keeps the gradients for both networks pointing in the same direction.
  3. Because the generator update is a gradient descent rather than a gradient ascent, the losses come out negative, so the intuition you described is correct, just with the sign reversed.
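To make point 2 concrete, here's a minimal NumPy sketch (not the article's actual code) comparing the original minimax generator objective, log(1 - D(G(z))), with the modified non-saturating one, -log(D(G(z))). The helper names and the sample value of D(G(z)) are illustrative:

```python
import numpy as np

# d_out stands in for a hypothetical discriminator output D(G(z)) on a fake sample.

def saturating_loss(d_out):
    # Original minimax objective: the generator minimizes log(1 - D(G(z)))
    return np.log(1.0 - d_out)

def non_saturating_loss(d_out):
    # Modified objective: the generator minimizes -log(D(G(z)))
    return -np.log(d_out)

def saturating_grad(d_out):
    # d/dx log(1 - x) = -1 / (1 - x)
    return -1.0 / (1.0 - d_out)

def non_saturating_grad(d_out):
    # d/dx (-log(x)) = -1 / x
    return -1.0 / d_out

# Early in training, D easily rejects fakes, so D(G(z)) is near 0.
# Both gradients are negative (decreasing either loss pushes D(G(z)) up,
# i.e. the same direction), but the modified loss gives a much stronger signal:
d_out = 0.01
print(saturating_grad(d_out))      # ≈ -1.01 (weak gradient)
print(non_saturating_grad(d_out))  # -100.0  (strong gradient)
```

Both objectives move D(G(z)) the same way; the modified one just avoids the vanishing gradient when the discriminator is winning.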

Hope that clears things up!
