Thanks for taking the time to read the article! To answer your questions:
- m is the number of examples in the minibatch, so the sum runs over each individual example, not over each pixel.
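- You are right that the first equation (for the discriminator) has to be negated if we are minimizing it. The second equation (for the generator), however, is already written to be minimized, so it is left as-is. In the code I use the modified generator loss -log(D(G(z))) instead. It is not identical to log(1 - D(G(z))), but it has the same fixed point, gives stronger gradients early in training, and means both networks are trained by minimizing a loss with gradient descent.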
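- Because the generator's equation is minimized with gradient descent rather than maximized with gradient ascent, its loss values come out negative (log(1 - D(G(z))) is negative whenever D outputs a probability between 0 and 1). The intuition you described is therefore correct, just with the sign reversed.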
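To make the sign conventions above concrete, here is a minimal NumPy sketch. It is not the article's code: the function names are hypothetical, and it assumes D outputs one probability per example, so each loss is averaged over the m examples in the minibatch. It also compares the gradients of the two generator losses with respect to D(G(z)) when the discriminator confidently rejects the fakes.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Negated discriminator objective, averaged over the m examples.

    d_real: D(x) for each real example, shape (m,)
    d_fake: D(G(z)) for each generated example, shape (m,)
    """
    # The objective mean(log D(x) + log(1 - D(G(z)))) is maximized;
    # negating it gives a loss we can minimize with gradient descent.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss_saturating(d_fake):
    # Original generator equation, minimized as-is.
    # Every term is negative because 0 < D(G(z)) < 1.
    return np.mean(np.log(1.0 - d_fake))

def generator_loss_non_saturating(d_fake):
    # Modified generator loss -log D(G(z)): same fixed point, positive values.
    return np.mean(-np.log(d_fake))

def grad_saturating(d):
    # d/dD of log(1 - D): close to -1 when D(G(z)) is near 0 (weak signal).
    return -1.0 / (1.0 - d)

def grad_non_saturating(d):
    # d/dD of -log(D): large in magnitude when D(G(z)) is near 0 (strong signal).
    return -1.0 / d

# Hypothetical minibatch of m = 4 examples early in training:
# D scores real data high and generated data low.
d_real = np.array([0.90, 0.85, 0.95, 0.80])
d_fake = np.array([0.01, 0.05, 0.10, 0.20])

print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("saturating G loss:", generator_loss_saturating(d_fake))
print("non-saturating G loss:", generator_loss_non_saturating(d_fake))
print("gradients at D(G(z)) = 0.01:", grad_saturating(0.01), grad_non_saturating(0.01))
```

At D(G(z)) = 0.01 the original generator loss has a gradient of roughly -1, while the modified loss has a gradient of -100. This is why the modified version helps early in training, even though both losses share the same optimum.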
Hope that clears things up!