Generative Adversarial Networks Explained with a Classic Spongebob Squarepants Episode

Plus a TensorFlow tutorial for implementing your own GAN

Arthur Juliani
6 min read · Sep 23, 2016


Generative Adversarial Networks (GANs) are a class of neural networks that have gained popularity in the past couple of years, and for good reason. Put most simply, they allow a network to learn to generate data with the same internal structure as other data. If that description sounds a little general, that is because GANs are powerful and flexible tools. To make things more concrete, one of the more common applications of GANs is image generation. Say you have a bunch of images, such as pictures of cats. A GAN can learn to generate pictures of cats like the real ones used for training, without replicating any individual image. Given enough cat pictures, it learns about “cat-ness” from the samples and learns to generate images that meet this standard. Furthermore, it does so without the generator ever having direct access to the training images. GANs have been applied to more than just images, though. Recently they have been used to generate everything from short videos to robot behavior.

Left: real images. Right: images generated by GAN. Taken from OpenAI post on generative networks: https://openai.com/blog/generative-models/

In this tutorial, I am going to walk through the theory behind the GAN algorithm in a somewhat unconventional way: I am going to explain it using an episode of Spongebob Squarepants. Stick with me; the metaphor works much better than you would think! After that, I will show how to implement a GAN for image generation using TensorFlow and Python.

No Weenies Allowed

A Generative Adversarial Network works through the interplay between two semi-separate networks: a generator and a discriminator. The goal of the discriminator is to tell the difference between the data generated by the generator and the real-world data we are trying to model. We can think of the discriminator as being like the bouncer at a club. I don’t have just any bouncer in mind though. Discriminator networks are like the bouncer outside the Salty Spitoon, as seen in the Spongebob Squarepants episode: No Weenies Allowed.

In the world of Spongebob Squarepants, one must be tough in order to get into the Salty Spitoon. The job of the bouncer (discriminator) is to tell the difference between the real tough guys (the real data), and the weenies (the generated data).

Then we have Spongebob Squarepants (the generator). He certainly isn’t a real tough guy. He is an imitator, and needs to learn to look like a tough guy in order to get in to see his friend.

Through training, the discriminator network learns a function to tell the difference between the real and generated data. The first features the discriminator learns to look for may be relatively obvious aspects of the data which easily separate the real from the fake.

In Spongebob’s world, the bouncer outside the Salty Spitoon learns to look for a show of brute strength to tell the tough guys from the weenies.

Any real data should be able to pass this test easily.

Once the discriminator has learned something about what to use to tell the two apart, the generator can take advantage of what the discriminator has learned in order to learn for itself.

Spongebob can use the fact that the bouncer is checking for strength in order to appear more like an actual tough guy. At the beginning of training, however, he still may not appear convincing.

If the generator hasn’t learned a convincing enough way to generate data, the discriminator will still be able to tell the real from the fake.

While Spongebob is able to demonstrate some features of toughness, he is ultimately called out as a weenie, and sent to Weenie Hut Jr for further training.

This process doesn’t end after a single round, however. There can be thousands (or millions) of iterations before the generator produces data similar to the real thing. As the process continues, the discriminator and generator are trained in an alternating fashion. Over time, the discriminator will begin to learn more subtle ways of distinguishing the real data from the generated data. This is exactly what we want to happen, as it makes the generator better too: the generator is only as good at making up data as the discriminator is at telling the real thing apart from the imitation.

Checking back in on Spongebob, the bouncer has learned a new feature to use for distinguishing the tough from the weenies: fighting.

Our generator, Spongebob, uses this new information, and decides to pick a fight with someone in order to appear tough himself.

…and although things don’t go exactly as planned (see episode for more details), it works well enough to fool the bouncer into believing that he is a tough guy.

Putting aside the question of whether or not Spongebob is still a weenie at heart, he has learned to imitate the true tough guys well enough to be let into the Salty Spitoon. With enough training, we hope our generator can eventually do the same, and generate data samples that are indistinguishable from the real thing not only to the discriminator, but also to us humans, who are often much more discerning.

The Math & Code

Our Spongebob metaphor only goes so far in helping us build a GAN. To actually implement one, we need to get a little more formal. The generator (G) and discriminator (D) are both feedforward neural networks which play a minimax game against one another. The generator takes as input a vector of random numbers (z), and transforms it into the form of the data we are interested in imitating. The discriminator takes as input a set of data, either real (x) or generated (G(z)), and produces a probability of that data being real (P(x)).
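To make the two roles concrete, here is a minimal sketch of the interfaces in plain Python. It is only an illustration under simplifying assumptions: the "data" is a single number rather than an image, both networks are reduced to one linear layer, and every weight value below is made up.

```python
import math
import random

def sigmoid(a):
    """Squash a real number into (0, 1) so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-a))

def generator(z, weights):
    """G: map a noise vector z to a (here one-dimensional) data sample."""
    w, b = weights
    return sum(wi * zi for wi, zi in zip(w, z)) + b

def discriminator(x, weights):
    """D: map a data sample to the probability that it is real."""
    w, b = weights
    return sigmoid(w * x + b)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]   # vector of random numbers z
g_weights = ([0.5, -0.2, 0.1, 0.3], 1.0)       # hypothetical generator parameters
d_weights = (0.8, -0.5)                        # hypothetical discriminator parameters

fake = generator(z, g_weights)                 # G(z)
p_fake = discriminator(fake, d_weights)        # P(G(z)): rating of generated data
p_real = discriminator(2.0, d_weights)         # P(x) for a "real" sample x = 2.0
```

The only thing the generator ever sees is z; the real sample is passed to the discriminator alone.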

The discriminator is optimized in order to increase the likelihood of giving a high probability to the real data and a low probability to the generated data.

The gradient ascent expression for the discriminator. The first term corresponds to optimizing the probability that the real data (x) is rated highly. The second term corresponds to optimizing the probability that the generated data G(z) is rated poorly. Notice we apply the gradient to the discriminator, not the generator.
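For reference, the original GAN paper writes this update for a minibatch of m real samples x^(i) and m noise vectors z^(i). The symbols m and θ_d (the discriminator's parameters) do not appear in this post, and D(·) here plays the role of the probability the post calls P(·):

```latex
\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m}
  \left[ \log D\left(x^{(i)}\right)
       + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]
```

The discriminator's parameters are moved along this gradient (ascent), so the first term pushes D(x) toward 1 and the second pushes D(G(z)) toward 0.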

The generator is then optimized in order to increase the probability of the generated data being rated highly.

The gradient descent expression for the generator. The term corresponds to optimizing the probability that the generated data G(z) is rated highly. Notice we apply the gradient to the generator network, not the discriminator.
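For reference, the original GAN paper writes this update for a minibatch of m noise vectors z^(i). The symbols m and θ_g (the generator's parameters) do not appear in this post, and D(·) here plays the role of the probability the post calls P(·):

```latex
\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m}
  \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)
```

The generator's parameters are moved against this gradient (descent). Lowering log(1 − D(G(z))) is the same as raising D(G(z)), the discriminator's rating of the generated data. The same paper notes that, in practice, the generator can instead be trained to maximize log D(G(z)), which gives stronger gradients early in training.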

By alternating gradient optimization between the two networks using these expressions on new batches of real and generated data each time, the GAN will slowly converge to producing data that is as realistic as the network is capable of modeling. If you are interested, you can read the original paper introducing GANs (Goodfellow et al., 2014, “Generative Adversarial Networks”) for more information.
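The alternating scheme can be sketched end to end on a toy problem. The following is not the tutorial's TensorFlow implementation, just a plain-Python illustration under loud assumptions: the real data are numbers drawn from a Gaussian centered at 4, the generator is G(z) = a*z + b, the discriminator is sigmoid(w*x + c), and the gradients are worked out by hand. All names and hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(t):
    """Numerically safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_toy_gan(steps=500, batch=16, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # Fresh batches of real and generated data on every iteration.
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        fake = [a * rng.gauss(0.0, 1.0) + b for _ in range(batch)]

        # Discriminator step: gradient ASCENT on
        # mean[log D(x) + log(1 - D(G(z)))], applied to w and c only.
        grad_w = grad_c = 0.0
        for x, g in zip(real, fake):
            d_real, d_fake = sigmoid(w * x + c), sigmoid(w * g + c)
            grad_w += (1.0 - d_real) * x - d_fake * g
            grad_c += (1.0 - d_real) - d_fake
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: gradient DESCENT on mean[log(1 - D(G(z)))],
        # applied to a and b only (the discriminator is held fixed).
        grad_a = grad_b = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            d_fake = sigmoid(w * (a * z + b) + c)
            grad_a += -d_fake * w * z
            grad_b += -d_fake * w
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch
    return (a, b), (w, c)

(gen_a, gen_b), (disc_w, disc_c) = train_toy_gan()
```

In a toy setup like this the parameters can circle around the target rather than settle on it, so treat the final numbers as illustrative. The part that carries over to real GANs is the structure of the loop: new batches each time, one update for the discriminator, then one for the generator.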

As mentioned in the introduction, the most popular application of GANs right now is image generation using convolutional neural networks. Below is an example of an image-producing GAN. The generator takes a vector input (z) and produces a 64x64x3 color image. The discriminator then takes both real images (x) and generated images (G(z)), and produces a probability P(x) for them. Once the network is trained and we would like to generate new images from it, we simply call G(z) on a new batch of randomized z vectors.

An example architecture for generator and discriminator networks. Both utilize convolutional layers to process visual information. Modified from http://arxiv.org/abs/1511.06434

This kind of network is called a DCGAN (deep convolutional generative adversarial network). Below is a walkthrough implementation of the algorithm in TensorFlow. In the tutorial I train it using the MNIST dataset. Since we are only learning black and white digit images, our generator produces 32x32x1 images. With this you will have everything you need to train your own GANs, and the extension to color images requires only changing the output (G) and input (D) channels from 1 to 3.
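To see why only the channel count changes, it can help to trace the generator's output shapes. The sketch below assumes a DCGAN-style generator that starts from a 4x4 feature map and doubles the height and width at each layer (as stride-2 transposed convolutions with "same" padding do). The starting size and hidden channel widths are made-up values for illustration, not the ones used in the tutorial code.

```python
def generator_shapes(start_size, hidden_channels, out_channels):
    """Return (height, width, channels) after each stride-2 upsampling layer."""
    shapes = []
    size = start_size
    for channels in list(hidden_channels) + [out_channels]:
        size *= 2  # each assumed layer doubles the spatial resolution
        shapes.append((size, size, channels))
    return shapes

# Hypothetical layer widths; only the final channel count differs.
mnist_shapes = generator_shapes(4, [64, 32], out_channels=1)
color_shapes = generator_shapes(4, [64, 32], out_channels=3)

print(mnist_shapes[-1])  # (32, 32, 1)
print(color_shapes[-1])  # (32, 32, 3)
```

Every shape before the last layer is identical in the two cases, which is why the switch from grayscale to color touches only the generator's output channels and, symmetrically, the discriminator's input channels.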

I hope this tutorial was informative and entertaining for those new to GANs!

If this post has been valuable to you, please consider donating to help support future tutorials, articles, and implementations. Any contribution is greatly appreciated!

If you’d like to follow my work on Deep Learning, AI, and Cognitive Science, follow me on Medium @, or on twitter @awjliani.
