An introduction to generative adversarial networks with Tensorflow

In this tutorial you will learn about Generative Adversarial Networks or GANs and how they can be used to generate fake images that look like real ones. We will also give you a step-by-step guide for building a GAN and training it on celebrity faces using Tensorflow.

Generative Adversarial Networks (GANs) are network models invented by Ian Goodfellow during his PhD at the University of Montreal under the supervision of Yoshua Bengio and Aaron Courville. In a session held on Quora in July 2016, Yann LeCun, one of the fathers of Deep Learning, described GANs as the most important breakthrough in Deep Learning at that time.

The term 'generative' refers to the fact that these networks can learn to generate data samples that are similar to the real ones in the training dataset. In other words, GANs can learn to approximate the distribution of samples in the dataset, and use that distribution to generate new samples. When the data are images, this boils down to being able to create new images that the network has never seen in the dataset.

The term 'adversarial' refers to the type of optimization procedure used to train the network. Successful training of a GAN requires reaching an equilibrium between two opposing objectives, unlike CNNs or LSTMs, where the training objective is to minimize or maximize the value of a single cost function. As shown in the figure below, a GAN is typically composed of two sub-networks:

Diagram GAN

  • The Generator (G): The Generator is a neural network that takes a randomly generated vector as input and produces an output image. This image is called the 'fake' image and has the same dimensions as a 'real' image taken from the dataset.

  • The Discriminator (D): The Discriminator is also a neural network. Its input is either a 'real' image from the dataset or a 'fake' one generated by the Generator network. Its output is a value between 0 and 1 that is read as a decision on whether the input image is real or fake.

Since both D and G are neural networks, they need to be trained. However, they have different training objectives. Let's put it this way: D seeks the truth. If the input image is real, its training objective is to output the 'real' label; if the input image is a 'fake' one generated by G, its training label is 'fake'. G, on the other hand, seeks to cheat D by making it think that a fake image is a real one. The training objective for G is therefore to make D output the 'real' label each time a 'fake' image is presented.

This means that we have three types of training examples in a GAN:

  1. Type 1 is used to train D. It consists of (real image, real label) pairs.

  2. Type 2 is also used to train D. It consists of (fake image, fake label) pairs.

  3. Type 3 is used to train G while D's parameters are fixed. It consists of (fake image, real label) pairs.
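
The two opposing objectives described above are commonly summarized as a single minimax game. The notation below is introduced here for illustration only: x is a real image, z is a random input vector, p_data and p_z are their distributions, and D(.) is the probability that D assigns to the 'real' label.

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

Types 1 and 2 correspond to the two terms that D tries to maximize. Type 3 corresponds to the widely used variant in which G maximizes log D(G(z)) rather than minimizing log(1 - D(G(z))), which is what training G on (fake image, real label) pairs amounts to.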

In this tutorial we will implement a version of GAN called Deep Convolutional GAN (DCGAN) using Tensorflow. We will train our network on the CelebA dataset, which contains over 200,000 images of celebrity faces. Then we will use the Generator to generate new face images.

Preparing the Dataset

First, you need to download the CelebA dataset from here. The version we are going to use is the aligned and cropped one. This means that all images were cropped and the faces were centered so that they have roughly the same size and position in every image. This reduces the variability among images and makes training easier (see the figure below). The zipped folder is around 1.4 GB, and each image is 178x218 pixels with RGB channels. Go ahead and unzip the downloaded file and copy all images to a path of your choice.

Image examples

The Git repo I prepared lets you run the code I provide directly as a Jupyter notebook. You can clone it by running the following command:

 

$ git clone https://github.com/ala-aboudib/tutorial_dcgan.git

 

You can also find a Conda environment file containing all the necessary packages you need to run the code.

The Discriminator (D)

The discriminator is a Convolutional Neural Network (CNN) as shown in the below figure. It takes an input image and decides whether it is real (1) or fake (0). Input images to D come either from the data set of real images, or from the output of the generator G as fake images. Here are the main operations needed to build D:

Image Discriminator

  • Convolution: This is a linear operation that can be viewed as a pattern detector. Tensorflow provides the `tf.layers.conv2d()` function to easily implement this operation.
  • Batch normalization: In the original DCGAN paper, Batch Normalization (BN) was shown to give better results while making training easier. The easiest way to implement BN in Tensorflow is by using `tf.layers.batch_normalization()`.
  • LReLU: Leaky Rectified Linear Unit (LReLU) is a non-linear function that makes training easier by preventing vanishing gradients. An easy way to implement an LReLU is by using the `tf.maximum()` function.
  • Dense transformation: This is a linear operation accomplished by means of a fully-connected layer which connects all input units to all output units. This is necessary in order to transform the flattened output of the last convolutional layer into a single value representing the output logit of the network. One way to apply this is by using the `tf.layers.dense()` function.
  • Sigmoid: This non-linear function is applied to the output logit of the network to obtain a value between 0 and 1 representing the final decision of the network. In Tensorflow we can use the `tf.nn.sigmoid()` function.
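
To make the two non-linearities in this list concrete, here is a minimal sketch in plain Python, applied to single numbers rather than tensors. The function names `leaky_relu` and `sigmoid` are chosen for illustration and are not part of the tutorial's code; `alpha = 0.1` matches the leak parameter used in the networks below.

```python
import math

def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: keep positive values, scale negative values by alpha.

    For 0 < alpha < 1 this equals max(alpha * x, x), which is the same
    expression the tutorial builds with tf.maximum().
    """
    return max(alpha * x, x)

def sigmoid(x):
    """Squash a logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(leaky_relu(3.0))   # positive input passes through unchanged
print(leaky_relu(-3.0))  # negative input is scaled by alpha
print(sigmoid(0.0))      # a logit of 0 sits exactly between 'fake' and 'real'
```

Because a small slope is kept for negative inputs, the gradient never becomes exactly zero there, which is the property the LReLU bullet refers to.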

Before implementing the discriminator, let's import some packages that we will need later:

 

import tensorflow as tf
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt

Now here is a Tensorflow implementation of the discriminator network shown above:

 

def discriminator(images, reuse = False, training = True):
    """
    Creates the discriminator neural network (D)
    
    parameters
    -----------
    :images: tensor or tensor placeholder (batch_size, 32, 32, 3)
             a batch of input images.
               
    :reuse: boolean
            should be set as True when network parameters should be reused, False otherwise.
            
    :training: boolean
               should be set as True during the training phase, False otherwise.
               
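    returns
    -------
    :logits: tensor (batch_size, 1)
             a batch of output logits (before the sigmoid).

    :outputs: tensor (batch_size, 1)
              a batch of output probabilities (after the sigmoid).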
    """

    # This is the leak parameter for the LReLU
    alpha = 0.1

    with tf.variable_scope('D', reuse = reuse):
        
        ## Layer 1
        ## Input:  images, with size (batch_size, 32, 32, 3)
        ## Output: relu1, with size (batch_size, 32, 32, 32)
        
        x1 = tf.layers.conv2d(inputs = images, 
                              filters = 32, kernel_size = 3, strides = 1, use_bias = True, padding = 'same')
        relu1 = tf.maximum(alpha * x1, x1)
        
        ## Layer 2
        ## Input:  relu1, with size (batch_size, 32, 32, 32)
        ## Output: relu2, with size (batch_size, 16, 16, 64)
        
        x2 = tf.layers.conv2d(inputs = relu1, 
                              filters = 64, kernel_size = 5, strides = 2, use_bias = False, padding = 'same')
        bn2 = tf.layers.batch_normalization(inputs = x2, training = training)
        relu2 = tf.maximum(alpha * bn2, bn2)
        
        ## Layer 3
        ## Input:  relu2, with size (batch_size, 16, 16, 64)
        ## Output: relu3, with size (batch_size, 8, 8, 128)
        
        x3 = tf.layers.conv2d(inputs = relu2, 
                              filters = 128, kernel_size = 5, strides = 2, use_bias = False, padding = 'same')
        bn3 = tf.layers.batch_normalization(inputs = x3, training = training)
        relu3 = tf.maximum(alpha * bn3, bn3)
        
        ## Layer 4
        ## Input:  relu3, with size (batch_size, 8, 8, 128)
        ## Output: relu4_flat, with size (batch_size, 4 * 4 * 256)
        
        x4 = tf.layers.conv2d(inputs = relu3, 
                              filters = 256, kernel_size = 5, strides = 2, use_bias = False, padding = 'same')
        bn4 = tf.layers.batch_normalization(inputs = x4, training = training)
        relu4 = tf.maximum(alpha * bn4, bn4)
        relu4_flat = tf.reshape(tensor = relu4, shape = (-1,  np.prod(relu4.get_shape().as_list()[1:])))

                
        ## Layer 5
        ## Input:  relu4_flat, with size (batch_size, 4 * 4 * 256)
        ## Output: outputs, with size (batch_size, 1)
        
        logits = tf.layers.dense(inputs = relu4_flat, units = 1) 
        outputs = tf.nn.sigmoid(logits)

        return logits, outputs
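
The tensor sizes written in the comments of the discriminator can be checked with a little arithmetic. With padding = 'same', a convolution produces an output height/width of ceil(input / stride), whatever the kernel size. The sketch below applies this rule to the four convolutional layers; the helper name `conv_same_output_size` is an assumption made for this example.

```python
import math

def conv_same_output_size(input_size, stride):
    """Height/width produced by a convolution with padding='same'."""
    return math.ceil(input_size / stride)

# (filters, stride) of the four convolutional layers of the discriminator
layers = [(32, 1), (64, 2), (128, 2), (256, 2)]

size = 32  # input images are 32 x 32
shapes = []
for filters, stride in layers:
    size = conv_same_output_size(size, stride)
    shapes.append((size, size, filters))

# number of units fed to the final dense layer after flattening
flat_units = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

print(shapes)
print(flat_units)
```

The last shape is (4, 4, 256), so the dense layer receives 4 * 4 * 256 values per image, as stated in the Layer 5 comment.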

The Generator (G)

The Generator takes an input as a randomly generated low-dimensional vector, and produces an output image that has the same dimensions as real images in the dataset. This transformation is carried out throughout multiple layers. At each layer, one or more of the following operations are performed:

  • Pattern generation: The main operation used in a G to generate image patterns is deconvolution. While a convolution kernel is used to encode an input image patch into a single pixel of the output image, deconvolution can be used to do just the opposite; decode a single pixel from the input image into an output image patch or pattern. In Tensorflow, a similar operation can be applied using a transposed convolution by means of the `tf.layers.conv2d_transpose()` function.
  • Upsampling: In the same way that convolution can be used to downsample an image (or tensor) by using a stride higher than 1, deconvolution can be used to upsample an image by also using a stride higher than 1. Upsampling is needed since tensors have a lower height/width earlier in the generator's pipeline and a higher height/width in later layers.
  • Dense transformation: The linear transformation we need to apply in the generator is carried out by a dense (fully-connected) layer. This is necessary in order to transform the input random vector into a new vector with a size more adapted to the reshaping operation. One way to apply this is by using the `tf.layers.dense()` function.
  • LReLU: The applied non-linearity is a Leaky Rectified Linear Unit (LReLU), which makes training easier. An easy way to implement an LReLU is by using the `tf.maximum()` function.
  • Batch normalization: As in the discriminator, this can be applied with `tf.layers.batch_normalization()`.
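
The idea that a transposed convolution decodes each input value into an output patch is easier to see in one dimension. The following is a toy sketch in plain Python, not what Tensorflow runs internally: it uses no padding, whereas the generator below uses padding = 'same', and the function name `transposed_conv1d` is made up for this example.

```python
def transposed_conv1d(values, kernel, stride):
    """Each input value 'stamps' a scaled copy of the kernel onto the output.

    Without padding, the output length is
    (len(values) - 1) * stride + len(kernel).
    """
    out = [0.0] * ((len(values) - 1) * stride + len(kernel))
    for i, v in enumerate(values):
        for k, w in enumerate(kernel):
            # overlapping stamps are summed
            out[i * stride + k] += v * w
    return out

result = transposed_conv1d([1.0, 2.0], kernel=[1.0, 1.0, 1.0], stride=2)
print(result)
```

Two input values become five output values: each input is spread over a patch of three positions, and the stride of 2 controls how far apart consecutive patches are placed, which is where the upsampling comes from.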

So let's build a generator with the architecture as in the figure below:

Image Generator

 

def generator(z_, reuse = False, training = True):
    """
    Creates the generator neural network (G)
    
    parameters
    -----------
    :z_: tensor placeholder (batch_size, z_size)
        should be given a batch of randomly generated vectors.
               
    :reuse: boolean
            should be set as True when network parameters should be reused, False otherwise.
            
    :training: boolean
               should be set as True during the training phase, False otherwise.
               
    returns
    -------
    :fake_images: tensor (batch_size, 32, 32, 3)
                  a batch of the generated 'fake' images.
    """
    
    
    # This is the leak parameter for the LReLU
    alpha = 0.1
    
    # All generator parameters are defined inside the 'G' scope
    with tf.variable_scope('G', reuse = reuse):
        
        ## Layer 1
        ## Input:  z_, with size (batch_size, z_size)
        ## Output: relu1, with size (batch_size, 4, 4, 1024)
        x1 = tf.layers.dense(inputs = z_, 
                             units = 4 * 4 * 1024, use_bias = False) 
        x1 = tf.reshape(tensor = x1, shape = [-1, 4, 4, 1024]) # Reshape operation
        bn1 = tf.layers.batch_normalization(inputs = x1, training = training) # Batch normalization
        relu1 = tf.maximum(alpha * bn1, bn1) # Leaky ReLU
        
        ## Layer 2
        ## Input: relu1, with size (batch_size, 4, 4, 1024)
        ## Output: relu2, with size (batch_size, 8, 8, 512)
        
        # Transposed convolution, resulting size of x2 is (batch_size, 8, 8, 512)
        x2 = tf.layers.conv2d_transpose(inputs = relu1, 
                                        filters = 512, kernel_size = 3, strides = 2, padding = 'same', use_bias = False)
        
        bn2 = tf.layers.batch_normalization(inputs = x2, training = training)
        relu2 = tf.maximum(alpha * bn2, bn2)
        
        ## Layer 3
        ## Input: relu2, with size (batch_size, 8, 8, 512)
        ## Output: relu3, with size (batch_size, 16, 16, 256)
        
        x3 = tf.layers.conv2d_transpose(inputs = relu2, 
                                        filters = 256, kernel_size = 5, strides = 2, padding = 'same', use_bias = False)
        bn3 = tf.layers.batch_normalization(inputs = x3, training = training)
        relu3 = tf.maximum(alpha * bn3, bn3)
        
        ## Layer 4
        ## Input: relu3, with size (batch_size, 16, 16, 256)
        ## Output: relu4, with size (batch_size, 16, 16, 128)
        
        x4 = tf.layers.conv2d_transpose(inputs = relu3, 
                                        filters = 128, kernel_size = 5, strides = 1, padding = 'same', use_bias = False)
        bn4 = tf.layers.batch_normalization(inputs = x4, training = training)
        relu4 = tf.maximum(alpha * bn4, bn4)
        
        ## Layer 5
        ## Input: relu4, with size (batch_size, 16, 16, 128)
        ## Output: relu5, with size (batch_size, 32, 32, 64)
        
        x5 = tf.layers.conv2d_transpose(inputs = relu4, 
                                        filters = 64, kernel_size = 5, strides = 2, padding = 'same', use_bias = False)
        bn5 = tf.layers.batch_normalization(inputs = x5, training = training)
        relu5 = tf.maximum(alpha * bn5, bn5)
        
        ## Layer 6
        ## Input: relu5, with size (batch_size, 32, 32, 64)
        ## Output: fake_images, with size (batch_size, 32, 32, 3)
        
        x6 = tf.layers.conv2d_transpose(inputs = relu5, 
                                        filters = 3, kernel_size = 5, strides = 1, padding = 'same', use_bias = False)


        # Apply tanh activation function to get output values between -0.5 and +0.5
        fake_images = tf.nn.tanh(x6) * 0.5
        
        return fake_images

Notice that we apply batch normalization neither to the first layer of the discriminator nor to the last layer of the generator. This was suggested in the DCGAN paper for better training stability. Moreover, the non-linear activation function used at G's output is a tanh rather than an LReLU. Scaled by 0.5, it keeps pixel values in the generated image between -0.5 and +0.5, as in the real images.

You might have wondered about how the transposed convolution parameters such as the kernel size, strides and padding should be fixed in order to obtain the required output height and width. One way to do that is by choosing these parameters to be the same as those of a convolution operation that, if applied to a tensor with the same height/width as the transposed convolution's output, would give a tensor that has the same height/width as the transposed convolution's input. If you wish to learn more about transposed convolution, you can check out this page.
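
The rule described above can be checked numerically for the generator. With padding = 'same', a transposed convolution multiplies the height/width by the stride, and the matching convolution (same stride, same padding) maps that output size back to the input size. This is a small sketch in plain Python with helper names invented for this example.

```python
import math

def conv_same_output_size(input_size, stride):
    """Height/width after a convolution with padding='same'."""
    return math.ceil(input_size / stride)

def conv_transpose_same_output_size(input_size, stride):
    """Height/width after a transposed convolution with padding='same'."""
    return input_size * stride

strides = [2, 2, 1, 2, 1]  # strides of layers 2 to 6 of the generator
size = 4                   # height/width after the reshape in layer 1
sizes = [size]
for stride in strides:
    out = conv_transpose_same_output_size(size, stride)
    # the matching convolution maps the output size back to the input size
    assert conv_same_output_size(out, stride) == size
    size = out
    sizes.append(size)

print(sizes)
```

The sequence ends at 32, which is the height/width of the generated images and of the real images fed to the discriminator.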

Computing loss

In order to compute the discriminator's loss, we need to compare its output value to its desired output value: for real images, D's desired output is 1, and for fake ones it is 0. To be more precise, instead of using a value of 1 to represent 'real', we smooth this value by multiplying it by a number slightly lower than 1. This trick, known as label smoothing, has been found to help GAN training.

To compute the Generator's loss, we should compare D's output to G's desired output which is a smoothed 1. Remember that the goal of the Generator is to trick the Discriminator into believing that fake images are real.
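
As a rough illustration of what these comparisons compute, here is the sigmoid cross-entropy for a single logit in plain Python. It uses the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)) given in the documentation of `tf.nn.sigmoid_cross_entropy_with_logits`, where x is the logit and z the label. The logit values 2.0 and -2.0 are made up for the example.

```python
import math

def sigmoid_cross_entropy_with_logits(logit, label):
    """Stable form of -z * log(sigmoid(x)) - (1 - z) * log(1 - sigmoid(x))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def naive_cross_entropy(logit, label):
    """Direct textbook form, only used here as a cross-check."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -label * math.log(p) - (1.0 - label) * math.log(1.0 - p)

smoothing = 0.9

# D sees a real image and outputs a positive logit: compared with a smoothed 1
d_loss_real = sigmoid_cross_entropy_with_logits(2.0, 1.0 * smoothing)

# D sees a fake image and outputs a negative logit: compared with 0
d_loss_fake = sigmoid_cross_entropy_with_logits(-2.0, 0.0)

# G wants that same fake logit to look 'real': compared with a smoothed 1
g_loss = sigmoid_cross_entropy_with_logits(-2.0, 1.0 * smoothing)

print(d_loss_real, d_loss_fake, g_loss)
```

The same fake logit gives a small loss for D and a large loss for G, which is the adversarial relationship in numbers: what is good for one network is bad for the other.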

Computing these losses is performed according to a few steps shown in comments in the code below:

def getLoss(z_, real_images_):
    """
    Creates the network graph and returns the discriminator's and the generator's losses.
    
    parameters
    -----------
    :z_: tensor placeholder (batch_size, z_size)
         should be given a batch of randomly generated vectors.
         
    :real_images_: tensor placeholder (batch_size, image_height, image_width, 3)
                   a batch of real images taken from the dataset.
                   
    
    returns
    -------
    
    :d_loss: tensor (scalar)
             the discriminator's loss, averaged over the batch.
             
    :g_loss: tensor (scalar)
             the generator's loss, averaged over the batch.
    """

    # This is the smoothing parameter
    smoothing = 0.9
    
    ## train_step 1: create and run the generator
    
    # get fake images from the generator
    fake_images = generator(z_ = z_, reuse = False, training = True)
    
    
    ## train_step 2: create and run the discriminator
    
    # get the discriminator's logits and its sigmoid outputs ('real' or 'fake' decisions).
    # discriminator() returns (logits, outputs), in that order
    d_logits_real, d_output_real = discriminator(images = real_images_, reuse = False, training = True)
    d_logits_fake, d_output_fake = discriminator(images = fake_images,  reuse = True, training = True)
    
    
    ## train_step 3: compute training losses for both D and G
    
    # create desired labels for real and fake images
    labels_real = tf.ones_like(d_logits_fake) * smoothing
    labels_fake = tf.zeros_like(d_logits_fake)
    
    # compute the generator's loss by comparing the discriminator's outputs to the
    # generator's desired outputs
    g_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits = d_logits_fake, labels = labels_real))
    
    # Compute the discriminator's loss by comparing the discriminator's outputs to the
    # discriminators's desired outputs
    d_loss_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits = d_logits_real, labels = labels_real))
    
    d_loss_fake = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits = d_logits_fake, labels = labels_fake))
    
    # The total loss for the discriminator is the sum of losses on fake and real images
    d_loss = d_loss_real + d_loss_fake
    
    return d_loss, g_loss

Notice that the second time we call the discriminator() function we set the 'reuse' argument to True. This is important since we do not wish to create a new discriminator with new parameters. We need to reuse the already created discriminator in the first call.

Setting up the optimizers

Tensorflow provides a bunch of optimizers that can be used out-of-the-box. These optimizers automatically minimize the loss by computing gradients and updating the network's trainable parameters. For our model we are going to use the Adam optimizer (tf.train.AdamOptimizer). For a comparison among different optimizers, you can go ahead and check out this great post.

Notice that the parameters of D and G are not updated simultaneously. Each network has its own trainable parameters, which are updated while the other network's parameters stay fixed. Thus, we need two optimizers: one for D and another for G.

Since we are using batch normalization in both D and G, we need to add the line

with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):

before implementing the optimizers. This tells Tensorflow that updating the moving averages for means and variances necessary for batch normalization should take place before executing the training step. Here is a nice article that explains the idea behind control dependencies.

def getOptimizers(d_loss, g_loss, learning_rate, beta1):
    """
    Creates optimizer objects for both of the discriminator and the generator.
    
    parameters
    -----------
    :d_loss: tensor (scalar)
             the discriminator's loss.
             
    :g_loss: tensor (scalar)
             the generator's loss.
    
    :learning_rate: float
                    the learning rate of the optimizers.
                    
    :beta1: float
            the value of the beta1 parameter of the Adam optimizers.
            
    returns
    -------
    :d_opt: the training operation that applies one Adam update to the discriminator.
             
    :g_opt: the training operation that applies one Adam update to the generator.
    """

    # get the list of trainable parameters of the network.
    # trainable names defined inside the 'G' variable scope start with the string 'G'
    # trainable names defined inside the 'D' variable scope start with the string 'D'
    trainables = tf.trainable_variables()
 
    # separate trainable parameters of the discriminator from those of the generator
    d_vars = [var for var in trainables if var.name.startswith('D')]
    g_vars = [var for var in trainables if var.name.startswith('G')]

    # create two optimizers. For each one, inject the corresponding loss
    # to be minimized and indicate which network parameters to tune
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        
        d_opt = tf.train.AdamOptimizer(learning_rate = learning_rate, 
                                       beta1 = beta1).minimize(d_loss, var_list = d_vars)
        
        g_opt = tf.train.AdamOptimizer(learning_rate = learning_rate, 
                                       beta1 = beta1).minimize(g_loss, var_list = g_vars)
    
    return d_opt, g_opt
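
The separation of parameters relies only on variable names. The sketch below shows the same filtering on a list of strings. The names are hypothetical, written to resemble what variables created inside the 'D' and 'G' scopes look like; the actual names in your graph may differ.

```python
# Hypothetical variable names, shaped like those found under the 'D' and 'G' scopes
variable_names = [
    'D/conv2d/kernel:0',
    'D/dense/bias:0',
    'G/dense/kernel:0',
    'G/conv2d_transpose/kernel:0',
]

# same test as in getOptimizers(), applied to plain strings
d_names = [name for name in variable_names if name.startswith('D')]
g_names = [name for name in variable_names if name.startswith('G')]

print(d_names)
print(g_names)
```

If the graph ever contained other scopes whose names begin with the same letter, testing for the prefixes 'D/' and 'G/' instead would be a stricter choice.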

Some helper functions

Let's now create a class called Helper that contains some utility methods. The main methods we are going to use are getBatch(), a Python generator function that returns one batch of dataset images at a time, and showGeneratorOutput(), which uses the GAN's G network to generate a set of random images and plot them.

Before calling getBatch(), the configDataset() method should be called to set the path to the dataset, the batch size, and the height and width to which images should be resized.

The getBatch() method uses the getImage() method, which applies some processing to each image as follows:

  1. Crops a tight zone of 120x120 pixels around each face. This removes most of the background and makes it easier to train the network since there will be no need to generate background pixels.

  2. Resizes each image to (image_height x image_width) pixels.

  3. Normalizes pixel values in each of the R, G and B channels to the continuous interval [-0.5, 0.5].
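
The arithmetic behind steps 1 and 3 can be sketched in plain Python without loading any image. The function names are made up for this example; the image size of 218 rows by 178 columns is the CelebA size mentioned earlier.

```python
def center_crop_box(height, width, area_size=120):
    """Return (top, bottom, left, right) of a centered square crop."""
    top = (height - area_size) // 2
    left = (width - area_size) // 2
    return top, top + area_size, left, left + area_size

def normalize_pixel(value):
    """Map an 8-bit pixel value in [0, 255] to the interval [-0.5, 0.5]."""
    return value / 255.0 - 0.5

box = center_crop_box(height=218, width=178)
print(box)
print(normalize_pixel(0), normalize_pixel(255))
```

The normalized range matches the output range of the generator, whose tanh output is scaled by 0.5, so real and fake images reach the discriminator on the same scale.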

Here's what the cropped and resized images look like:

Image: photos

and here is the code for the helper class:

class Helper:
    
    def configDataset(self, data_path, batch_size, image_height, image_width):
        """ 
        sets some instance variables to handle the dataset and batches.
        
        parameters
        ----------
        :data_path: string
                    the file path to the dataset.
                    
        :batch_size: integer
                     the size of the image batch to return from the dataset.
                     
        
        :image_height: integer
        :image_width: integer
                      the image height/width to which each image should be resized.
                      
        returns
        -------
        None
        """
        self.data_path = data_path
        self.batch_size = batch_size
        self.image_height = image_height
        self.image_width = image_width    

    def getImage(self, image_path):
        """
        opens and preprocesses an image before adding it to the batch
        
        parameters
        ----------
        :image_path: string
                     the file path to the image.
                     
        returns
        -------
        :image: image (3D numpy array)
                the preprocessed image.
        """
        # load the image
        image = cv2.imread(image_path)
            
        # convert the image to RGB. This is because OpenCV loads images in BGR by default
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            


        # crop the image to a little square area just around the face. 
        # This is to make training faster and easier
        area_size = 120
        x = (image.shape[0] - area_size ) // 2
        y = (image.shape[1] - area_size ) // 2
        image = image [x : x + area_size, y : y + area_size]


        # resize the image
        image = cv2.resize(image, (self.image_width, self.image_height))
        #plt.imshow(image)
        #plt.show()

        # normalize pixel values to the range [-0.5, 0.5]
        image = (image / 255) - 0.5

        return image

    def getBatch(self):
        """
        a Python generator function that yields one batch of images each time it is iterated.
        """

        # get a list of all image names in the dataset
        all_names = os.listdir(self.data_path)

        # index of the first image of the current batch in all_names
        index = 0

        while index < len(all_names):

            # get a list of image name to put in the batch
            batch_names = all_names[index : index + self.batch_size]


            # create the batch: a numpy array containing images
            image_batch = np.array(
                [self.getImage(os.path.join(self.data_path, name)) for name in batch_names ]).astype(np.float32)


            # update the image index
            index += self.batch_size


            yield image_batch
            
    def createImageGrid(self, image_batch, image_per_side):
        """
        creates a single image containing all batch images organized as a square grid.
        
        parameters
        ----------
        :image_batch: numpy array (number of images, image height, image width, 3)
                      a batch of images.
            
        :image_per_side: integer
                         the number of images per each side of the square grid.
                         
        returns
        -------
        
        :canvas: RGB image (3D numpy array)
                 the square grid including all batch images.
        """
        # get height and width of individual images
        image_height = image_batch.shape[1]
        image_width = image_batch.shape[2]
        
        # create the canvas that will include the image grid
        canvas = np.zeros(( image_per_side * image_height, image_per_side * image_width, 3), dtype = np.float32)

        # these are the row and column indexes for the canvas array
        x = 0
        y = 0
        
        # populate the canvas with images from the batch
        for i in range(image_per_side ** 2):

            canvas[x : x + image_height, y : y + image_width] = image_batch[i]
            
            y += image_width
          
            if y >= image_per_side * image_width:

                y = 0
                x += image_height
   
        # individual images have values in [-0.5, 0.5], so shift them to [0, 1] for plotting
        canvas = (canvas + 0.5)

        return canvas
        
    def showGeneratorOutput(self, sess, n_images, z_):
        """
        uses the GAN's generator to randomly generate some sample images and plot them 
        as a square grid of images.
        
        parameters
        ----------
        :sess: a Tensorflow session object.
        
        :n_images: integer
                   number of images to generate.
        
        :z_: tensor placeholder (batch_size, z_size)
             should be given a batch of randomly generated vectors.
        
        """
        # get the size of each random vector.
        z_size = z_.get_shape().as_list()[-1]
        
        # generate a batch of random vectors as inputs to the generator
        z = np.random.uniform(-1, 1, size=[n_images, z_size])

        # get a batch of fake images from the generator
        image_batch = sess.run(generator(z_, reuse = True, training = False),
                               feed_dict={z_: z})

        # organize the generated images in a square grid
        images_grid = self.createImageGrid(image_batch, int(np.sqrt(n_images)))
        
        # plot the image grid
        plt.imshow(images_grid)
        plt.show()
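
The slicing logic of getBatch() can be tried on its own, without any image files. The sketch below uses a list of made-up file names and a stand-alone function named `iterate_batches`, both introduced only for this illustration.

```python
def iterate_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    index = 0
    while index < len(items):
        yield items[index : index + batch_size]
        index += batch_size

# hypothetical file names standing in for the dataset folder listing
names = ['img_{:02d}.jpg'.format(i) for i in range(10)]

batch_sizes = [len(batch) for batch in iterate_batches(names, batch_size=4)]
print(batch_sizes)
```

When the number of images is not a multiple of the batch size, the last batch is smaller than the others. This is why the training code sizes its batch of random vectors from real_images.shape[0] rather than from a fixed batch size.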

Setting up the training procedure

Setting up the training procedure is simple. We first need to create the network and get the losses using our getLoss() function. Then we use those losses to create the optimizers using the getOptimizers() function we defined earlier. Then we run these optimizers in each training step. Here is how to do that in more detail:

 

def train(n_epochs, z_size, learning_rate, beta1, helper):
    """
    Code for training the network.
    
    parameters
    -----------
    :n_epochs: integer
               the number of training epochs
         
    :z_size: integer
             the size of a random vector (G's input)
             
    :learning_rate: float
                    the learning rate of the optimizer
                    
    :beta1: float
            the value of the beta1 parameter of the Adam optimizer
        
    :helper: an object of the Helper class
                   
    
    returns
    -------
    None
    
    """
    

    # Create a placeholder for real image batches
    real_images_ = tf.placeholder(dtype = tf.float32, 
                                  shape = (None, image_height, image_width, 3),
                                  name = 'real_images')
    
    # Create a placeholder for random vector batches (G's input)
    z_ = tf.placeholder(dtype = tf.float32, 
                        shape = (None, z_size),
                        name = 'z')

    
    # create the GAN network and get the losses
    d_loss, g_loss = getLoss(z_, real_images_)
    
    # create the optimizers by injecting the losses and other hyperparameters
    d_opt, g_opt = getOptimizers(d_loss, g_loss, learning_rate, beta1)
    
    
    with tf.Session() as sess:
        
        # initialize all 'tf.Variable' objects
        sess.run(tf.global_variables_initializer())
        
        # start going through epochs
        for ep in range(n_epochs):
            
            # a counter for training steps
            train_step = 0
            
            # get a batch of real images from the dataset
            for real_images in helper.getBatch():

                train_step += 1
                
                # create a batch of random vectors 'z' as inputs to G
                z = np.random.uniform(low = -1, high = 1, size = [real_images.shape[0], z_size])
               
                # run an optimization step (training step) for each of G and D
                sess.run(d_opt, feed_dict = {z_ : z, real_images_ : real_images})
                sess.run(g_opt, feed_dict = {z_ : z, real_images_ : real_images})
                
                # this is just for visualization of training loss at the current training step
                if train_step % 25 == 0:
                    d_loss_train = d_loss.eval({z_: z, real_images_: real_images})
                    g_loss_train = g_loss.eval({z_: z, real_images_: real_images})
                    
                    print("Epoch {}/{}".format(ep + 1, n_epochs),
                          "D loss {:.8f} ...".format(d_loss_train),
                          "G loss {:.8f} ...".format(g_loss_train))
                    
                # call a helper method that uses the generator to randomly generate some sample images
                # this is useful to know whether the network has arrived at a satisfactory result
                if train_step % 100 == 0:
                    helper.showGeneratorOutput(sess, 100, z_)

Training the GAN

First we need to choose values for some hyperparameters. Here are some values that worked for me:

 

# the path to the folder containing face images
# change this path to point to the one you chose on your machine
data_path = 'data/celeba/'

# number of epochs
n_epochs = 6

# the batch size used for training
batch_size = 64

# height and width to which images should be resized
image_height = 32
image_width = 32

# parameters for the Adam optimizer
learning_rate = 0.00002
beta1 = 0.5

# size of random vectors used as input to the generator
z_size = 100
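
To get a feeling for what these values mean in terms of training steps, here is a rough sketch of the arithmetic. It assumes that the full CelebA dataset of 202,599 images is used and that the last, smaller batch of each epoch is kept. Both are assumptions, since the helper may work on a subset or drop the remainder. The function name steps_per_epoch is hypothetical.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)

n_samples = 202599   # assumption: the full CelebA dataset
batch_size = 64
n_epochs = 6

steps = steps_per_epoch(n_samples, batch_size)
print("training steps per epoch:", steps)              # 3166
print("training steps in total:", steps * n_epochs)    # 18996

# the training loop resets its step counter at every epoch, so per epoch we get:
print("loss printouts per epoch:", steps // 25)        # 126
print("sample grids per epoch:", steps // 100)         # 31
```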

And finally we call the train() function we wrote earlier to launch the training procedure:


# create a helper object that includes utility methods to get batches and 
# plot image grids
helper = Helper()
helper.configDataset(data_path, batch_size, image_height, image_width)


with tf.Graph().as_default():

    # start training ...
    train(n_epochs, z_size, learning_rate, beta1, helper)

Running the above code should print out the losses of both the discriminator and the generator every 25 training steps. Since we have two optimizers, it would be difficult to tell when to stop training just by looking at the losses. This is why we generate a sample of fake images every 100 training steps. It serves as a visual indicator of whether training has reached a satisfactory result.
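
Because the printed losses jump around from one step to the next, it can help to look at a running average instead of the raw values. The following is a minimal sketch of that idea, not part of the training code above. The class name RunningMean and the window size are my own choices.

```python
from collections import deque

class RunningMean:
    """Keeps the mean of the last `window` values (hypothetical helper)."""

    def __init__(self, window):
        # a deque with maxlen drops the oldest value automatically
        self.values = deque(maxlen=window)

    def update(self, value):
        """Adds a new value and returns the current mean."""
        self.values.append(float(value))
        return sum(self.values) / len(self.values)


# usage with made-up loss values, only to illustrate the smoothing
d_smooth = RunningMean(window=3)
for loss in [1.0, 2.0, 3.0, 4.0]:
    print("raw {:.2f}  smoothed {:.2f}".format(loss, d_smooth.update(loss)))
```

In the training loop, one such object per network could be updated with d_loss_train and g_loss_train right before printing. Keep in mind that a smoothed loss is still not a reliable measure of image quality.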

Notice that training a GAN is harder than training a single CNN: the losses usually fluctuate a lot, and the visual quality of the generated images might deteriorate after reaching the desired quality. Here is an example of how the generated images evolve throughout different epochs. Notice that after obtaining a good outcome in epoch 3, the quality starts to get worse.
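
Since the best images may show up in the middle of training, one option is to write each sample grid to disk in addition to plotting it, so that the grids can be compared afterwards. This is only a sketch under my own assumptions: the function names, the file naming scheme and the output folder are hypothetical, and it expects a grid array like the one returned by createImageGrid().

```python
import os
import numpy as np

def save_snapshot(images_grid, out_dir, epoch, step):
    """Stores one image grid as a .npy file tagged with epoch and step."""
    os.makedirs(out_dir, exist_ok=True)
    file_name = "grid_ep{:02d}_step{:05d}.npy".format(epoch, step)
    path = os.path.join(out_dir, file_name)
    np.save(path, images_grid)
    return path

def list_snapshots(out_dir):
    """Returns the stored snapshot file names, sorted by epoch and then by step."""
    return sorted(f for f in os.listdir(out_dir) if f.endswith(".npy"))
```

A stored grid can later be loaded with np.load() and displayed with plt.imshow(). To actually return to the network state that produced a good grid, the model weights would have to be saved at the same steps as well.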

Image result

Conclusion

This was a basic way in which GANs can be implemented to generate images. Although the generated images clearly resemble faces, as shown in the figure below, this is not a completely satisfactory result. To generate higher quality images, one might be tempted to simply increase the size of the training images. In practice this does not work well: it makes training harder and less stable, and artifacts in the resulting images become easier to notice at the larger size.

Image comparison of real and fake images

Multiple methods have been proposed to generate higher quality images at higher resolutions. One example is StackGAN, which uses a chain of GANs, each refining the output of the previous one. Another interesting approach is the Progressive Growing GAN, where layers are progressively added to both the generator and the discriminator as the resolution of the training images increases.

If you wish to learn more about GANs and their different 'flavors', I recommend you check out this NIPS 2016 tutorial by Ian Goodfellow, which reviews different variants of GANs and compares them to other generative models, along with some training tips and tricks.

 
