The GANs coded to date generate sets of random images. We have no control over what those images will look like, for example the hair or eye colour in the anime images. It would be nice if we could control what images get generated.

Consider a GAN that generates images of men with or without facial hair, be that just a moustache or a beard and moustache. Perhaps we only want images with a moustache. As things stand with the models coded so far, we would need to generate a bunch of random images and save only the ones with a moustache. Bit of a pain, m’thinks.

But, as I am sure you have guessed, we can build models that allow us to specify conditions to be used in generating the output images. So, let’s have a look at conditional GANs (cGANs).

The cGAN was first described by Mehdi Mirza and Simon Osindero in their 2014 paper titled “Conditional Generative Adversarial Nets.”

… by conditioning the model on additional information it is possible to direct the data generation process. Such conditioning could be based on class labels

Conditional Generative Adversarial Nets, 2014.

As I finish typing the above quote, I really do not know what I will be doing for this post. I have only looked at two tutorials on the web, and as of yet I don’t fully understand the changes that need to be made to any of our existing GANs (DC or otherwise).

But the first approach will likely involve using the image labels in both the discriminator and the generator, as mentioned in the above quote.

Initial Code for Module

As my starter code, I copied the code from the cnn_clothing.py module/project I wrote for the DCGAN to generate the 10 types of clothing without altering the input image size.

I added some new variables, in keeping with the DCGAN code for the anime images, refactored things to better control the code flow, and modified some of the model parameters (e.g. the optimizer betas). Then I tested that the module still trained a GAN able to generate new/fake clothing images. And, it did. Not only that, it produced better images with fewer training epochs.

As an example, here’s the code related to the generator model.

  # define Generator model class, it should "mirror" the discriminator
  class Generator(nn.Module):
    def __init__(self, noise_size, ngf=8):
      super().__init__()

      # noise_size => 784
      self.fc = nn.Sequential(
        nn.Linear(noise_size, i_sz),
        nn.BatchNorm1d(i_sz),
        nn.LeakyReLU())
      # 784 => 16 x 7 x 7
      self.reshape = Reshape(ngf * 2, 7, 7)
      # 16 x 7 x 7 => 32 x 14 x 14
      self.conv1 = nn.Sequential(
        nn.ConvTranspose2d(ngf * 2, ngf * 4,
                           kernel_size=5, stride=2, padding=2,
                           output_padding=1, bias=False),
        nn.BatchNorm2d(ngf * 4),
        nn.LeakyReLU())
      # 32 x 14 x 14 => 1 x 28 x 28
      self.conv2 = nn.Sequential(
        nn.ConvTranspose2d(ngf * 4, 1,
                           kernel_size=5, stride=2, padding=2,
                           output_padding=1, bias=False),
        nn.Tanh())
      # Random value sample size
      self.noise_size = noise_size

    def forward(self, z):
      # Input is just the random noise vector (no conditioning yet)
      # print(f"z.shape: {z.shape}")
      x = self.fc(z)        # => 784
      x = self.reshape(x)   # => 16 x 7 x 7
      x = self.conv1(x)     # => 32 x 14 x 14
      x = self.conv2(x)     # => 1 x 28 x 28
      return x

  genatr = Generator(nz, ngf)
  genatr.to(device)
  # Apply the weights_init function to randomly initialize all weights
  genatr.apply(weights_init)
  opt_g = mod_opt(genatr.parameters(), lr=g_lr, betas=betas)

Training Data

I plan on refactoring the code discussed above as necessary to produce a cDCGAN. I will once again use the Fashion-MNIST dataset. Since we used it for a classification model in one of the earliest projects, we know it comes with labels.

As mentioned, the dataset I originally downloaded includes the labels, and they are also present in the dataset created with torchvision.datasets.FashionMNIST. The dataloader will provide those labels with each data batch. So, we get images and the appropriate labels without any additional effort.
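
For reference, here is a minimal sketch of what that data loading looks like. The transform and the variable names (trn_data, trn_loader) are my assumptions based on the earlier projects, not code lifted from this module.

  import torch
  import torchvision
  from torchvision import transforms

  # normalize to [-1, 1] to match the generator's Tanh output range
  tfms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
  ])
  trn_data = torchvision.datasets.FashionMNIST(
    root="./data", train=True, download=True, transform=tfms)
  trn_loader = torch.utils.data.DataLoader(
    trn_data, batch_size=120, shuffle=True)

  # each batch from the loader is an (images, labels) pair
  images, labels = next(iter(trn_loader))
  print(images.shape, labels.shape)  # torch.Size([120, 1, 28, 28]) torch.Size([120])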

However, we will at some point need to have the labels made available to the models during training and generation. That is the next thing to figure out.

Labels

The first thing to understand is that the labels/classes cannot be used as they stand to train the models. They are what are referred to in machine learning as categorical values; they are not really numeric in the true sense of the word. For our models to use them we will need to encode them as numerical values. I am going to use one-hot encoding.
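
As a quick illustration (not code from the module), here is what one-hot encoding a few Fashion-MNIST labels looks like:

  import torch
  import torch.nn.functional as F

  labels = torch.tensor([0, 3, 9])             # e.g. t-shirt/top, dress, ankle boot
  one_hot = F.one_hot(labels, num_classes=10)  # one row per label, a single 1 per row
  print(one_hot)
  # tensor([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
  #         [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
  #         [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])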

The generator will learn to extract features, i.e. types of clothing, based on our encoded labels, i.e. the conditions. So, I am going to code a custom network layer class that will convert the labels/conditions into feature vectors for use by both models.

We will also be using the custom Reshape network layer class, from that copied module, at some point in each model. It again allows us to avoid resizing the images (as done in my first attempt at a DCGAN), and it lets us reshape the condition feature vector. That matters because within each network, generator and critic, we will need to combine the conditions with the image or noise features. (A sketch of the Reshape class follows the condition layer code below.)

The condition layer class is as follows.

  # new custom network layer  
  # used to convert conditions/classes into feature vectors
  # will be used in both critic and generator
  class Condition(nn.Module):
    def __init__(self, lr_slp=0.01, num_cl=10):
      super().__init__()

      # from one-hot encoding to features: 10 -> 784
      self.fc = nn.Sequential(
        nn.Linear(num_cl, i_sz),
        nn.BatchNorm1d(i_sz),
        nn.LeakyReLU(lr_slp)
      )

      self.nbr_cl = num_cl

    def forward(self, labels):
      # one-hot encode labels
      l = F.one_hot(labels, num_classes=self.nbr_cl)
      # convert to float (required by subsequent layers)
      l = l.float()
      # return feature vectors
      return self.fc(l)
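
And, since it gets used in both models below, here is roughly what that Reshape layer from the copied module looks like. This is a minimal sketch based on how it is called; the actual implementation may differ a little.

  # custom network layer carried over from the earlier module (sketch)
  # reshapes a flat (batch, c*h*w) tensor into a (batch, c, h, w) tensor
  class Reshape(nn.Module):
    def __init__(self, c, h, w):
      super().__init__()
      self.c = c
      self.h = h
      self.w = w

    def forward(self, x):
      return x.view(-1, self.c, self.h, self.w)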

Bug?

I am jumping ahead a little, but thought it best to tell the story here and then not worry about it any further.

During development, when I finally had enough code completed to test things, I got errors like the following.

RuntimeError:
python value of type 'device' cannot be used as a value. Perhaps it is a closed over global variable? If so, please consider passing it in as an argument or use a local variable instead.:

They were all related to saving the models using torch.jit.script(). I had used a number of global variables in both models. So, I ended up adding them as parameters to the class __init__ method and, for those used in the forward method, saving them as attributes on the class and using the attribute in forward. Those changes are reflected in the code samples below.

Prior to the above errors, earlier in testing I got errors about tensors being on different devices. This was primarily due to creating data tensors in new places (e.g. the forward method of the generator). I had to make sure that wherever I did something like that, I moved the new tensor to the appropriate device.

Critic

Let’s tackle the discriminator. Well, the critic, which is apparently what the discriminator model is called in a cGAN. In the cGAN module I refactored all uses of the discriminator/discrm variables to critic, except in a few comments.

While training, the critic uses its condition layer to learn discriminating features for each label. It uses those features to determine whether an image is real or fake for a given condition/label.

The critic, as coded, uses two convolutional layers to learn image features, followed by a final fully connected block that does the real/fake classification for the image and condition. It is to this final block that we also pass the condition features.

I won’t get into discussing those convolutional layers as they are pretty much what we have been using for the last few projects. Other than to say the input to the first layer is a single channel (grayscale rather than colour images).

You will note that the condition layer resizes the label input to 16 x 7 x 7 before outputting the condition feature tensor. The image tensor following the second convolutional layer will be the same size (due to the selection of the number of kernels, stride and padding used in the convolutional layers). The two feature tensors will be added together element-wise and passed to that final fully connected layer. That addition requires them to be of the same shape.
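
A quick standalone check of those shapes (purely illustrative, not code from the module):

  import torch

  cond_feats = torch.randn(120, 16, 7, 7)  # condition layer output after Reshape
  img_feats = torch.randn(120, 16, 7, 7)   # critic conv2 output for a batch of 120
  combined = img_feats + cond_feats        # element-wise sum, same shape required
  print(combined.shape)                    # torch.Size([120, 16, 7, 7])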

One other major difference from previous versions: all we really use the discriminator’s output for in a GAN is to compute loss values. If you look at the functions we previously wrote for training the discriminator with real and fake images, all they did was return a loss value. So, let’s just do that in the critic itself. It will likely save a bit of code.

Okay here’s the code for the Critic class.

  # in cGAN discriminator is known as critic
  # define Critic model class
  # output will be result of loss function rather than predictions
  class Critic(nn.Module):
      def __init__(self, lr_slp=0.01, ndf=8, num_cl=10):
          super().__init__()
          
          # to convert labels into feature vectors
          self.cond = nn.Sequential(
            Condition(lr_slp, num_cl),
            Reshape(16, 7, 7)
          )

          # 1 x 28 x 28 => 32 x 14 x 14
          self.conv1 = nn.Sequential(
              nn.Conv2d(1, ndf * 4,
                        kernel_size=5, stride=2, padding=2, bias=False),
              nn.LeakyReLU(lr_slp, inplace=True))

          # 32 x 14 x 14 => 16 x 7 x 7
          self.conv2 = nn.Sequential(
              nn.Conv2d(ndf * 4, ndf * 2,
                        kernel_size=5, stride=2, padding=2, bias=False),
              nn.BatchNorm2d(ndf * 2),
              nn.LeakyReLU(lr_slp, inplace=True))

          # 16 x 7 x 7 => 784 => 1
          self.fc = nn.Sequential(
              nn.Flatten(),
              nn.Linear(i_sz, i_sz),
              nn.BatchNorm1d(i_sz),
              nn.LeakyReLU(lr_slp, inplace=True),
              nn.Linear(i_sz, 1),
              # note: this Sigmoid plus binary_cross_entropy_with_logits in
              # forward() applies the sigmoid twice; dropping one of the two
              # would be the more conventional setup
              nn.Sigmoid()
          )

      def forward(self, img, labels, targets):
          # label features
          l = self.cond(labels)
          # image features + label features ?= real or fake
          x = self.conv1(img)       # => 32 x 14 x 14
          x = self.conv2(x)         # => 16 x 7 x 7
          preds = self.fc(x + l)    # => 1
          loss = F.binary_cross_entropy_with_logits(preds, targets)
          return loss

Generator

The generator is much the same as previous versions, with the exception that we are adding a condition layer. This time we do not reshape the condition output feature tensor. We use a fully connected layer to feed the noise tensor into the model. The number of output neurons in both layers is the same, so we can add the two outputs together element-wise and reshape the result before passing it to the first convolutional layer. This allows the convolutional layers to learn how to generate images that match the specified condition.

If we only used the label feature tensors, the generator would only ever learn to generate one image for each label. The noise tensors are essential to getting a variety of images for the same label.

By the way, adding the noise/image features and the label features is not the only option. A lot of the examples/tutorials I looked at used concatenation, either within the model or on the inputs before they reach the model. Using addition, which did not show up in many tutorials, seemed much simpler to me. I may eventually try the other approaches, but that will require a bit of refactoring to get them to work with my current code. A rough sketch of the concatenation route follows.
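
This is a standalone sketch of that concatenation alternative, not code from the module; the names are illustrative, and note that the first fully connected layer has to be sized for the wider input.

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  nz, n_classes, i_sz = 100, 10, 784
  # first fully connected layer sized for noise plus one-hot label
  fc = nn.Sequential(
    nn.Linear(nz + n_classes, i_sz),
    nn.BatchNorm1d(i_sz),
    nn.LeakyReLU(0.01))

  labels = torch.randint(0, n_classes, (16,))
  z = torch.randn(16, nz)
  l = F.one_hot(labels, num_classes=n_classes).float()
  x = fc(torch.cat([z, l], dim=1))  # (16, 784), same shape as with the addition route
  print(x.shape)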

Perhaps not much of an explanation, but I think it probably does as good a job of that as I can do with my limited knowledge of machine learning algorithms and mathematics. Here’s the code for the Generator class.

  # define Generator model class, it should "mirror" the critic
  class Generator(nn.Module):
    def __init__(self, nz=100, lr_slp=0.01, ngf=8, num_cl=10, device=torch.device('cuda:0')):
      super().__init__()

      # to convert labels into feature vectors
      self.cond = nn.Sequential(
        Condition(lr_slp, num_cl),
      )

      # nz => 784
      self.fc = nn.Sequential(
        nn.Linear(nz, i_sz),
        nn.BatchNorm1d(i_sz),
        nn.LeakyReLU(lr_slp)
      )

      # 784 => 16 x 7 x 7
      self.reshape = Reshape(ngf * 2, 7, 7)

      # 16 x 7 x 7 => 32 x 14 x 14
      self.conv1 = nn.Sequential(
        nn.ConvTranspose2d(ngf * 2, ngf * 4,
                           kernel_size=5, stride=2, padding=2,
                           output_padding=1, bias=False),
        nn.BatchNorm2d(ngf * 4),
        nn.LeakyReLU(lr_slp))

      # 32 x 14 x 14 => 1 x 28 x 28
      self.conv2 = nn.Sequential(
        nn.ConvTranspose2d(ngf * 4, 1,
                           kernel_size=5, stride=2, padding=2,
                           output_padding=1, bias=False),
        nn.Tanh())

      # Random value sample size
      self.nz = nz
      # device to use
      self.device = device

    def forward(self, labels):
      # Inputs are the sum of random inputs and label features
      l = self.cond(labels)
      b_sz = len(labels)
      z = torch.randn(b_sz, self.nz).to(self.device)
      x = self.fc(z)            # => 784
      x = self.reshape(x + l)   # => 16 x 7 x 7
      x = self.conv1(x)      # => 32 x 14 x 14
      x = self.conv2(x)       # => 1 x 28 x 28
      return x

Training Code

Let’s initialize our models, apply the optimizer, etc. I am again using torch.optim.Adam as the optimizer for both models.

  # real / fake targets for critic
  real_trgt = torch.ones(batch_sz, 1).to(device)
  fake_trgt = torch.zeros(batch_sz, 1).to(device)

  # instantiate the models, specify optimizer function
  critic = Critic(lr_slp=c_lr_slp, ndf=ngf, num_cl=nbr_classes)
  critic.to(device)
  opt_c = mod_opt(critic.parameters(), lr=c_lr, betas=betas)

  genatr = Generator(nz=nz, lr_slp=g_lr_slp, ngf=ngf, num_cl=nbr_classes, device=device)
  genatr.to(device)
  opt_g = mod_opt(genatr.parameters(), lr=g_lr, betas=betas)

And the training loop now looks like the following. I dropped three fairly lengthy functions from the code, and the loop is not really any longer than the one in the previous project’s code.

  st_tm = time.perf_counter()
  for epoch in range(trn_len):
    # print(f"epoch: {epoch + 1}")

    c_losses, g_losses = [], []

    for i, (images, labels) in enumerate(trn_loader):
      imgs = images.to(device)
      lbls = labels.to(device)

      loss_c = critic(imgs, lbls, real_trgt)
      
      # fakes are not detached here, so loss_c.backward() also fills in generator
      # grads; those are discarded by opt_g.zero_grad() before the generator step
      fakes = genatr(lbls)
      loss_c += critic(fakes, lbls, fake_trgt)
      
      opt_c.zero_grad()
      loss_c.backward()
      opt_c.step()

      fakes = genatr(lbls)
      loss_g = critic(fakes, lbls, real_trgt)

      opt_g.zero_grad()
      loss_g.backward()
      opt_g.step()

      c_losses.append(loss_c.item())
      g_losses.append(loss_g.item())

    print(f"epoch {epoch + 1}, avg critic loss: {np.mean(c_losses)}, avg generator loss: {np.mean(g_losses)}")
    # save sample of images following each epoch of training
    labels = torch.LongTensor(list(range(10))).repeat(8).flatten().to(device)
    fakes = genatr(labels)
    imgs = (fakes * 0.5) + 0.5
    img_grid_lbl(imgs, labels, (8, 10), i_show=False, epoch=epoch)

  nd_tm = time.perf_counter()
  print(f"\ntime to train GAN : {nd_tm - st_tm}")
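
The img_grid_lbl helper used above (and again further down) is one of my plotting utilities from the earlier posts. Roughly, it does something like the following; the signature matches the calls above, but the body here is only a sketch and the file name pattern is made up.

  import matplotlib.pyplot as plt

  def img_grid_lbl(imgs, labels, grid, i_show=True, epoch=0):
    # plot a grid of generated images, title each with its class label,
    # and save the figure to disk
    rows, cols = grid
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    imgs = imgs.detach().cpu()
    labels = labels.detach().cpu()
    for i, ax in enumerate(axes.flatten()):
      ax.imshow(imgs[i].squeeze(), cmap="gray")
      ax.set_title(str(labels[i].item()), fontsize=6)
      ax.axis("off")
    fig.tight_layout()
    fig.savefig(f"gen_img_{epoch}.png")
    if i_show:
      plt.show()
    plt.close(fig)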

I won’t bother with the full code to save the trained models (same as in one or more of the preceding projects).
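
For anyone who hasn’t seen those earlier posts, it boils down to something like the following. A minimal sketch; the file name is inferred from the loading code further down.

  # script and save the trained generator (sketch, as in the earlier projects)
  fl_nm = Path(f"gen_cnn_clothing_{batch_sz}_{trn_len}.pt")
  scripted_g = torch.jit.script(genatr)
  scripted_g.save(str(sv_dir / fl_nm))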

I used 20 epochs of training with a batch size of 120. The latter was to eliminate some complaints during training: with a batch size of 128, the last batch was a different size than the rest, and during development that produced an error (presumably because the real/fake target tensors are created with a fixed batch size). Rather than track down exactly what needed fixing in the code, I just changed the batch size to ensure each batch was the same size (yes, lazy!).
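
A less lazy alternative (untried here) would have been to tell the dataloader to drop the incomplete final batch, which would have let me keep a batch size of 128:

  # drop the smaller final batch so every batch matches the fixed-size target tensors
  trn_loader = torch.utils.data.DataLoader(
    trn_data, batch_size=128, shuffle=True, drop_last=True)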

Training Output

Okay let’s look at some of the output generated during training. I won’t bother with a sample of the images in the dataset.

Here’s the terminal output.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap5> python cgan_clothing.py
epoch 1, avg critic loss: 1.2156995449066161, avg generator loss: 0.6262994002699852
epoch 2, avg critic loss: 1.1192297258377075, avg generator loss: 0.662792281985283
epoch 3, avg critic loss: 1.091329069852829, avg generator loss: 0.6706088924407959
epoch 4, avg critic loss: 1.0666116964817047, avg generator loss: 0.6739794009923935
epoch 5, avg critic loss: 1.0436824040412902, avg generator loss: 0.6768716130256652
epoch 6, avg critic loss: 1.0309424908161164, avg generator loss: 0.6821612676382065
epoch 7, avg critic loss: 1.028166212797165, avg generator loss: 0.6834065381288529
epoch 8, avg critic loss: 1.0285682051181793, avg generator loss: 0.6841179760694504
epoch 9, avg critic loss: 1.0276792743206025, avg generator loss: 0.6845180579423904
epoch 10, avg critic loss: 1.0335486698150635, avg generator loss: 0.683507961511612
epoch 11, avg critic loss: 1.0278766355514526, avg generator loss: 0.6849954514503479
epoch 12, avg critic loss: 1.0298253643512725, avg generator loss: 0.6841921479701996
epoch 13, avg critic loss: 1.0226895236968994, avg generator loss: 0.6862935597896576
epoch 14, avg critic loss: 1.02450461435318, avg generator loss: 0.6862172081470489
epoch 15, avg critic loss: 1.0330372860431671, avg generator loss: 0.6834870508909225
epoch 16, avg critic loss: 1.0246339752674103, avg generator loss: 0.6866110379695892
epoch 17, avg critic loss: 1.0211518983840941, avg generator loss: 0.68707266330719
epoch 18, avg critic loss: 1.024600375175476, avg generator loss: 0.686942857503891
epoch 19, avg critic loss: 1.0269543619155883, avg generator loss: 0.6864082289934158
epoch 20, avg critic loss: 1.0272575795650483, avg generator loss: 0.6860461041927338

time to train GAN : 426.30407549999654

And some sample images following a few of the training epochs.

Generator Produced Images After Epochs 1 and 7 of Training
[images: sample generator output following epochs 1 and 7 of training]

Generator Produced Images After Epochs 14 and 20 of Training
[images: sample generator output following epochs 14 and 20 of training]

That last bunch of images is looking pretty good. I expect more epochs of training would help things significantly. But that’s not really what I am here for.

Okay, let’s move from training to using the model to generate random images based on the requested clothing class(es).

Generating Images Using Saved Trained Model

Let’s test the model to see if it works as intended. I will generate a random selection of conditions and submit them to the generator, then plot the resulting images along with the requested labels. I will do so twice, with a different selection of labels each time.

A look at the code.

  # Load model and generate images
  batch_sz = 120
  trn_len = 20
  fl_nm = Path(f"gen_cnn_clothing_{batch_sz}_{trn_len}.pt")
  genatr = torch.jit.load(sv_dir / fl_nm, map_location=device)
  # set to evaluation/generation mode
  genatr.eval()
  for i in range(1, 3):
    # generate batch of fake images
    labels = torch.LongTensor(list(range(10))).repeat(5).flatten()
    # randomize the labels
    idx = torch.randperm(labels.nelement())
    labels = labels.view(-1)[idx].view(labels.size()).to(device)
    # generate images with requested classes
    fakes = genatr(labels)
    # generate and save image, only half the generated images
    imgs = (fakes * 0.5) + 0.5
    img_grid_lbl(imgs, labels, (5, 5), i_show=False, epoch=f't{i}')
Generator Produced Images After Training
[images: two samples of generator output following training]

Done

And, with that, I think this post is well past its bedtime. It certainly was an entertaining effort.

Until next time (perhaps a more complex or colourful cGAN), may your machine learning experiences give you as much pleasure as mine are currently doing. Though I am afraid that may change as things get more complex.

Postscript

I decided to train the model for 50 epochs.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap5> python cgan_clothing.py
epoch 1, avg critic loss: 1.216192820072174, avg generator loss: 0.626003667473793
epoch 2, avg critic loss: 1.1198924248218536, avg generator loss: 0.6623466529846191
epoch 3, avg critic loss: 1.0930381178855897, avg generator loss: 0.6699770909547805
epoch 4, avg critic loss: 1.0690635411739349, avg generator loss: 0.6731013507843018
epoch 5, avg critic loss: 1.0452926578521728, avg generator loss: 0.6760220353603363
... ...
epoch 46, avg critic loss: 1.022902722120285, avg generator loss: 0.6872463500499726
epoch 47, avg critic loss: 1.0234066903591157, avg generator loss: 0.6889058525562286
epoch 48, avg critic loss: 1.0171670949459075, avg generator loss: 0.6890342910289764
epoch 49, avg critic loss: 1.0183188886642456, avg generator loss: 0.6889004520177842
epoch 50, avg critic loss: 1.020868512392044, avg generator loss: 0.6889100987911224

time to train GAN : 1057.1665946000721
Generator Produced Images After Training for 50 Epochs
[images: two samples of generator output following 50 epochs of training]

There looks to be a slight improvement in the generated images over the first model, which was trained for only 20 epochs. Though not as much of an improvement as I would have expected.