Potentially a waste of time, but I am going to code a CNN GAN to once again generate images of clothing; i.e. Project 3 revisited.

I am not going to provide much more than the code and the results. All the basic information about the dataset is in the post linked above.

I am going to copy the code from the fully connected linear network GAN module of Project 3, then refactor as necessary: mainly the models and the code to generate images, perhaps more.

CNN Parameters

Let's start by giving some thought to how we will define the convolutional and other layers. I will mostly look at the discriminator for this purpose. This is really overkill, as the fully connected model used previously works just fine on such small images.

The images are grayscale, 28x28 pixels. I am going to start with 32 kernels for the first convolutional layer and double that for each of 3 more hidden layers. I can always increase the number of kernels and/or layers if I don't like the initial results. I am also going to add a normalization layer following each of the convolutional layers except the first and last, and, as usual, an activation layer before all but the first convolutional layer. I will use a kernel size of 4, a stride of 2 (to reduce the image size at each layer rather than using pooling) and a padding of 1 for each convolutional layer.

The final Conv2d will produce the single discriminator output, using a stride of 1 and no padding.
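
As a quick back-of-the-envelope check (not code from the module), PyTorch's output size formula for a convolution, out = floor((in + 2 * padding - kernel) / stride) + 1, shows what those choices do to the feature map size at each layer:

# Back-of-the-envelope Conv2d output sizes (illustrative only)
def conv_out(size, kernel=4, stride=2, padding=1):
  return (size + 2 * padding - kernel) // stride + 1

print(conv_out(28), conv_out(14), conv_out(7))  # 14 7 3 -- odd sizes creep in
print(conv_out(32), conv_out(16), conv_out(8))  # 16 8 4 -- clean halving
print(conv_out(4, stride=1, padding=0))         # 1      -- the final Conv2d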

And as I am making some big guesses here, I can experiment as necessary once I get things coded.

Failed Attempt

Okay, I refactored the discriminator and generator to use convolutional layers. But I ran into some issues, so tried to fix things as best I could. Yet for neither love nor money could I get it to generate a 28 x 28 image. I finally read something that made sense: since I was shrinking or enlarging the layer outputs by a factor of 2 at each step, I really needed to start with images whose size was a power of 2. The smallest power of 2 greater than 28 is 32, so I reworked the transform to produce 32 x 32 images from the 28 x 28 images in the dataset.
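
For reference, the resize is just one more step in the torchvision transform pipeline. Something like the following, where the Normalize values are my assumption (the usual 0.5 mean/std that maps the images to roughly [-1, 1], matching the generator's Tanh output); the module's actual values may differ:

import torchvision.transforms as transforms

# Resize the 28 x 28 Fashion-MNIST images to 32 x 32 before converting to tensors.
transform = transforms.Compose([
  transforms.Resize(32),                 # 28 x 28 -> 32 x 32
  transforms.ToTensor(),                 # PIL image -> float tensor in [0, 1]
  transforms.Normalize((0.5,), (0.5,)),  # assumed values: single channel -> ~[-1, 1]
])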

The code is based on that from the PyTorch DCGAN Tutorial, with some adjustments for the size of the images and the fact that I am only interested in grayscale at this time.

# Some new variables
# Spatial size of training images. All images will be resized to this
# size using a transformer.
image_size = 32
# Number of channels in the training images. For color images this is 3
nc = 1
# Size of z latent vector (i.e. size of generator input)
nz = 100
# Size of feature maps in generator
ngf = 8
# Size of feature maps in discriminator
ndf = 8

Discriminator and Generator Models

Pretty straightforward using the PyTorch tutorial and example code.

  # define Discriminator model class
  class Discriminator(nn.Module):
    def __init__(self, do_p=0.3, ngpu=1):
      super().__init__()
      self.ngpu = ngpu
      self.model = nn.Sequential(
        # input is nc x 32 x 32
        nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
        nn.LeakyReLU(do_p, inplace=True),  # do_p is used as the LeakyReLU negative slope
        # state size: (ndf) x 16 x 16
        nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
        nn.BatchNorm2d(ndf * 2),
        nn.LeakyReLU(do_p, inplace=True),
        # state size: (ndf * 2) x 8 x 8
        nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
        nn.BatchNorm2d(ndf * 4),
        nn.LeakyReLU(do_p, inplace=True),
        # state size: (ndf * 4) x 4 x 4
        nn.Conv2d(ndf * 4, 1, 4, 1, 0, bias=False),
        # state size: 1 x 1 x 1
        nn.Sigmoid(),
      )
      
    def forward(self, x):
      outp = self.model(x)
      return outp


  # define Generator model class, it should "mirror" the discriminator
  class Generator(nn.Module):
    def __init__(self, ngpu=1):
      super().__init__()
      self.ngpu = ngpu
      self.model = nn.Sequential(
        nn.ConvTranspose2d(nz, ngf * 4, 4, 1, 0, bias=False),
        nn.BatchNorm2d(ngf * 4),
        nn.ReLU(inplace=True),
        # current state size: (ngf * 4), 4, 4
        nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
        nn.BatchNorm2d(ngf * 2),
        nn.ReLU(inplace=True),
        # current state size: (ngf * 2), 8, 8
        nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
        nn.BatchNorm2d(ngf),
        nn.ReLU(inplace=True),
        # current state size: (ngf), 16, 16
        nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
        # current state size: 1, 32, 32
        nn.Tanh()
      )
      
    def forward(self, x):
      outp = self.model(x)
      return outp
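
Before moving on, a throwaway check (not part of the training module) that the two networks really do mirror each other, using the variables defined earlier, gives the shapes I was after:

# Throwaway shape check, not in the project module
noise = torch.randn(16, nz, 1, 1)    # a batch of 16 latent vectors
fake = Generator()(noise)
print(fake.shape)                    # torch.Size([16, 1, 32, 32])
print(Discriminator()(fake).shape)   # torch.Size([16, 1, 1, 1])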

Custom Weights

This is something I hadn't seen before, and I initially didn't use this approach to initialize the weights. But I eventually decided to give it a go, and I think it may have helped the GAN produce decent images more quickly. Though if it did, I don't know why.

# custom weights initialization called on discriminator and generator
def weights_init(m):
  classname = m.__class__.__name__
  if classname.find('Conv') != -1:
    nn.init.normal_(m.weight.data, 0.0, 0.02)
  elif classname.find('BatchNorm') != -1:
    nn.init.normal_(m.weight.data, 1.0, 0.02)
    nn.init.constant_(m.bias.data, 0)
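
Just to convince myself it does what the comments below claim, a throwaway check along these lines (not part of the module) shows the first convolution's weights ending up with roughly the intended statistics:

# Throwaway check of the custom initialization (not in the project module)
net = Discriminator()
net.apply(weights_init)
w = net.model[0].weight                 # weights of the first Conv2d layer
print(w.mean().item(), w.std().item())  # roughly 0.0 and 0.02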

Initialize Models, Apply Custom Weights

Once again pretty straightforward, especially when most of it is from the previous project, with the slight adjustment to apply the initial custom weights. I did try to use 2 GPUs in parallel, but that apparently doesn't work on Windows 10. I may need to look at installing Linux in a dual boot setup. Apparently WSL also doesn't allow the use of multiple GPUs in parallel, as it is running under the Windows 10 OS.

  # instantiate the models, specify optimizer function
  discrm = Discriminator(do_p=0.2, ngpu=ngpu)
  discrm.to(device)
  # Handle multi-gpu if desired
  # if (device.type == 'cuda') and (ngpu > 1):
  #   discrm = nn.DataParallel(discrm)
  # Apply the weights_init function to randomly initialize all weights
  # to mean=0, stdev=0.02. Those values are hard-coded in the function.
  discrm.apply(weights_init)
  opt_d = torch.optim.Adam(discrm.parameters(), lr=lr)

  genatr = Generator(ngpu=ngpu)
  genatr.to(device)
  # if (device.type == 'cuda') and (ngpu > 1):
  #   genatr = nn.DataParallel(genatr)
  # Apply the weights_init function to randomly initialize all weights
  # to mean=0, stdev=0.02.
  genatr.apply(weights_init)
  opt_g = torch.optim.Adam(genatr.parameters(), lr=lr)

The three model training functions, e.g. train_disc_real, needed a couple of changes. But I will let you sort those out on your own; it shouldn't be too hard given the runtime error messages.
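
Since those functions are from the previous post and not shown here, the following is only my guess at the kind of shape fixes the runtime errors point to (variable names are hypothetical):

# Hypothetical examples of the shape fixes (not the actual training functions)
# 1. The generator now expects a 4-D latent tensor rather than a flat one.
noise = torch.randn(batch_sz, nz, 1, 1, device=device)  # was (batch_sz, nz)

# 2. The discriminator output is (batch, 1, 1, 1); flatten it before the BCE loss.
preds = discrm(real_imgs).view(-1, 1)                   # real_imgs: a batch from the dataloader
loss = nn.BCELoss()(preds, torch.ones(batch_sz, 1, device=device))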

Image Plotting/Saving Function

I had, in the earlier project, written a function to plot individual clothing images in matplotlib subplots. But I had seen examples of something similar being done using the torchvision make_grid function. So I decided to give that a go.

def image_grid(images, ncol, i_show=True, epoch=0):
  image_grid = make_grid(images, ncol)     # Make images into a grid, ncol per row
  image_grid = image_grid.permute(1, 2, 0) # Move channel to the last
  image_grid = image_grid.cpu().numpy()    # Convert into Numpy

  plt.imshow(image_grid)
  plt.xticks([])
  plt.yticks([])
  if i_show:
    plt.show()
  else:
    plt.savefig(img_dir / f"cnn_1_{epoch}_{batch_sz}.png")
    plt.close()

And that code is less than half the size of my earlier function. I used it to display 64 images from the training data before I started training, and to save the output from the current state of the generator after each epoch. I will also let you sort out that bit of code as appropriate.
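
For what it's worth, the calls look roughly like this; the dataloader and fixed noise names are my own, not necessarily what the module uses:

# Roughly how image_grid gets used (train_loader and fixed_noise are assumed names)
real_batch, _ = next(iter(train_loader))        # one batch of training images
image_grid(real_batch[:64], 8)                  # display 64 of them, 8 per row

with torch.no_grad():                           # once per epoch, after the training steps
  fake = genatr(fixed_noise).detach().cpu()     # fixed_noise: shape (64, nz, 1, 1)
image_grid(fake, 8, i_show=False, epoch=epoch)  # saves cnn_1_{epoch}_{batch_sz}.png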

Train the GAN

Okay, down to the nitty-gritty. I used 12 epochs.

Here’s the terminal output.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap4> python cnn_clothing.py
epoch 1, d_loss: 0.21849249303340912, g_loss: 2.830129623413086
epoch 2, d_loss: 0.022564249113202095, g_loss: 4.858108043670654
epoch 3, d_loss: 0.012342630885541439, g_loss: 5.810791492462158
epoch 4, d_loss: 0.10467849671840668, g_loss: 4.657569408416748
epoch 5, d_loss: 0.10853267461061478, g_loss: 4.456841945648193
epoch 6, d_loss: 0.18557585775852203, g_loss: 3.403435707092285
epoch 7, d_loss: 0.27653026580810547, g_loss: 2.982327699661255
epoch 8, d_loss: 0.32711949944496155, g_loss: 2.763765573501587
epoch 9, d_loss: 0.4347558915615082, g_loss: 2.3301286697387695
epoch 10, d_loss: 0.5142689347267151, g_loss: 2.035801887512207
epoch 11, d_loss: 0.5754541158676147, g_loss: 1.8530807495117188
epoch 12, d_loss: 0.6471359729766846, g_loss: 1.7112855911254883

time to train GAN : 213.89834720012732

I think 3½ minutes of training for what I got was pretty good, especially since there was a bit of I/O in each epoch to save an image and send output to the terminal.

The following images show a sample of the training data from the Fashion MNIST dataset, followed by samples of the generator's output from an early training epoch, a later epoch, and the final epoch.

Sample of training images from the Fashion MNIST dataset

Sample of generator images after the 3rd epoch

Sample of generator images after the 8th epoch

Starting to see some viable clothing.

Sample of generator images after the final (12th) epoch

Don’t know about you, but I am beginning to see some obvious examples of individual clothing items.

And I expect that my enlarging the images to 32 x 32 may be affecting the quality of the output.

I may decide to run the training for a larger number of epochs and see what happens. But I also may not. I am feeling almost ready to tackle full colour anime images. Though a night’s sleep may change my mind.

Done!

I do believe that is it for this one. We will see where I go next.

And, may your travel decisions be much simpler.

Resources