Okay, let’s get started coding the GAN for this project, using the Wasserstein distance with a gradient penalty as our loss function.

I will be resizing the images from 1024 x 1024 to 256 x 256. Hopefully the GPU can handle batches of 16 images at that size. I will also be combining the one-hot encoded conditions with the images or noise vectors in a single tensor. I had previously passed those to the forward methods separately, or generated the noise tensors within the method. That presented issues when, in the last post, I started playing around with morphing the generated images. It took a bit of fooling around to make this work, but it has benefits.
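A rough sketch of the idea (using globals listed a bit further down; the variable names here are just illustrative): the conditions become extra channels concatenated onto the noise before anything gets handed to a forward method.

noise = torch.randn(batch_sz, nz, 1, 1)            # latent noise vectors
labels = torch.zeros(batch_sz, nbr_classes, 1, 1)  # one-hot condition channels
labels[:, 1, :, :] = 1                             # e.g. condition every sample on class 1
nz_lbls = torch.cat([noise, labels], dim=1)        # single (batch, nz + nbr_classes, 1, 1) tensor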

I won’t bother with the imports and initial bits of code; I expect that if you’ve been following along, you will have no problem setting things up to suit yourself.

Let’s start with the CNNs for the critic and generator.

Various Global Variables

Well, a wee interruption first. You could likely have sorted most of these out while reading the code presented in the post, but it’s perhaps easier if I just include them here.

# Class labels, in dataset they are digits 0 or 1
# csv: 0 == no glasses
# ImageFolder: 0 == glasses
#    reversed: ImageFolder assigns label values based on alphabetical order of subdirectory names
categories = ('glasses', 'no glasses')

... ...

data_dir = Path("./data/glasses")

... ...

# need outside training loop so can use for evaluation
img_sz = 256             # spatial size of transformed images
i_sz = img_sz * img_sz
nc = 3                   # number of image channels
trn_len = 100            # number of epochs for training
init_features = 16       # critic initial layer output features
batch_sz = 16            # batch size during training
nbr_classes = 2          # g or no-g
g_lr_slp = 0.2           # generator LeakyReLU negative_slope
g_lr = 0.0001            # generator learning rate (higher than before)
nz = 100                 # dimension for noise tensor
ngf = 16                 # feature map size in generator

... ...

if trn_model:
  torch.manual_seed(73)

  nc = 3                  # number of channels in training images
  max_ep = 100            # maximum number of epochs for training
  c_lr_slp = 0.2          # critic LeakyReLU negative_slope
  c_lr = 0.0001           # critic learning rate
  mod_opt = torch.optim.Adam  # default optimizer
  betas=(0.0, 0.9)        # optimizer betas
 ... ...

Models/Networks

These will be similar to what we saw in the previous project. But, because the images are considerably larger, I will be using more convolutional layers to gradually downsample or upsample the inputs to get the desired outputs: a single value between \( -\infty \) and \( \infty \) for the critic, and an image for the generator.

And as has been the case all along, the generator will mirror the critic.

I will, again, be initializing the weights in each network. For the convolutional layers, they will be initialized with values from a normal distribution with a mean of 0 and a standard deviation of 0.02. For the batch norm layers a distribution with a mean of 1 is used. A small standard deviation is selected to avoid vanishing gradients.

The weights_init function is identical to the one in the previous project.
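I won’t repeat it here, but for reference, a minimal sketch along the lines described above would look something like the following. I’m assuming the usual DCGAN-style form (and the usual import of torch.nn as nn), with the name match widened from 'BatchNorm' to 'Norm' so it would also catch the critic’s InstanceNorm2d layers.

def weights_init(m):
  # conv layers: weights from a normal distribution, mean 0.0, std 0.02
  # norm layers: weights from mean 1.0, std 0.02, biases zeroed
  classname = m.__class__.__name__
  if classname.find('Conv') != -1:
    nn.init.normal_(m.weight.data, 0.0, 0.02)
  elif classname.find('Norm') != -1:
    nn.init.normal_(m.weight.data, 1.0, 0.02)
    nn.init.constant_(m.bias.data, 0.0)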

Critic

The input to the critic network is a tensor with a shape of 5 x 256 x 256. The first three channels are the colour channels (red, green, blue). The last two are the condition label channels: the one-hot encoding of the specified condition (glasses or no glasses) expanded to full image-sized channels.

In the previous project our critic had 2 Conv2d layers followed by a fully connected layer to produce the image’s score. Because of the increased size and complexity of our images, I will be using 7 Conv2d layers without any fully connected layers. I will also be using InstanceNorm2d instead of BatchNorm2d on the inner layers, so each instance in the batch is normalized separately. Each layer but the last will also be followed by a LeakyReLU activation. The five inner layers will be pretty much identical, so I will add a method to the class to generate them.
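For the record, with the kernel size of 4, stride of 2 and padding of 1 used in the class code below, each convolution halves the spatial size: \( o = \lfloor (i + 2p - k) / s \rfloor + 1 = i / 2 \). So the first six layers take the input from 256 x 256 down through 128, 64, 32, 16 and 8 to 4 x 4, and the final layer (padding 0) collapses that 4 x 4 map to the single 1 x 1 score.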

Rather than assign each layer to an individually named variable, I decided to use a list to store the individual layers. In the class’s forward method, I traverse the list, executing each layer in turn and returning the output from the final one.

Bugs

When I tested my code for the class, I got the following error.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap5> python wgan-wp_g_ng.py
[]
Traceback (most recent call last):
  File "F:\learn\mcl_pytorch\chap5\wgan-wp_g_ng.py", line 171, in <module>
    opt_c = mod_opt(critic.parameters(), lr=c_lr, betas=betas)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\appDev\Miniconda3\envs\mclp-3.12\Lib\site-packages\torch\optim\adam.py", line 45, in __init__
    super().__init__(params, defaults)
  File "E:\appDev\Miniconda3\envs\mclp-3.12\Lib\site-packages\torch\optim\optimizer.py", line 273, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list

I was using a plain Python list, self.c_layers = []. PyTorch does not register modules stored in a plain Python list, so critic.parameters() returned an empty list. I needed to use self.c_layers = nn.ModuleList() instead. The rest of the code remained unchanged.

ModuleList can be indexed like a regular Python list, but modules it contains are properly registered, and will be visible by all Module methods.

nn.ModuleList
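A minimal, hypothetical demonstration of the difference:

import torch.nn as nn

class Demo(nn.Module):
  def __init__(self):
    super().__init__()
    self.plain = [nn.Linear(2, 2)]                 # plain list: layer is not registered
    self.mlist = nn.ModuleList([nn.Linear(2, 2)])  # ModuleList: layer is registered

print(len(list(Demo().parameters())))  # prints 2 -- only the ModuleList layer's weight and bias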

I eventually also got this error.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap5> python wgan-wp_g_ng.py
Traceback (most recent call last):
  File "F:\learn\mcl_pytorch\chap5\wgan-wp_g_ng.py", line 192, in <module>
    score = critic(img_and_label).detach()
            ^^^^^^^^^^^^^^^^^^^^^
  File "E:\appDev\Miniconda3\envs\mclp-3.12\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl

... skip a bunch of stuff ...

           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

It turns out I wasn’t moving both the image and the label channels to the GPU before doing the concatenation.
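The fix was simply to get both tensors onto the same device before the torch.cat, along these lines (matching the test code further down):

img = transform(img).to(device)                         # image tensor on the GPU
channels = torch.zeros((2, img_sz, img_sz)).to(device)  # condition channels on the GPU as well
img_and_label = torch.cat([img, channels], dim=0)       # both operands now on the same device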

And, finally.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap5> python wgan-wp_g_ng.py
  File "F:\learn\mcl_pytorch\chap5\wgan-wp_g_ng.py", line 192, in <module>
    score = critic(img_and_label).detach()
            ^^^^^^^^^^^^^^^^^^^^^
  File "E:\appDev\Miniconda3\envs\mclp-3.12\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)

... skip a bunch of stuff ...

           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Given groups=1, weight of size [1, 32, 4, 4], expected input[1, 512, 4, 4] to have 32 channels, but got 512 channels instead

I had defined my final layer as follows.

nn.Conv2d(self.c_mults[-1], 1,
                  kernel_size=self.k_sz, stride=self.strd,
                  padding=0)

It should have been the following, since the multipliers in self.c_mults are relative to the initial feature count; the input to the final layer actually has self.c_mults[-1] * features = 32 x 16 = 512 channels.

nn.Conv2d(self.c_mults[-1] * features, 1,
                  kernel_size=self.k_sz, stride=self.strd,
                  padding=0)

Final Critic Class Code

Including the code to instantiate the network, assign custom weights and instantiate the optimizer.

  class Critic(nn.Module):
    def __init__(self, channels, features, relu=0.2):
      super().__init__()
      self.k_sz, self.strd, self.padg, self.relu = 4, 2, 1, relu
      self.c_mults = [1, 2, 4, 8, 16, 32]
      # doesn't work: ValueError: optimizer got an empty parameter list
      # self.c_layers = []
      self.c_layers = nn.ModuleList()
      # input layer
      self.c_layers.append(nn.Sequential(
        nn.Conv2d(channels, features,
                  kernel_size=self.k_sz, stride=self.strd,
                  padding=self.padg),
        nn.LeakyReLU(self.relu)
      ))
      # 5 internal layers
      for im in range(5):
        self.c_layers.append(self.cn_layer(features * self.c_mults[im], features * self.c_mults[im + 1]))
      # output layer
      self.c_layers.append(nn.Conv2d(self.c_mults[-1] * features, 1,
                  kernel_size=self.k_sz, stride=self.strd,
                  padding=0)
      )

    def cn_layer(self, c_in, c_out):
      return nn.Sequential(
        nn.Conv2d(c_in, c_out,
                  kernel_size=self.k_sz, stride=self.strd,
                  padding=self.padg),
        nn.InstanceNorm2d(c_out, affine=True),
        nn.LeakyReLU(self.relu)
      )
    
    def forward(self, x):
      for cl in self.c_layers:
        x = cl(x)
      return x

  critic = Critic(nc + nbr_classes, init_features, relu=c_lr_slp)
  critic.to(device)
  # Apply the weights_init function to randomly initialize all weights
  critic.apply(weights_init)
  opt_c = mod_opt(critic.parameters(), lr=c_lr, betas=betas)

I won’t, at this time, bother with my test code, but here’s the output. Note, I am setting torch.manual_seed(73) when training the GAN.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap5> python wgan-wp_g_ng.py
tensor([[[-0.5478]]], device='cuda:1')
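Note the shape of that output. I fed the critic an unbatched 5 x 256 x 256 tensor, and recent versions of PyTorch accept unbatched \( (C, H, W) \) inputs to Conv2d and InstanceNorm2d, treating them as a single sample; hence the 1 x 1 x 1 score rather than a batch x 1 x 1 x 1 one.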

Generator

Okay, let’s get the generator class coded and instantiated. And maybe tested. Given I have already coded the Critic class, this one should be a touch easier. And, of course, we will be using ConvTranspose2d instead of Conv2d layers. I defined this class outside the training loop, as I will later be loading and using it in evaluation mode, so I need access to it in both of the if/else blocks containing the training and evaluation code.
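A quick sanity check on the sizes: for ConvTranspose2d, \( o = (i - 1)s - 2p + k \). The input layer (stride 1, padding 0) turns the 1 x 1 noise-plus-condition tensor into 4 x 4, and each of the six subsequent kernel 4 / stride 2 / padding 1 layers doubles the spatial size: 4, 8, 16, 32, 64, 128 and finally 256.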

class Generator(nn.Module):
  def __init__(self, noise_chnl, img_chnl, features, relu=0.01):
    super().__init__()
    self.k_sz, self.strd, self.padg, self.relu = 4, 2, 1, relu
    self.c_mults = [64, 32, 16,  8, 4, 2]
    self.c_layers = nn.ModuleList()
    # input layer
    self.c_layers.append(
      self.cn_layer(noise_chnl, features * self.c_mults[0],
                    self.k_sz, 1, 0)
    )
    # 5 internal layers
    for i in range(5):
      self.c_layers.append(
        self.cn_layer(features * self.c_mults[i], features * self.c_mults[i + 1],
                      self.k_sz, self.strd, self.padg)
      )
    # output layer (image)
    self.c_layers.append(
      nn.Sequential(
        nn.ConvTranspose2d(features * self.c_mults[-1], img_chnl,
                           self.k_sz, self.strd, self.padg),
        nn.Tanh()
      )
    )

  def cn_layer(self, c_in, c_out, ksz, strd, padg):
    return nn.Sequential(
      nn.ConvTranspose2d(c_in, c_out,
                kernel_size=ksz, stride=strd,
                padding=padg),
      nn.BatchNorm2d(c_out, affine=True),
      nn.LeakyReLU(self.relu)
    )
  
  def forward(self, x):
    for cl in self.c_layers:
      x = cl(x)
    return x


  genr = Generator(nz + nbr_classes, nc, init_features, relu=g_lr_slp)
  genr.to(device)
  genr.apply(weights_init)
  gen_c = mod_opt(genr.parameters(), lr=g_lr, betas=betas)

And a simple test with an input of 16 noise vectors produced the following. Not particularly useful, but at least the outputs are able to be plotted as images.

[Image: sample of images produced when testing the Generator class code]

Test Code

This is a pretty long post, so I don’t think I will start looking at the code for the gradient penalty, training loop, etc. Instead, I will show you my test code and call it a day. There are some bits and pieces in there for you to think about.

  if tst_critic:
    img = Image.open(data_dir/'NoG'/'face-1.png')
    img = transform(img).to(device)
    # note change in class label
    label = 1
    onehot = torch.zeros(2)  # plain one-hot vector (not actually used below; the spatial channels carry the condition)
    onehot[label] = 1
    channels = torch.zeros((2, img_sz, img_sz)).to(device)
    channels[label, :, :] = 1
    img_and_label = torch.cat([img, channels], dim=0)
    # img_and_label.to(device)
    score = critic(img_and_label).detach()
    print(score)
  elif tst_genr:
    noise = torch.randn(16, nz, 1, 1)
    labels = torch.zeros(16, 2, 1, 1)
    for i in range(16):
      if i % 2:                 # odd samples: condition channel 0 (glasses)
        labels[i, 0, :, :] = 1
      else:                     # even samples: condition channel 1 (no glasses)
        labels[i, 1, :, :] = 1
    nz_lbls = torch.cat([noise, labels], dim=1).to(device)
    fake = genr(nz_lbls)
    img = fake / 2 + 0.5
    img_grid_lbl(img, labels[:, 0].flatten(), 4, i_show=True, epoch=0)
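Worth noting in passing: the condition enters the two networks in different shapes. The critic sees it as two full 256 x 256 channels concatenated onto the image, so every spatial location carries the label, while the generator sees it as two extra 1 x 1 channels appended to the noise vector.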

Done

Well, this is a longer post than I was expecting at this point. And we haven’t even really gotten to the new stuff yet.

Next time I plan on writing the function to calculate the gradient penalty, along with the code to generate the batches and modify them for passing to the network during training. And hopefully the actual training loop and any other necessary functions; not sure about those at this time.

Until then do keep your Wasserstein distances to a minimum.

Resources