After that last post, I thought I was going to move on to something along the lines of full colour heads/faces with or without glasses, and to see if I could sort out the next model type: WGAN-GP (Wasserstein GAN with Gradient Penalty). But I saw something that I decided I’d like to try first: using the cGAN to generate a series of images that morph from one image type to another. There were two approaches: using a weighted sum of labels or a weighted sum of noise vectors.

So let’s see what I can produce.

When I started looking at this I realized the model I currently have can’t handle this kind of image morphing. At the very least I will need to significantly refactor the generator’s forward method. Worst case, I may need to refactor and train a new model.

The refactored generator would accept the one-hot encoded labels combined with the noise vectors as its input. That way I can weight them as necessary before passing them to the generator to produce the series of morphing images.

Bummer! I will start by trying to refactor the forward function, while still using the saved network’s model parameters. And see if that works.

Weighted Sum of Labels

Okay, I started by refactoring the generator model’s forward method. But when I ran the refactored code I got the following runtime error.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap5> python cgan_clothing.py
tensor([2, 5, 5, 5, 5, 5, 5, 5, 5], device='cuda:1')
Traceback (most recent call last):
  File "F:\learn\mcl_pytorch\chap5\cgan_clothing.py", line 563, in <module>
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\appDev\Miniconda3\envs\mclp-3.12\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: forward() expected at most 2 argument(s) but received 4 argument(s). Declaration: forward(__torch__.Generator self, Tensor labels) -> Tensor

At that point I decided to refactor the custom Condition layer instead of the Generator class. At least for this approach. Not sure that will work for the weighted noise approach. But…

RuntimeError: running_mean should contain 9 elements not 784

Eventually, though not really quickly enough, I realized that because I had saved the model as TorchScript, I could not refactor the model code. On load, the generator was being restored from the file exactly as it had been scripted and saved, method signatures included. So the modified method parameters I was trying to send, based on the refactored code, simply did not match the loaded model’s method parameters.

So, I loaded the TorchScript version and saved the model’s state dictionary. In my morphing code block, I instantiated the model from the refactored class, applied the saved state, put it in evaluation mode and generated a test image. I won’t bother showing my initial test code.

  st_fl = "gen_cgan_cloth_state.pt"
... ...
  # one-time conversion: load the TorchScript model and save only its state dict
  fl_nm = Path(f"gen_cnn_clothing_{batch_sz}_{trn_len}.pt")
  genatr = torch.jit.load(sv_dir / fl_nm, map_location=device)
  torch.save(genatr.state_dict(), sv_dir / st_fl)
... ...
  elif tst_mod_2:
    # rebuild the generator from the Python class, then restore the saved weights
    genatr = Generator(nz=nz, lr_slp=g_lr_slp, ngf=ngf, num_cl=nbr_classes, device=device)
    genatr.load_state_dict(torch.load(sv_dir / st_fl, map_location=device))
    genatr.to(device)
    genatr.eval()
... ...

I also had to move the Generator related classes and variables out of the training block so that the morphing block could have access to them.
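
To illustrate why the state-dict route works where the TorchScript file did not, here is a minimal sketch. TinyGen and TinyGenRefactored are hypothetical stand-ins, not the Generator from this post. The only assumption is that the refactored class keeps the same layer names, so the saved parameters still line up.

```python
import io

import torch
import torch.nn as nn


class TinyGen(nn.Module):
    """Hypothetical stand-in for the original model: forward() takes labels only."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 4)

    def forward(self, labels: torch.Tensor) -> torch.Tensor:
        return self.fc(labels)


class TinyGenRefactored(nn.Module):
    """Same layers, but forward() has been refactored to take an extra argument."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 4)

    def forward(self, labels: torch.Tensor, weight: float = 1.0) -> torch.Tensor:
        return self.fc(labels * weight)


# Save as TorchScript (to an in-memory buffer here) and load it back.
buf = io.BytesIO()
torch.jit.save(torch.jit.script(TinyGen()), buf)
buf.seek(0)
loaded = torch.jit.load(buf)

x = torch.ones(1, 10)
try:
    loaded(x, 0.5)  # the archived forward() never declared this argument
    extra_arg_ok = True
except RuntimeError:
    extra_arg_ok = False

# The parameters are still reachable through state_dict(), so they can be
# moved into the refactored Python class.
refactored = TinyGenRefactored()
refactored.load_state_dict(loaded.state_dict())
refactored.eval()
same = torch.allclose(refactored(x), loaded(x))
print(extra_arg_ok, same)
```

The loaded TorchScript module rejects the extra argument no matter what the Python class on disk now says, while the refactored class with the transferred weights produces the same output as the archived model.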

Because of the issues I was having, I decided not to change any of the model’s method parameter lists. I instead modified the structure of the labels being passed to the methods. Fairly messy approach, but for now I am going to stick with it.

I made the labels a 2-dimensional tensor with three rows. The first row holds the actual labels to use for morphing, though not in the way you might expect: the first entry is the starting item and every remaining entry is the final item. The second row is the type of morphing to use (labels or noise). The third is the number of images in the series, that is, the intermediate morphing steps plus the first and second items of clothing. But because every row of a tensor must have the same length, and the whole tensor a single data type, I needed to repeat many of the items an appropriate number of times. Something like the following.

    n_steps = 18         # number of intermediate images
    i_st = 1             # initial clothing item
    i_nd = 3             # final clothing item
    # row 0: start/end labels; row 1: morph type (1 = weighted labels); row 2: total image count
    labels = [[i_st], [1], [n_steps + 2]]
    for i in range(1, n_steps + 2):
      labels[0].append(i_nd)
      labels[1].append(1)
      labels[2].append(n_steps + 2)
    labels = torch.LongTensor(labels).to(device)
    print(len(labels), labels.shape, labels)

(mclp-3.12) PS F:\learn\mcl_pytorch\chap5> python cgan_clothing.py
3 torch.Size([3, 20])
tensor([[ 1,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
        [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
        [20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]], device='cuda:1')
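
For what it is worth, the same three-row layout can be packed and unpacked a little more compactly. This is only a sketch: pack_morph_labels is a hypothetical helper name, not something in my script, and it runs on the CPU to keep the example self-contained.

```python
import torch


def pack_morph_labels(i_st, i_nd, n_steps, morph_type=1):
    """Hypothetical helper: pack morph settings into a (3, n_steps + 2) LongTensor."""
    n_imgs = n_steps + 2  # intermediate steps plus the two end items
    row_lbls = torch.tensor([i_st] + [i_nd] * (n_imgs - 1), dtype=torch.long)
    row_type = torch.full((n_imgs,), morph_type, dtype=torch.long)
    row_size = torch.full((n_imgs,), n_imgs, dtype=torch.long)
    return torch.stack([row_lbls, row_type, row_size])


packed = pack_morph_labels(i_st=1, i_nd=3, n_steps=18)

# Unpack the settings the same way the Condition layer reads them.
start_lbl = packed[0, 0].item()
end_lbl = packed[0, 1].item()
morph_type = packed[1, 0].item()
n_imgs = packed[2, 0].item()
print(packed.shape, start_lbl, end_lbl, morph_type, n_imgs)
```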

Condition

Now let’s have a look at the forward method for the custom Condition network layer. What we’re doing is creating a set of label vectors in which the first and last are the plain one-hot encodings and the interior ones are weighted averages of the two classes we are after.

class Condition(nn.Module):
... ...

  def forward(self, labels):
    if labels.dim() == 1:
      # plain 1-D batch of class indices: one-hot encode labels
      l = F.one_hot(labels, num_classes=self.nbr_cl)
      # convert to float (required by subsequent layers)
      l = l.float()
    elif labels[1][0] == 1:
      # 3-row morphing tensor, type 1: use weighted labels
      print(f"size: {len(labels)}; type: {labels[1][0]}; nbr: {labels[2][0]}")
      m_sz = int(labels[2][0])
      l1 = F.one_hot(labels[0][0], num_classes=self.nbr_cl)
      l2 = F.one_hot(labels[0][1], num_classes=self.nbr_cl)
      # column of weights for the second label, from 0 to 1 inclusive
      pcnt_2nd_lbl = torch.linspace(0, 1, m_sz, device=labels.device)[:, None]
      l = l1 * (1 - pcnt_2nd_lbl) + l2 * pcnt_2nd_lbl
      l = l.float()
      print(l)
    # return feature vectors
    return self.fc(l)

And here’s what the modified method’s output looks like for my chosen set of clothing items (1 and 3) and a series of 20 images (18 morphing steps plus the two end items). Not all twenty rows are displayed, but enough to hopefully illuminate what is being done in the code above. The first and last rows are the unmodified one-hot encodings for the two items of clothing.

tensor([[0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.9474, 0.0000, 0.0526, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.8947, 0.0000, 0.1053, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
... ...
        [0.0000, 0.1053, 0.0000, 0.8947, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.0526, 0.0000, 0.9474, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000]], device='cuda:1')
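
The fractions in that output follow directly from the 20 evenly spaced weights: the spacing is 1/19, so the second row is 18/19 of class 1 and 1/19 of class 3. A small plain-Python sketch of the same arithmetic, with no model involved (the variable names are just for illustration):

```python
nbr_cl = 10        # number of clothing classes
n_imgs = 20        # images in the series (18 steps + 2 end items)
i_st, i_nd = 1, 3  # start and end class indices

rows = []
for k in range(n_imgs):
    w = k / (n_imgs - 1)  # weight of the end class: 0, 1/19, 2/19, ..., 1
    row = [0.0] * nbr_cl
    row[i_st] = 1.0 - w   # what is left goes to the start class
    row[i_nd] = w
    rows.append(row)

print([round(v, 4) for v in rows[1]])
# [0.0, 0.9474, 0.0, 0.0526, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Every row still sums to 1, so each one can be read as a soft label: mostly one garment, with an increasing share of the other.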

Let’s Morph

A little bit more code. I did refactor img_grid_lbl(), but won’t bother showing the modified code. I am sure you will be able to sort it out.

... ...
    fakes = genatr(labels)
    imgs = (fakes * 0.5) + 0.5
    img_grid_lbl(imgs, labels[0], (4, (n_steps + 2) // 4), i_show=False, epoch='m1')

And here’s a sample of the output. Because I do not invoke torch.manual_seed when using the model in evaluation mode, it should change each time the code is executed.

series of images showing morphing of a trouser into a dress
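
On the point about torch.manual_seed: if repeatable images were wanted, seeding just before generation should pin down the noise. A minimal sketch, assuming the noise is drawn with torch.randn (nz = 100 is an assumed noise length, not necessarily the one my model uses):

```python
import torch

nz = 100  # assumed noise vector length

torch.manual_seed(42)     # fix the global RNG state
z_a = torch.randn(4, nz)

torch.manual_seed(42)     # same seed, so the same draws follow
z_b = torch.randn(4, nz)

z_c = torch.randn(4, nz)  # no re-seeding: the stream continues with new values

same_seed_equal = torch.equal(z_a, z_b)
unseeded_equal = torch.equal(z_b, z_c)
print(same_seed_equal, unseeded_equal)
```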

Nothing spectacular. But, a bit of new knowledge and a touch of fun slightly clouded by a sprinkling of frustration.

Weighted Sum of Noise

In the next project we will actually use the noise vector to determine a characteristic/condition of the generated images. That is not possible for the current set up. We only have one condition (type of clothing). If the images were also identified as female or male clothing then we could likely have used the noise tensor to determine the gender of the generated clothing image(s).

So not too sure there is anything to discuss about employing a weighted sum of noise tensors at this time.

That said, I decided to try combining the weighted sum of a start and end condition with a weighted sum of two noise vectors, rather than drawing a different random noise vector for each of the weighted conditions. I am not going to bother showing the code; I expect you can sort that out based on the above code samples.

Here’s a sample image using the first approach: start and end condition with a random noise vector for each image. This time morphing from a trouser to an ankle boot.

series of images showing morphing of a trouser into an ankle boot using random noise vectors

And here’s the same transition but using a weighted sum of two different noise vectors with the weighted sum of the two terminal conditions.

series of images showing morphing of a trouser into an ankle boot using weighted noise vectors

Because the weighted noise vector slowly moves away from the first noise vector and on to the second, the images are not as drastically altered at each step of the transition from one garment to the other. Images are a little more muddled in the middle of the series, which is where the weighted noise and label vectors are the most dissimilar to either of the original pair. They are less muddled at either end, where the weighted vectors sit close to the starting pair or, in reverse, to the final pair.

Definitely a much cleaner looking set of morphed images using the second approach.
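
For anyone who does want a starting point, the weighted noise can be built the same way as the weighted labels. This is a rough sketch rather than my actual code: nz and n_imgs are assumed sizes, and it compares how far the noise moves from one image to the next under the two approaches.

```python
import torch

torch.manual_seed(0)
nz, n_imgs = 100, 20  # assumed noise length and series length

z_st, z_nd = torch.randn(nz), torch.randn(nz)  # the two end-point noise vectors
w = torch.linspace(0, 1, n_imgs)[:, None]      # (n_imgs, 1) column of weights
z_morph = z_st * (1 - w) + z_nd * w            # (n_imgs, nz) weighted sums
z_rand = torch.randn(n_imgs, nz)               # independent noise per image


def mean_step(z):
    """Average Euclidean distance between consecutive rows."""
    return (z[1:] - z[:-1]).norm(dim=1).mean().item()


step_morph = mean_step(z_morph)
step_rand = mean_step(z_rand)
print(f"weighted noise step: {step_morph:.2f}; random noise step: {step_rand:.2f}")
```

With the weighted noise, each step covers only 1/19 of the distance between the two end-point vectors, whereas independent random vectors jump a full random distance every time. That matches the smoother look of the second set of images.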

Ok, ok, I hear you. No need to shout! So, here’s an example with the same noise vector applied across the set of weighted conditions.

series of images showing morphing of a trouser into an ankle boot using the same noise vector for every image

A slight variation due to the changed noise vector, but all in all very similar to the image above.

Done

Believe that is it for this one. The ability to morph the images using a cGAN is a nice bonus.

Next time we will start on a more ambitious cGAN. We will also explore a currently more favoured GAN model: the WGAN-GP.

Until next time, may your code generate some interesting new images.

Resources