Okay, let’s see what some vector arithmetic can do with our latent space vectors.
Select Images
We are going to select images from each combination of the two glasses classes and the two gender classes, three from each. I started by selecting 12 images each from the glasses and no glasses classes and plotted them so I could pick the ones I wanted to use. I then built lists from those final selections. I am not splitting the code between those two steps, and I am only going to show the final 12 selected images.
if do_arith:
    g_ndx = [i*5 for i in range(1, 13)]
    ng_ndx = [-i*10 for i in range(12, 0, -1)]
    dd_ndx = g_ndx + ng_ndx
    print(dd_ndx)
    dd_img = torch.utils.data.Subset(imgs, dd_ndx)
    dd_ldr = torch.utils.data.DataLoader(dd_img, batch_size=24, shuffle=False)
    dd_btch, _ = next(iter(dd_ldr))
    # utl.image_grid(dd_btch, 6, i_show=True, epoch=cfg.epochs, b_sz=24, img_cl='subset of images')
    f_g = [dd_btch[1], dd_btch[5], dd_btch[6]]
    m_g = [dd_btch[3], dd_btch[7], dd_btch[8]]
    f_ng = [dd_btch[12], dd_btch[19], dd_btch[22]]
    m_ng = [dd_btch[14], dd_btch[15], dd_btch[17]]
    dd_sub = [*f_g, *m_g, *f_ng, *m_ng]
    utl.image_grid(dd_sub, 3, i_show=True, epoch=cfg.epochs, b_sz=12, img_cl='subset of images')
The ones I picked are shown below, though I’m not sure how things are going to work out with the one wearing the headgear.

Some Arithmetic
Averages
We are going to take the average of the three encodings in each group. This effectively smooths out the representation in the latent space and hopefully captures the features common to each group, i.e. glasses, female, etc.
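In other words, for each group we are just taking the element-wise mean of the three latent vectors. A trivial sketch of that idea (z1, z2 and z3 here are hypothetical per-image encodings, not variables from the code below):

# element-wise mean of three latent vectors, same as (z1 + z2 + z3) / 3
z_avg = torch.stack((z1, z2, z3), dim=0).mean(dim=0)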
A new function to help us along.
def get_avg_ncdg(imgs):
    btch = torch.cat((imgs[0].unsqueeze(0), imgs[1].unsqueeze(0),
                      imgs[2].unsqueeze(0)), dim=0).to(cfg.device)
    with torch.no_grad():
        _, _, z = vae.ncdr(btch)
        z_avg = z.mean(dim=0)
        recon = vae.dcdr(z_avg.unsqueeze(0))
    return z_avg, recon
And after the image selection code above, the following generates and displays the images reconstructed from our average feature vectors.
grps = {"f_g": f_g, "m_g": m_g, "f_ng": f_ng, "m_ng": m_ng}
avg_imgs, avg_z = {}, {}
for grp in grps:
    avz, avi = get_avg_ncdg(grps[grp])
    avg_imgs[grp] = avi
    avg_z[grp] = avz
    # print(f"avi.shape: {avi.shape}")
ai = list(avg_imgs.values())
p_img = torch.cat((ai[0], ai[1], ai[2], ai[3]), dim=0).to(cfg.device)
# print(f"p_img.shape: {p_img.shape}")
utl.image_grid(p_img, 4, i_show=True, epoch=cfg.epochs, b_sz=4, img_cl='average reconstructed images')
Those images look like the following, more or less. Since the encoder samples the latent vector rather than returning a fixed value, and I have not set a random seed, the images will change slightly with each run of the above code. Something I can currently live with.
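If I ever wanted the grids to be exactly reproducible, seeding the random number generators before running the arithmetic code should be enough. A minimal sketch, assuming the variability comes from the sampling inside vae.ncdr and from the numpy generator in cfg.rng, and that numpy is imported as np elsewhere in the post (the seed value 73 is arbitrary):

torch.manual_seed(73)                  # fixes the latent sampling in the encoder
cfg.rng = np.random.default_rng(73)    # fixes the random image picks used further down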

Average Feature Vector Arithmetic
Let’s start by adding and subtracting some sequence of the average feature vectors.
Attempt #1
I am going to start with the average male with glasses vector and subtract the average female with glasses vector.
z = avg_z["m_g"] - avg_z["f_g"]
with torch.no_grad():
    rcn = vae.dcdr(z.unsqueeze(0))
p_img = torch.cat((avg_imgs["m_g"], avg_imgs["f_g"], rcn), dim=0)
utl.image_grid(p_img, 4, i_show=True, epoch=cfg.epochs, b_sz=4, img_cl='')
The images involved, and the image reconstructed from the vector after the subtraction, are shown below.

By subtracting the female with glasses vector from the male with glasses vector we end up producing a male without glasses vector. So the features for glasses have disappeared. Pretty cool.
Attempt #2
Let’s try subtracting a male with no glasses vector from a male with glasses vector. Want to guess what we get?
z = avg_z["m_g"] - avg_z["m_ng"]
with torch.no_grad():
    rcn = vae.dcdr(z.unsqueeze(0))
# print(z.shape, rcn.shape)
p_img = torch.cat((avg_imgs["m_g"], avg_imgs["m_ng"], rcn), dim=0)
utl.image_grid(p_img, 4, i_show=True, epoch=cfg.epochs, b_sz=4, img_cl='')
I know, I should have written a function, but copying code was easier while I was sorting things out.
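For the record, something like the following sketch would have done it. The function name and signature are my own, and I am assuming the b_sz argument to utl.image_grid is simply the number of images in the grid:

def show_arith(z, src_imgs, label=''):
    # decode the latent vector and plot it alongside the images it was built from
    with torch.no_grad():
        rcn = vae.dcdr(z.unsqueeze(0))
    p_img = torch.cat((*src_imgs, rcn), dim=0)
    utl.image_grid(p_img, 4, i_show=True, epoch=cfg.epochs,
                   b_sz=p_img.shape[0], img_cl=label)

# e.g. the attempt above would become
# show_arith(avg_z["m_g"] - avg_z["m_ng"], [avg_imgs["m_g"], avg_imgs["m_ng"]])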
And the resulting image.

So, subtracting a male with no glasses from a male with glasses left us with what looks like a female with glasses. The maleness was more or less removed.
Attempt #3
Let’s add a female with no glasses to a male with glasses.
z = avg_z["m_g"] + avg_z["f_ng"]
with torch.no_grad():
    rcn = vae.dcdr(z.unsqueeze(0))
# print(z.shape, rcn.shape)
p_img = torch.cat((avg_imgs["m_g"], avg_imgs["f_ng"], rcn), dim=0)
utl.image_grid(p_img, 4, i_show=True, epoch=cfg.epochs, b_sz=4, img_cl='')
And…

A female-like face shaped very much like the original male face. Though I don’t understand why the glasses disappeared. Again, an interesting result.
Attempt #4
Let’s add a bit more arithmetic: a subtraction followed by an addition. We subtract the female with glasses vector from the male with glasses vector, then add the female with no glasses vector.
z = avg_z["m_g"] - avg_z["f_g"] + avg_z["f_ng"]
with torch.no_grad():
    rcn = vae.dcdr(z.unsqueeze(0))
p_img = torch.cat((avg_imgs["m_g"], avg_imgs["f_g"], avg_imgs["f_ng"], rcn), dim=0)
utl.image_grid(p_img, 4, i_show=True, epoch=cfg.epochs, b_sz=4, img_cl='m_g - f_g + f_ng = m_ng')
Is this what you expected? Do you also see a similarity with the final image above?

Attempt #5
Okay, one more with the average feature vectors before I try starting with an image from the training dataset.
z = avg_z["f_ng"] - avg_z["m_ng"] + avg_z["f_g"]
with torch.no_grad():
    rcn = vae.dcdr(z.unsqueeze(0))
# print(z.shape, rcn.shape)
p_img = torch.cat((avg_imgs["f_ng"], avg_imgs["m_ng"], avg_imgs["f_g"], rcn), dim=0)
utl.image_grid(p_img, 4, i_show=True, epoch=cfg.epochs, b_sz=4, img_cl='f_ng - m_ng + f_g = f_g')

By subtracting the average male with no glasses from the average female with no glasses, and then adding the average female with glasses, I think the final image is much more strongly female in appearance than the reconstructions from either of the two average female feature vectors shown before it.
Attempt #6
This code went through a few iterations, but the idea was to start with a randomly selected image from the training dataset and then apply a couple of those average feature vectors, a subtraction followed by an addition. After some playing around, I ended up applying two different bits of arithmetic to two randomly selected images, and plotting both sets of images in one grid.
hf_ndx = len(imgs) // 2
mx_ndx = len(imgs)
gx = cfg.rng.integers(low=0, high=hf_ndx, size=1)
ngx = cfg.rng.integers(low=hf_ndx, high=mx_ndx, size=1)
gi, _ = imgs[gx[0]]
ngi, _ = imgs[ngx[0]]
print(gi.shape)
with torch.no_grad():
    gi = gi.unsqueeze(0).to(cfg.device)
    _, _, gz = vae.ncdr(gi)
    z1 = gz - avg_z["m_g"] + avg_z["f_ng"]
    # print(gi.shape, z1.shape)
    rcn1 = vae.dcdr(z1)
    ngi = ngi.unsqueeze(0).to(cfg.device)
    _, _, ngz = vae.ncdr(ngi)
    z2 = ngz - avg_z["f_ng"] + avg_z["m_g"]
    # print(ngi.shape, z2.shape)
    rcn2 = vae.dcdr(z2)
# print(z1.shape, rcn1.shape)
p_img = torch.cat((gi, avg_imgs["m_g"], avg_imgs["f_ng"], rcn1,
                   ngi, avg_imgs["f_ng"], avg_imgs["m_g"], rcn2), dim=0)
utl.image_grid(p_img, 4, i_show=True, epoch=cfg.epochs, b_sz=8, img_cl='ls_g - m_g + f_ng = f_ng')
And here are a couple of examples.

That top row surprised me a bit. I was expecting a much more obviously female image after decoding the vector that resulted from the arithmetic.

I found it interesting that subtracting glasses from sunglasses left the final person with very blackened eyes. Though I didn’t enlarge the image to confirm, those could just be the sunglass lenses left behind after the frames were removed. And on the bottom row we ended up with a person with sunglasses. Why is not obvious to me.
Done, maybe
I am thinking of trying this arithmetic using non-averaged feature vectors, but I have no idea whether there is any value in doing so.
If I do, I will add more content. If I don’t, this is it for this one. Using vector arithmetic really did generate some never-before-seen images, and it allowed us some limited control over the features of the generated images.
Note: 2025/02/21, no additions to the post. I went in other directions.