Decided to go for a 10-epoch training run.

Ten Epochs of Training

Here’s the terminal window output. It took about half an hour. The GPU never really got pushed; I never heard the fans crank up the way they did while training the CycleGAN and some of the other models. I wonder why?

(mclp-3.12) PS F:\learn\mcl_pytorch\proj7> python vae.py -rn rek_4 -bs 16 -ep 10
 {'run_nm': 'rek_4', 'dataset_nm': 'no_nm', 'sv_img_cyc': 150, 'sv_chk_cyc': 50, 'resume': False, 'start_ep': 0, 'epochs': 10, 'batch_sz': 16, 'num_res_blks': 9, 'x_disc': 1, 'x_genr': 1, 'x_eps': 0, 'use_lrs': False, 'lrs_unit': 'batch', 'lrs_eps': 5, 'lrs_init': 0.01, 'lrs_steps': 25, 'lrs_wmup': 0}
image and checkpoint directories created: runs\rek_4_img & runs\rek_4_sv
batch size: 16, nbr batches: 278
logger: {'vae': []}
epoch 1: 100%|███████████████████████████████████████████████████████████████████████| 278/278 [02:51<00:00,  1.62it/s]
epoch 2: 100%|███████████████████████████████████████████████████████████████████████| 278/278 [02:52<00:00,  1.61it/s]
epoch 3: 100%|███████████████████████████████████████████████████████████████████████| 278/278 [02:52<00:00,  1.61it/s]
epoch 4: 100%|███████████████████████████████████████████████████████████████████████| 278/278 [02:53<00:00,  1.60it/s]
epoch 5: 100%|███████████████████████████████████████████████████████████████████████| 278/278 [02:53<00:00,  1.60it/s]
epoch 6: 100%|███████████████████████████████████████████████████████████████████████| 278/278 [02:53<00:00,  1.60it/s]
epoch 7: 100%|███████████████████████████████████████████████████████████████████████| 278/278 [02:52<00:00,  1.61it/s]
epoch 8: 100%|███████████████████████████████████████████████████████████████████████| 278/278 [02:52<00:00,  1.61it/s]
epoch 9: 100%|███████████████████████████████████████████████████████████████████████| 278/278 [02:53<00:00,  1.60it/s]
epoch 10: 100%|██████████████████████████████████████████████████████████████████████| 278/278 [02:52<00:00,  1.61it/s]

Here’s the plot of the training losses. Fair improvement over what we saw with a single epoch of training.

training losses over 10 epochs of training
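
For what it’s worth, the plot comes from the per-batch losses collected in the logger dict shown in the terminal output (logger: {'vae': []}). Something along the following lines would produce a similar plot. This is only a sketch; the plot_vae_losses helper is mine, and the assumption that the training loop appends each batch’s loss to logger['vae'] is mine as well.

# sketch: plot the per-batch losses accumulated in the logger dict
# assumes the training loop appends each batch's loss to logger['vae']
import matplotlib.pyplot as plt

def plot_vae_losses(logger, run_nm):
  fig, ax = plt.subplots(figsize=(10, 4))
  ax.plot(logger['vae'], lw=0.75, label='vae loss')
  ax.set_xlabel('batch')
  ax.set_ylabel('loss')
  ax.set_title(f'training losses: {run_nm}')
  ax.legend()
  plt.show()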

And here are the images (training and regenerated) saved after five and ten epochs of training.

Images After 5 and 10 Epochs of Training
sample images after 5 epochs of training sample images after 10 epochs of training

Perhaps not great, but I think that’s all the training I will do for now. It gets the idea across.

Playing With the Latent Space

Okay, time to see what we can do with the latent space vectors.

I won’t show the code or the images generated, but I did test loading the model from the checkpoint and running a small sample of images through it, then plotting the real and regenerated images to be sure things were working as expected. (A rough sketch of that check follows the loading code below.)

Load Model from File

Okay, I will load the trained model and put it in evaluation mode.

# the following goes just after the code that loads the training data,
# which was already in the module
# instantiate model and optimizer
l_dims = 100
vae = VAE(l_dims).to(cfg.device)
... ...
# if block for the new experimental code
if tst_model:
  utl.ld_chkpt(cfg.sv_dir/f"vae_{cfg.epochs}.pt", vae, opt)
  vae.eval()
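
And, since I mentioned the sanity check above without showing the code, here’s roughly what running a small sample of images through the loaded model looked like. This is a sketch from memory rather than the original code; the tst_recon flag is made up, and I am assuming imgs is the training dataset loaded earlier in the module.

# sketch of the sanity check: regenerate a few training images with the loaded model
# tst_recon is a made-up flag; imgs is assumed to be the training dataset loaded earlier
if tst_recon:
  real = torch.stack([imgs[i][0] for i in range(8)]).to(cfg.device)
  with torch.no_grad():
    _, _, z = vae.ncdr(real)
    regen = vae.dcdr(z).cpu()
  utl.image_grid(torch.cat((real.cpu(), regen), 0), 4, i_show=True,
    epoch=cfg.epochs, b_sz=16, img_cl='real and regenerated images (sanity check)')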

Random Feature Vectors

The first thing I am going to try is “prior sampling”: using a standard normal distribution to generate the latent space vectors. (The KL term in the VAE loss pushes the encoder’s per-image distributions toward a standard normal, which is why sampling from one can produce anything recognizable at all.) Because this will generate vectors the model likely never saw during training, the generated images will be fairly poor, especially given how little training I gave the model.

  if do_rand:
    # let's try generating some random latent space vectors and see what we get
    with torch.no_grad():
      noise = torch.randn(16,l_dims).to(cfg.device)
      p_img = vae.dcdr(noise).cpu()
    utl.image_grid(p_img, 4, i_show=True, epoch=cfg.epochs, b_sz=cfg.batch_sz, img_cl='random latent space vectors')

And a sample image.

sample of images using random feature vectors

Traversal of Latent Space

The idea is to take a given image, generate its feature vector, then for one or more “features” generate images over the range of possible values for the chosen feature(s). Each dimension of the latent vector represents a normal distribution; the range of values for a dimension is controlled by its mean and variance, both of which we get from the encoder.

For each chosen dimension, I am going to take its mean and generate a range of values from the mean minus three standard deviations to the mean plus three standard deviations.

Now I don’t expect in this case that this will generate any significantly different images. The mean, standard deviation and feature vector returned from the encoder are specific to one person’s image. They in no way define the full latent space. But, I thought it might be worth a look.
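
As a quick reminder of where those means and standard deviations come from: a VAE encoder doesn’t output the latent vector directly, it predicts a mean and standard deviation per dimension and then samples z with the reparameterization trick. In this project the encoder (vae.ncdr) returns mu, std and the sampled z together. A generic version of that sampling step, not the actual encoder code from this module, looks something like this.

  # generic reparameterization step (not the module's actual encoder code)
  # the encoder predicts a mean and standard deviation per latent dimension, then samples z
  def reparameterize(mu, std):
    eps = torch.randn_like(std)   # sample from a standard normal
    return mu + eps * std         # z ~ N(mu, std**2), still differentiable wrt mu and std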

A couple of functions to do the heavy lifting.

  # generate a sample of feature values for a single dimension of the feature vector
  # using the supplied distribution values
  def gen_z_dim(mu, std, nbr_smpl):
    d_lw, d_hg = mu - 3*std, mu + 3*std
    smpl = torch.linspace(d_lw, d_hg, nbr_smpl)
   
    return smpl


  def l_spc_traverse(decdr, mu, std, z_in, z_ndx, n_smpl=11):
    """
      decdr: vae decoder
      z_in: latent space vector for image to traverse
      z_ndx: if integer: index of feature to traverse
             if iterable: index of first feature and number of features to traverse
      n_smpl: nbr sample images to generate
        default to 11, makes 12 with original image
        makes for 2x6 or 3x4 image grid
    """
    if isinstance(z_ndx, int):
      t_ndx, t_sz = z_ndx, 1
    else:
      # [start ndx, number of features to modify]
      t_ndx, t_sz = z_ndx
    # repeat the input latent vector, one copy per sample image
    z = z_in.repeat(n_smpl, 1).to(cfg.device)
    for i in range(t_sz):
      tmp_x = t_ndx + i
      z_vals = gen_z_dim(mu[0, tmp_x], std[0, tmp_x], n_smpl)
      # Assign the traversed values to the specified dimension
      z[:, tmp_x] = z_vals[:]

    # Decode the latent vectors
    with torch.no_grad():
      samples = decdr(z.to(cfg.device))
    return samples

And the code to generate a set of images (in a new if block of course). I am going to traverse the features at indexes 20-24.

  if do_traverse:
    # let's use the distributions for one image and traverse the latent space for one or more features
    dd_img, _ = imgs[10]
    dd_img = dd_img.unsqueeze(0).to(cfg.device)
    with torch.no_grad():
      mu, std, z = vae.ncdr(dd_img)

    t_img = l_spc_traverse(vae.dcdr, mu, std, z, [20, 5], n_smpl=11)
    lst_img = torch.cat((dd_img, t_img), 0)
    # print(lst_img.shape)
    utl.image_grid(lst_img, 4, i_show=True, epoch=cfg.epochs, b_sz=12,
      img_cl='traverse latent space for one or more features of image', norm=False)

sample of images when traversing features 20-24

Personally, I don’t see any real change in the modified images.

Linear Interpolation

Okay, let’s, as done with an earlier GAN, take a start and end image, then generate a series of images interpolating between the two. We are essentially interpolating between two points in the trained latent space.

Once again, a function to do the interpolation, given the starting and ending feature vectors and the number of steps to take between the two.

  def lnr_ntrpln(st, nd, stps):
    # Create a linear path from start to end
    # stps should be 2 less than total number of images to display
    mlts = torch.linspace(0, 1, stps).to(cfg.device)
    z = st
    for mlt in mlts[:, None]:
      with torch.no_grad():
        nt = (st * (1 - mlt)) + (nd * mlt)
        # print(mlt, nt.shape)
        z = torch.cat((z, nt), 0)
    z = torch.cat((z, nd), 0)
    # print(z.shape)

    # Decode the samples along the path
    vae.eval()
    with torch.no_grad():
      smpls = vae.dcdr(z.to(cfg.device))
    return smpls    

And the if block using the function to generate the image grid.

  if do_spc_ntrpl:
    # interpolate between 2 dataset image latent space vectors
    mx_ndx = len(imgs)
    sx = cfg.rng.integers(low=0, high=mx_ndx, size=1)
    nx = cfg.rng.integers(low=0, high=mx_ndx, size=1)
    print(f"st: {sx}, nd: {nx}")
    i1 = imgs[sx[0]][0].unsqueeze(0).to(cfg.device)
    i2 = imgs[nx[0]][0].unsqueeze(0).to(cfg.device)
    with torch.no_grad():
      _, _, st = vae.ncdr(i1)
      _, _, nd = vae.ncdr(i2)
    ntrpl_smpl = lnr_ntrpln(st, nd, stps=22)
    # p_img = torch.cat((i1, ntrpl_smpl, i2), 0)
    utl.image_grid(ntrpl_smpl, 6, i_show=True, epoch=cfg.epochs, b_sz=24,
      img_cl='interpolate between 2 latent space vectors from dataset', norm=False)

And a couple of examples. The top left and bottom right are regenerated versions of the original two images for each interpolation.

Interpolate between two images
interpolating between 2 randomly selected images interpolating between 2 randomly selected images

Now let’s do that again using randomly generated start and end feature vectors.

  if do_rnd_ntrpl:
    # interpolate between 2 random latent space vectors
    st = torch.randn(1, l_dims).to(cfg.device)
    nd = torch.randn(1, l_dims).to(cfg.device)

    ntrpl_smpl = lnr_ntrpln(st, nd, stps=22)
    utl.image_grid(ntrpl_smpl, 6, i_show=True, epoch=cfg.epochs, b_sz=24,
      img_cl='interpolate between 2 random feature vectors', norm=False)

sample of images interpolating between two random feature vectors

Fini for Now

I was planning to cover one more way of playing around with the latent space vectors—simple arithmetic. But I have decided that this post is plenty long enough. A fair bit of code, a fair number of images, etc.

So, the arithmetic will have to wait until next time. For what will likely be a rather short post.

Do enjoy playing around in your latent space—whatever it may be. Or, in your dynamic space if that is more to your liking.