Okay, time to move on. Let’s tackle larger, colour images. For this I will be attempting to code a variational autoencoder. I didn’t find many tutorials on autoencoders using colour images. The one I did find used the Labelled Faces in the Wild dataset, and I am not sure I want to download another dataset of 13,000-plus images. So I am thinking I will try the Glasses/No Glasses dataset I used for a previous project.

And, my 50-year-old stats course isn’t quite up to really understanding the mathematics going on with this algorithm. But I am going to try to come to grips with at least a basic understanding of how this guy differs from the simple autoencoder we just completed.

Variational Autoencoder

Papers describing variational autoencoder (VAE) algorithms were published by Kingma and Welling in 2013 and by Rezende et al. in 2014.

The previous basic autoencoder (AE) consisted of an encoder and a decoder. The encoder compressed the input into a dimensionally reduced latent representation. The decoder reconstructed the input from the latent representation. A variational autoencoder (VAE) also has an encoder and a decoder. The VAE, however, works in a probabilistic fashion. The encoder generates the parameters of a probability distribution (usually a Gaussian) for the features in the latent space.

That is, the AE latent space is deterministic: given the same input, the encoder will always generate the same point in the latent space. In a VAE, the encoder produces a distribution’s parameters, and the latent representation is sampled from that distribution. And, perhaps more importantly, the VAE latent space is continuous. This allows for the generation of output that was not in the training dataset.
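To make that last point concrete, here is a tiny, purely hypothetical sketch (none of this is code from the project; decoder stands in for an already trained decoder module with a 100-dimensional latent space):

import torch

def generate(decoder, l_dims=100, n=1):
  # sample n points from the prior over the continuous latent space and
  # decode them into images that were never in the training set
  z = torch.randn(n, l_dims)
  return decoder(z)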

In both cases the loss function includes a reconstruction loss. For a simple AE, that’s it. The VAE adds a second term, the Kullback-Leibler divergence (KLD). The KLD draws the learned distribution in the latent space toward a prior distribution. As such it acts as a regularizer, helping to keep the model from encoding too much information in the latent space.
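In code rather than equations, a minimal sketch of that two-term loss might look like the following (my own illustration, not the training code for this project; it assumes the encoder returns the mean mu and standard deviation std of the latent Gaussian):

import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, std):
  # reconstruction term: how faithfully the decoder rebuilt the input
  recon = F.mse_loss(x_hat, x, reduction='sum')
  # closed-form KL divergence between N(mu, std) and the standard normal prior
  kld = 0.5 * torch.sum(std**2 + mu**2 - 2*torch.log(std) - 1)
  return recon + kld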

The VAE loss is, apparently, an implementation of the ELBO (evidence lower bound); minimizing the loss amounts to maximizing the ELBO.

The evidence lower bound is an important quantity at the core of a number of important algorithms used in statistical inference including expectation-maximization and variational inference.

The evidence lower bound (ELBO), Matthew N. Bernstein

One last thing. VAEs usually use a Gaussian distribution in order to facilitate the reparameterization trick. Which in turn allows gradients to backpropagate through the sampling step, making proper training possible.

For some distributions, it is possible to reparametrize samples in a clever way, such that the stochasticity is independent of the parameters. We want our samples to deterministically depend on the parameters of the distribution.

What is a variational autoencoder?, Jaan Li
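As a rough illustration (mine, not from the paper or the article above), the trick boils down to something like the function below; the encoder class later in this post does the equivalent with torch.distributions.Normal.

import torch

def reparameterize(mu, std):
  # eps carries all of the randomness and does not depend on the network's outputs
  eps = torch.randn_like(std)
  # z is a differentiable function of mu and std, so gradients flow back into the encoder
  return mu + std * eps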

There’s way more to it, but for now that’s as far as I am digging into it. Lots of resources on the web (one or two below). And I certainly am not going to show you any of the equations, nor explain or derive them; I can’t do that.

VAE Module

Okay, let’s get started. Create a new module, vae.py, and add the basic start-up code. A lot of this is repetition. That said, here it is.

# vae.py: main module for coding and training a variational autoencoder
#         2025.01.04, rek

import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils
import torchvision
import torchvision.transforms as T
from tqdm import tqdm

import math, time
from time import localtime, strftime
import sys
# print(sys.path)
sys.path.append('../shared_mods')
# print(sys.path)

import config as cfg              # type: ignore
import utils as utl               # type: ignore
from logger import Logger         # type: ignore
from autoencoder import AE_Simple # type: ignore
import ae_utils

# debugging
# explicitly raise an error with a stack trace
torch.autograd.set_detect_anomaly(True)

# runtime control for this module
DEBUG = False
trn_model = False
tst_model = False
plt_losses = False
# test control booleans
tst_clp = False           # print out some debug data for command line parameters
tst_get_data = True       # check that our data loader is working correctly
tst_feats = False         # print info regarding the feature vectors produced by the encoder
tst_gen_mdl = True        # visualize the generative model (space)

# get command line parameters, update globals, create project sub-directories
cl_args = cfg.get_cl_args()
if tst_clp:
  print(cl_args)
  print(f"before updt -> run_nm : {cfg.run_nm}, epochs: {cfg.epochs}")
cfg.updt_cl_args(cl_args)
cfg.print_cl_args(cl_args)
if tst_clp:
  print(f"after updt ->  run_nm : {cfg.run_nm}, epochs: {cfg.epochs}")
cfg.mk_dirs()

# set seed for repeatability, but make sure to use something different every time
# training is resumed
if not cfg.resume:
  torch.manual_seed(cfg.pt_seed)
  np.random.seed(cfg.pt_seed)
else:
  t_seed = cfg.pt_seed + cfg.start_ep
  torch.manual_seed(t_seed)
  np.random.seed(t_seed)

A couple of comments. The # type: ignore comments after the imports from my shared module directory were needed to stop the Pylance extension from complaining that it couldn’t find the modules in the shared folder. That is because I modified the Python path to include that directory. And, for setting the random seed, I decided to modify my approach to ensure a seed is always set, but a different one each time training is resumed.

Set Up DataLoader

Again, pretty much a repetition of past code, but… And a quick test to make sure it is working.

# set up dataloader, using glasses dataset in proj 5
img_dir = Path("../proj5/data/glasses")
transform = T.Compose([T.Resize(256), T.ToTensor()])
imgs = torchvision.datasets.ImageFolder(
  root=img_dir, transform=transform
)
# if batch size not 16, fix
if cfg.batch_sz != 16:
  cfg.batch_sz = 16
ldr = torch.utils.data.DataLoader(imgs, batch_size=cfg.batch_sz, shuffle=True)

# print sample training images
if tst_get_data:
  a_btch = next(iter(ldr))
  print(len(a_btch))
  print(a_btch[0].shape, a_btch[1].shape)
  # note: a_btch[1] are labels
  print(f"a_btch[1]: {a_btch[1]}")
  # a_btch[0] are the face images from the glasses/no glasses dataset
  utl.image_grid(a_btch[0], 8, i_show=True, epoch=0, b_sz=cfg.batch_sz, img_cl='A')
  # utl.image_grid(a_btch[1], 4, i_show=True, epoch=0, b_sz=cfg.batch_sz, img_cl='b')

Terminal output:

(mclp-3.12) PS F:\learn\mcl_pytorch\proj7> python vae.py -rn rek_3 -bs 16 -ep 5
 {'run_nm': 'rek_3', 'dataset_nm': 'no_nm', 'sv_img_cyc': 150, 'sv_chk_cyc': 50, 'resume': False, 'start_ep': 0, 'epochs': 5, 'batch_sz': 16, 'num_res_blks': 9, 'x_disc': 1, 'x_genr': 1, 'x_eps': 0, 'use_lrs': False, 'lrs_unit': 'batch', 'lrs_eps': 5, 'lrs_init': 0.01, 'lrs_steps': 25, 'lrs_wmup': 0}
image and checkpoint directories created: runs\rek_3_img & runs\rek_3_sv
2
torch.Size([16, 3, 256, 256]) torch.Size([16])
a_btch[1]: tensor([1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1])

And the sample images.

Sample glasses images

Code VAE

Okay, I am going to add the code for the VAE to the autoencoder.py module in the shared modules folder. I will be using three classes: one each for the encoder, the decoder and the VAE itself. I am not going to spend any time explaining the code; we have seen most of it before. Other than to say that, in the encoder, the convolutional layers downsample the images because of their stride. The linear layers generate the mean and standard deviation of the probabilistic latent vector, and the output features, z, are a sample drawn from that distribution. The decoder reverses the process and outputs an image generated from the probabilistic feature vector.
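As a sanity check on the layer sizes (my own back-of-the-envelope math, not code in the project), three stride-2 convolutions take a 256x256 image down to 31x31, which is where the 31*31*32 input size of the first linear layer comes from:

import math

def conv_out(size, kernel=3, stride=2, padding=1):
  # standard convolution output-size formula
  return math.floor((size + 2*padding - kernel) / stride) + 1

s = conv_out(256)            # cnv1: 256 -> 128
s = conv_out(s)              # cnv2: 128 -> 64
s = conv_out(s, padding=0)   # cnv3: 64 -> 31
print(s, s*s*32)             # 31, 30752 == 31*31*32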

I did also have to add an import or two, and the code to add the shared folders path to the Python path.

# autoencoder classes for various projects

import sys
import torch
import torch.nn.functional as F
from torch import nn

sys.path.append('../shared_mods')
import config as cfg              # type: ignore


class AE_Simple(nn.Module):
... ...

class VAEncoder(nn.Module):
  def __init__(self, l_dims=100):
    super().__init__()
    self.cnv1 = nn.Conv2d(3, 8, 3, stride=2, padding=1)
    self.cnv2 = nn.Conv2d(8, 16, 3, stride=2, padding=1)
    self.bn2 = nn.BatchNorm2d(16)
    self.cnv3 = nn.Conv2d(16, 32, 3, stride=2, padding=0)
    self.ln1 = nn.Linear(31*31*32, 1024)
    self.ln2 = nn.Linear(1024, l_dims)
    self.ln3 = nn.Linear(1024, l_dims)
    self.N = torch.distributions.Normal(0, 1)
    self.N.loc = self.N.loc.cuda()
    self.N.scale = self.N.scale.cuda()
    

  def forward(self, x):
    x = x.to(cfg.device)
    x = F.relu(self.cnv1(x))
    x = F.relu(self.bn2(self.cnv2(x)))
    x = F.relu(self.cnv3(x))
    x = torch.flatten(x, start_dim=1)
    x = F.relu(self.ln1(x))
    mu = self.ln2(x)
    std = torch.exp(self.ln3(x))
    z = mu + std*self.N.sample(mu.shape)
    return mu, std, z


class VADecoder(nn.Module):
  def __init__(self, l_dims=100):
    super().__init__()
    self.dc_ln = nn.Sequential(
      nn.Linear(l_dims, 1024),
      nn.ReLU(True),
      nn.Linear(1024, 31*31*32),
      nn.ReLU(True)
    )
    self.unfltn = nn.Unflatten(dim=1, unflattened_size=(32, 31, 31))
    self.dc_cnv = nn.Sequential(
      nn.ConvTranspose2d(32, 16, 3, stride=2, output_padding=1),
      nn.BatchNorm2d(16),
      nn.ReLU(True),
      nn.ConvTranspose2d(16, 8, 3, stride=2, padding=1, output_padding=1),
      nn.BatchNorm2d(8),
      nn.ReLU(True),
      nn.ConvTranspose2d(8, 3, 3, stride=2, padding=1, output_padding=1)
    )
  

  def forward(self, x):
    x = self.dc_ln(x)
    x = self.unfltn(x)
    x = self.dc_cnv(x)
    x = torch.sigmoid(x)
    return x


class VAE(nn.Module):
  def __init__(self, l_dims=100):
    super().__init__()
    self.ncdr = VAEncoder(l_dims)
    self.dcdr = VADecoder(l_dims)

  
  def forward(self, x):
    x = x.to(cfg.device)
    mu, std, z = self.ncdr(x)
    return mu, std, self.dcdr(z)

And a bit of code to test the VAE classes.

if __name__ == "__main__":
  t_vae = VAE(100)
  print(t_vae)

After fixing a number of typos, missing commas and the like, I got the following in the terminal window.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj7> cd ..\shared_mods
(mclp-3.12) PS F:\learn\mcl_pytorch\shared_mods> python autoencoder.py
VAE(
  (ncdr): VAEncoder(
    (cnv1): Conv2d(3, 8, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (cnv2): Conv2d(8, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (cnv3): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2))
    (ln1): Linear(in_features=30752, out_features=1024, bias=True)
    (ln2): Linear(in_features=1024, out_features=100, bias=True)
    (ln3): Linear(in_features=1024, out_features=100, bias=True)
  )
  (dcdr): VADecoder(
    (dc_ln): Sequential(
      (0): Linear(in_features=100, out_features=1024, bias=True)
      (1): ReLU(inplace=True)
      (2): Linear(in_features=1024, out_features=30752, bias=True)
      (3): ReLU(inplace=True)
    )
    (unfltn): Unflatten(dim=1, unflattened_size=(32, 31, 31))
    (dc_cnv): Sequential(
      (0): ConvTranspose2d(32, 16, kernel_size=(3, 3), stride=(2, 2), output_padding=(1, 1))
      (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): ConvTranspose2d(16, 8, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
      (4): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): ConvTranspose2d(8, 3, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    )
  )
)

Done
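Before wrapping up, here is a quick smoke test I could drop under the if __name__ block (a sketch only, and it assumes a CUDA device since the encoder pins its sampling distribution to the GPU): push a fake batch through the model and confirm the reconstruction comes back at 256x256.

  # hypothetical shape check; torch and cfg are already imported at the top of the module
  t_vae = VAE(100).to(cfg.device)
  x = torch.randn(4, 3, 256, 256)          # fake batch of 4 RGB 256x256 images
  mu, std, x_hat = t_vae(x)
  print(mu.shape, std.shape, x_hat.shape)  # [4, 100], [4, 100], [4, 3, 256, 256]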

A concept or two and lots of code to sort through. Therefore I think that’s enough for this post, so you can do just that.

Until next time, be happy playing with your Autoencoder.

Resources