As mentioned last post, I am going to take a break from trying to get the CycleGAN training to successfully converge. I really need some time away so that, hopefully, my frustration stops interfering with my brain’s functionality. And, also as mentioned last post, I figured the next thing to look at is autoencoders.

The ultimate goal is to code a variational autoencoder (VAE). What I am trying to decide is whether or not to code a plain vanilla autoencoder first so that I get some basic idea of how they work and, consequently, what the advantages of a VAE might be. And, you know, I think that’s just what we will do. Most of the web tutorials used the base MNIST dataset. I am going to do the same. Though I did think about using the clothing dataset we used in previous projects. But I expect that would require considerably more training.

Autoencoder Basics

Essentially autoencoders are models that learn to create efficient, generally compressed, versions of the input data without any labels. An encoder learns to generate that compressed version of its input and a decoder learns to recreate the original input using the compressed data as its input. The output of the decoder will not be an exact replica of the input used to generate the compressed version; it is, after all, lossy compression. The compressed, reduced-dimensional representation lives in what is referred to as the latent space. Within the model it is the output of the bottleneck layer.

There are a number of different types of autoencoders. I will leave it to you to sort through the available tutorials and such. For my projects, I am going to start with an undercomplete autoencoder (a.k.a. vanilla autoencoder). Then I will look at coding a variational autoencoder.

For our vanilla autoencoder, the loss function is usually a reconstruction loss, i.e. a measurement of the difference between the input to the encoder and the decoder’s reconstructed output. During training the autoencoder learns to minimize this loss, thereby learning to capture the most important features of the input in the bottleneck layer.
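To make "reconstruction loss" a little more concrete, here is a minimal sketch. It is not the loss code for this project (that has not been written yet). It just compares two common choices, mean squared error and binary cross-entropy, on made-up data. The function name `reconstruction_losses` and the random tensors are my own stand-ins.

```python
import torch
import torch.nn.functional as F


def reconstruction_losses(x, x_hat):
    """Return (mse, bce) between inputs and reconstructions, both with values in [0, 1]."""
    mse = F.mse_loss(x_hat, x)
    bce = F.binary_cross_entropy(x_hat, x)
    return mse, bce


torch.manual_seed(0)
x = torch.rand(8, 784)             # stand-in for a batch of flattened 28 x 28 images
good = (x + 0.01).clamp(0, 1)      # a near-perfect "reconstruction"
bad = torch.rand(8, 784)           # an unrelated "reconstruction"

mse_good, bce_good = reconstruction_losses(x, good)
mse_bad, bce_bad = reconstruction_losses(x, bad)
print(f"good -> mse: {mse_good:.4f}, bce: {bce_good:.4f}")
print(f"bad  -> mse: {mse_bad:.4f}, bce: {bce_bad:.4f}")
```

Either way, the closer the reconstruction is to the input, the smaller the number. That is the value the training loop would try to drive down.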

Let’s Get Coding

New project directory, proj7. Will copy over the config, utils, logger and find_bugs modules from the previous project. Should probably put them somewhere where all projects could access them rather than copying them into each new project. You know, let’s give it a try.

At the top directory, \learn\mcl_pytorch\ I added an empty __init__.py file. Ditto for project directory, \learn\mcl_pytorch\proj7. I then created a directory for the shared modules, \learn\mcl_pytorch\shared_mods. Added an empty __init__.py file. Copied over the modules mentioned above.

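In order for this to work, I have to make sure that the shared module directory is in my Python path. So, I will add the following to any file using those modules before I import them. Note that the relative path is resolved against the current working directory, so the script has to be run from within the project directory.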

import sys
sys.path.append('../shared_mods')
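If running from some other directory ever becomes an issue, one possible alternative is to build the path from the script's own location. A minimal sketch, assuming the directory layout described above. The helper names `shared_mods_dir` and `add_to_sys_path` are hypothetical and not part of my shared modules.

```python
import sys
from pathlib import Path


def shared_mods_dir(script_file, dir_nm="shared_mods"):
    """Return the shared-modules directory that sits beside the project directory."""
    proj_dir = Path(script_file).resolve().parent   # e.g. ...\mcl_pytorch\proj7
    return proj_dir.parent / dir_nm                 # e.g. ...\mcl_pytorch\shared_mods


def add_to_sys_path(path):
    """Append path to sys.path once; return True if it was added."""
    p = str(path)
    if p in sys.path:
        return False
    sys.path.append(p)
    return True


# in a project script this would be called as:
#   add_to_sys_path(shared_mods_dir(__file__))
```

The simple relative append works fine as long as I always start the script from inside the project directory, which is what I do.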

Let’s give it a quick test.

import sys
sys.path.append('../shared_mods')

import config as cfg
import utils as utl
from logger import Logger

tst_clp = True

# get command line parameters, update globals, create project sub-directories
cl_args = cfg.get_cl_args()
if tst_clp:
  print(cl_args)
  print(f"before updt -> run_nm : {cfg.run_nm}, epochs: {cfg.epochs}")
cfg.updt_cl_args(cl_args)
cfg.print_cl_args(cl_args)
if tst_clp:
  print(f"after updt ->  run_nm : {cfg.run_nm}, epochs: {cfg.epochs}")
cfg.mk_dirs()

And, in the terminal I got the following.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj7> python autoe.py -rn rek_1
{'run_nm': 'rek_1', 'dataset_nm': None, 'sv_img_cyc': None, 'sv_chk_cyc': None, 'resume': False, 'start_ep': None, 'epochs': None, 'batch_sz': None, 'image_sz': None, 'num_res_blks': None, 'x_disc': None, 'x_genr': None, 'x_eps': None, 'use_lrs': False, 'lrs_unit': None, 'lrs_eps': None, 'lrs_init': None, 'lrs_steps': None, 'lrs_wmup': None}
before updt -> run_nm : no_nm, epochs: 5
 {'run_nm': 'rek_1', 'dataset_nm': 'no_nm', 'sv_img_cyc': 150, 'sv_chk_cyc': 50, 'resume': False, 'start_ep': 0, 'epochs': 5, 'batch_sz': 4, 'num_res_blks': 9, 'x_disc': 1, 'x_genr': 1, 'x_eps': 0, 'use_lrs': False, 'lrs_unit': 'batch', 'lrs_eps': 5, 'lrs_init': 0.01, 'lrs_steps': 25, 'lrs_wmup': 0}
after updt ->  run_nm : rek_1, epochs: 5
image and checkpoint directories created: runs\rek_1_img & runs\rek_1_sv

While working on the above I realized that a number of those modules had tests that required some CycleGAN modules (datasets and models). So I copied them into the shared_mods folder, renamed them (cycg_dsets and cycg_models) and refactored the other modules’ imports and test code accordingly.

MNIST Dataset

I have not yet used this dataset in any of my previous projects. So, I will be downloading it to a data subdirectory under the project directory, proj7, before setting up the dataloader. Fortunately we only need to do that once. Though it certainly isn’t as big as some of the other datasets. (Note: I recently enlarged my project partition, doubled it to 200 GB. All those images I was saving during training runs were chewing up a lot of disk space. I will eventually delete most, if not all, of them.)

The code below will download the data if it is not already in the specified folder. Which at present it is not. You can probably guess where the code above goes.

import numpy as np
import torch
import torchvision
import torchvision.transforms as T

... ...
# set seed for repeatability, but not if resuming training on existing model
if not cfg.resume:
  torch.manual_seed(cfg.pt_seed)
  np.random.seed(cfg.pt_seed)

# download, if necessary, MNIST training and testing datasets, convert to tensors with values 0 to 1
transform = T.Compose([
    T.ToTensor()
])
trn_set=torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transform)
tst_set=torchvision.datasets.MNIST(root="./data", train=False, download=True, transform=transform)

# create dataloaders for both sets
trn_ldr = torch.utils.data.DataLoader(trn_set, batch_size=cfg.batch_sz, shuffle=True)
tst_ldr = torch.utils.data.DataLoader(tst_set, batch_size=cfg.batch_sz, shuffle=True)

And, in the terminal I got the following.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj7> python autoe.py -rn rek_1 -bs 8
 {'run_nm': 'rek_1', 'dataset_nm': 'no_nm', 'sv_img_cyc': 150, 'sv_chk_cyc': 50, 'resume': False, 'start_ep': 0, 'epochs': 5, 'batch_sz': 8, 'num_res_blks': 9, 'x_disc': 1, 'x_genr': 1, 'x_eps': 0, 'use_lrs': False, 'lrs_unit': 'batch', 'lrs_eps': 5, 'lrs_init': 0.01, 'lrs_steps': 25, 'lrs_wmup': 0}
image and checkpoint directories created: runs\rek_1_img & runs\rek_1_sv

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data\MNIST\raw\train-images-idx3-ubyte.gz
100%|███████████████████████████████████████████████████████████████████| 9912422/9912422 [00:01<00:00, 8131969.63it/s]
Extracting ./data\MNIST\raw\train-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data\MNIST\raw\train-labels-idx1-ubyte.gz
100%|████████████████████████████████████████████████████████████████████████| 28881/28881 [00:00<00:00, 434139.21it/s]
Extracting ./data\MNIST\raw\train-labels-idx1-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data\MNIST\raw\t10k-images-idx3-ubyte.gz
100%|███████████████████████████████████████████████████████████████████| 1648877/1648877 [00:00<00:00, 3514752.89it/s]
Extracting ./data\MNIST\raw\t10k-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz
100%|██████████████████████████████████████████████████████████████████████████████████████| 4542/4542 [00:00<?, ?it/s]
Extracting ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw

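What I found interesting is that the data is downloaded as gzip-compressed files. Going by the log above, each one is then extracted into the raw folder right after it is downloaded. As far as I can tell, it is the MNIST dataset class, not the DataLoader, that reads those files when the dataset is instantiated.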

And no, I didn’t sort the 403 errors. As the log shows, torchvision falls back to a mirror after each failure, and the data (images and labels) seems to load just fine.

And a little code to show a sample of the digits in the training set.

... ...

tst_get_data = True

... ...

# print sample training images
if tst_get_data:
  a_btch = next(iter(trn_ldr))
  print(len(a_btch))
  print(a_btch[0].shape, a_btch[1].shape)
  # note: a_btch[1] are the labels for the images in the batch
  print(f"a_btch[1]: {a_btch[1]}")
  # a_btch[0] are images of handwritten digits, 0-9
  utl.image_grid(a_btch[0], 8, i_show=True, epoch=0, b_sz=cfg.batch_sz, img_cl='A')

And, the result.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj7> python autoe.py -rn rek_1 -bs 8
 {'run_nm': 'rek_1', 'dataset_nm': 'no_nm', 'sv_img_cyc': 150, 'sv_chk_cyc': 50, 'resume': False, 'start_ep': 0, 'epochs': 5, 'batch_sz': 8, 'num_res_blks': 9, 'x_disc': 1, 'x_genr': 1, 'x_eps': 0, 'use_lrs': False, 'lrs_unit': 'batch', 'lrs_eps': 5, 'lrs_init': 0.01, 'lrs_steps': 25, 'lrs_wmup': 0}
image and checkpoint directories created: runs\rek_1_img & runs\rek_1_sv
2
torch.Size([8, 1, 28, 28]) torch.Size([8])
a_btch[1]: tensor([0, 9, 0, 4, 9, 9, 0, 3])
[Image: sample of digits in MNIST training set]

Autoencoder Class

This is going to be a very simple autoencoder. Two layers for both the encoder and decoder. For the encoder, I will roughly cut the input in half (rounded down to a power of 2), then down to a small number of features, which is the output of the encoder. The decoder, taking the encoder output as its input, will reverse the process. The decoder will use the sigmoid function to squash the final layer’s output into the 0 to 1 range, matching the image tensors. Many of the examples I looked at used a single sequence for the whole autoencoder. But I also want to get the output from the encoder, so I kept the encoder and decoder as separate methods.

I am returning two values from the forward method: the images from the decoder and the latent space data from the encoder, in that order. I am hoping to plot the latent space data to show how the digits are represented in that space. But I need to sort out a few things before I can generate a 2D plot. Each latent space vector is a tensor with 16 values. Tough to plot that.
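One option I may look at is projecting the 16 values down to 2 with principal component analysis (PCA). I have not decided on this. The sketch below only shows the idea on random stand-in data, using NumPy's SVD. The function name `project_to_2d` is hypothetical.

```python
import numpy as np


def project_to_2d(latents):
    """Project (n, d) latent vectors onto their first two principal components."""
    z = np.asarray(latents, dtype=np.float64)
    z = z - z.mean(axis=0)                             # centre each feature
    _, _, vt = np.linalg.svd(z, full_matrices=False)   # rows of vt are the principal directions
    return z @ vt[:2].T                                # shape (n, 2)


rng = np.random.default_rng(0)
fake_ls = rng.normal(size=(32, 16))   # stand-in for a batch of 16-value latent vectors
pts = project_to_2d(fake_ls)
print(pts.shape)
```

With real data, the latent tensor returned by the model would first need to be converted, e.g. with `ls.detach().cpu().numpy()`. The two columns could then go straight into a scatter plot coloured by digit label.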

Oh yes, new module in the shared folder, autoencoder.py. The module currently looks as follows. It will eventually also hold the class/code for the variational autoencoder.

# autoencoder classes for various projects

import torch
import torch.nn.functional as F
from torch import nn


class AE_Simple(nn.Module):
    # d_inp = encoder input dimension, d_mid = encoder intermediate dimension, d_out = encoder output dimension
    # the decoder uses the same dimensions in reverse order
    def __init__(self, d_inp, d_mid, d_out):
        super().__init__()
        self.e_init = nn.Linear(d_inp, d_mid)
        self.encoded = nn.Linear(d_mid, d_out)
        self.d_init = nn.Linear(d_out, d_mid)
        self.decode = nn.Linear(d_mid, d_inp)

    def encoder(self, x):
        # encode flattened input, return latent space vectors
        e_in = F.relu(self.e_init(x))
        ls = self.encoded(e_in)
        return ls

    def decoder(self, z):
        # decode latent space vectors, sigmoid keeps output values between 0 and 1
        out = F.relu(self.d_init(z))
        out = torch.sigmoid(self.decode(out))
        return out

    def forward(self, x):
        # return reconstructed images and the latent space vectors
        ls = self.encoder(x)
        out = self.decoder(ls)
        return out, ls

Not exactly a general and reusable class. May think about refactoring it in some fashion.

And a quick test to make sure the code is good. I want to keep my feature sizes as powers of 2 (just cuz).
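For a 28 x 28 image the power-of-2 sizing works out as follows. This is just the arithmetic pulled out into a small helper so it is easy to check. The function name `mid_size` is my own.

```python
import math


def mid_size(n_inp):
    """Half of the largest power of 2 that does not exceed n_inp."""
    exp = int(math.log2(n_inp))   # floor of log2 for n_inp >= 1
    return 2 ** (exp - 1)


sz_inp = 28 * 28                  # 784 pixels per flattened image
print(sz_inp, math.log2(sz_inp), mid_size(sz_inp))   # 784 -> 256
```

So the largest power of 2 below 784 is 512, and the intermediate layer gets half of that.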

... ...
import math
from autoencoder import AE_Simple
... ...
# instantiate autoencoder
# image tensors are (channels, height, width)
img_ht, img_wd = trn_set[0][0].shape[1], trn_set[0][0].shape[2]
sz_inp = img_wd * img_ht
isz_log2 = int(math.log2(sz_inp))
ls_mid = int(2**(isz_log2 - 1))
ls_fin = 16
aec = AE_Simple(sz_inp, ls_mid, ls_fin).to(cfg.device)
print(f"\n{aec}")

And the output from the terminal window.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj7> python autoe.py -rn rek_1 -bs 32
 {'run_nm': 'rek_1', 'dataset_nm': 'no_nm', 'sv_img_cyc': 150, 'sv_chk_cyc': 50, 'resume': False, 'start_ep': 0, 'epochs': 5, 'batch_sz': 32, 'num_res_blks': 9, 'x_disc': 1, 'x_genr': 1, 'x_eps': 0, 'use_lrs': False, 'lrs_unit': 'batch', 'lrs_eps': 5, 'lrs_init': 0.01, 'lrs_steps': 25, 'lrs_wmup': 0}
image and checkpoint directories created: runs\rek_1_img & runs\rek_1_sv

AE_Simple(
  (e_init): Linear(in_features=784, out_features=256, bias=True)
  (encoded): Linear(in_features=256, out_features=16, bias=True)
  (d_init): Linear(in_features=16, out_features=256, bias=True)
  (decode): Linear(in_features=256, out_features=784, bias=True)
)

Done
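The printout confirms the layer sizes, but not that data actually flows through. Here is a minimal smoke test, assuming a random batch in place of real MNIST images. The class is a condensed copy of the one above so the snippet runs on its own. The main thing it shows is that the images need to be flattened before going into the linear layers.

```python
import torch
import torch.nn.functional as F
from torch import nn


class AE_Simple(nn.Module):
    # condensed copy of the class above so this sketch is self-contained
    def __init__(self, d_inp, d_mid, d_out):
        super().__init__()
        self.e_init = nn.Linear(d_inp, d_mid)
        self.encoded = nn.Linear(d_mid, d_out)
        self.d_init = nn.Linear(d_out, d_mid)
        self.decode = nn.Linear(d_mid, d_inp)

    def forward(self, x):
        ls = self.encoded(F.relu(self.e_init(x)))
        out = torch.sigmoid(self.decode(F.relu(self.d_init(ls))))
        return out, ls


torch.manual_seed(0)
aec = AE_Simple(784, 256, 16)
imgs = torch.rand(32, 1, 28, 28)        # stand-in for one batch of MNIST images
flat = imgs.view(imgs.size(0), -1)      # (32, 784), linear layers expect flat input
with torch.no_grad():
    out, ls = aec(flat)
recon = out.view_as(imgs)               # back to image shape for display
n_params = sum(p.numel() for p in aec.parameters())
print(out.shape, ls.shape, recon.shape, n_params)
```

The model is untrained, so the reconstructions are just noise at this point. The shapes are what matter here.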

And before I move on to coding the training loop and such I think I am going to call this post finished. I favour small steps.

Until next time, may your steps, of whatever size, get you to your goals.

Resources