What I plan to do next is run the test dataset (10,000 items) through the encoder, collect the encoder’s feature vectors and their labels, and reduce the feature vectors to 2 dimensions. Then I will use those bits and pieces to plot the latent space produced by the encoder.

I am going to start with only a few hundred feature vectors. The algorithm to reduce the latent space data to 2 dimensions is apparently CPU and memory demanding for large datasets. More on that in a bit.
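If the full set of feature vectors ever proves too slow to reduce, one simple way to get "a few hundred" of them is to pick random rows. The sketch below is only an illustration: `subsample`, `n_keep` and the fake data are hypothetical names and values that are not part of the project code, and it assumes the features and labels can be viewed as NumPy arrays of equal length.

```python
import numpy as np

def subsample(feats, labels, n_keep=500, seed=0):
    """Return a random subset of rows (no repeats) with their matching labels."""
    feats = np.asarray(feats)
    labels = np.asarray(labels)
    n_keep = min(n_keep, len(feats))
    rng = np.random.default_rng(seed)
    # the same indices are used for both arrays so pairs stay aligned
    idx = rng.choice(len(feats), size=n_keep, replace=False)
    return feats[idx], labels[idx]

# stand-in data: 10,000 fake 16-dimensional feature vectors with fake digit labels
rng = np.random.default_rng(42)
fake_feats = rng.normal(size=(10_000, 16))
fake_lbls = rng.integers(0, 10, size=10_000)

sub_feats, sub_lbls = subsample(fake_feats, fake_lbls, n_keep=300)
print(sub_feats.shape, sub_lbls.shape)
```

Fixing the seed keeps the subset the same from run to run, which makes plots easier to compare.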

Feature Vectors

Let’s have a quick look at the feature vectors output by the encoder. And we don’t need a few hundred for that.

... ...
tst_feats = True          # print info regarding the feature vectors produced by the encoder
... ...
if tst_feats:
  # instantiate model for jitscript file
  fl_pth = cfg.sv_dir/"AE_Simple_jitscript_32_4.pt"
  aec = torch.jit.load(fl_pth).to(cfg.device)
  aec.eval()
  print("\n", aec)

  # get set of images from test set
  dl_iter = iter(tst_ldr)
  tst_img, tst_lbl = next(dl_iter)
  tst_img = tst_img.reshape(-1, 28*28).to(cfg.device)
  # get output of autoencoder
  with torch.no_grad():
    # only need the encodings, no need to run the decoder at this time
    ls = aec.encoder(tst_img)

  print(ls.shape, ls[0].shape, type(ls))
  for i in range(5):
    print(f"{tst_lbl[i]}: {ls[i]}")

And that output the following.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj7> python autoe.py -rn rek_2 -bs 32 -ep 5
 {'run_nm': 'rek_2', 'dataset_nm': 'no_nm', 'sv_img_cyc': 150, 'sv_chk_cyc': 50, 'resume': False, 'start_ep': 0, 'epochs': 5, 'batch_sz': 32, 'num_res_blks': 9, 'x_disc': 1, 'x_genr': 1, 'x_eps': 0, 'use_lrs': False, 'lrs_unit': 'batch', 'lrs_eps': 5, 'lrs_init': 0.01, 'lrs_steps': 25, 'lrs_wmup': 0}
image and checkpoint directories created: runs\rek_2_img & runs\rek_2_sv

 RecursiveScriptModule(
  original_name=AE_Simple
  (e_init): RecursiveScriptModule(original_name=Linear)
  (encoded): RecursiveScriptModule(original_name=Linear)
  (d_init): RecursiveScriptModule(original_name=Linear)
  (decode): RecursiveScriptModule(original_name=Linear)
)
torch.Size([32, 16]) torch.Size([16]) <class 'torch.Tensor'>
1: tensor([ 1.0755, -5.8183,  6.7693, -7.7218, 11.9776,  2.7576, -1.0790, -1.6250,
         4.9702,  1.4125,  5.9411, -4.6164, -5.9677,  5.5891, -3.1015,  0.3135],
       device='cuda:1')
9: tensor([-1.0786, -0.9990, -0.4556,  2.0526, -4.7104, -5.7177,  2.7462,  2.0642,
         2.2918,  1.4768,  0.4400,  3.7596,  3.9020,  1.5404,  2.2318,  7.7456],
       device='cuda:1')
0: tensor([ 6.7747, -1.9898, -2.9543, -3.9715, -2.8956, -1.3965,  5.1311,  4.7521,
        -5.3602,  3.9081, -1.9221,  1.5446, -9.5822,  1.2679,  2.6260, 11.0705],
       device='cuda:1')
8: tensor([ 7.5880,  3.9347, -1.3316, -7.9841,  1.2560, -1.6227, -2.0465,  4.4854,
         3.1442, -5.4096,  0.3417,  1.8990,  1.1080, -6.6186, -3.4240,  6.8991],
       device='cuda:1')
7: tensor([  7.6755,   7.1142,   9.0879, -14.2383,   8.5561,  -6.5581,   2.3631,
         -3.2144,   1.8247,   4.8763,   3.7434,   6.0295,  -2.2372,   4.5577,
         -6.1951,   8.5715], device='cuda:1')

t-SNE

There are a few options for reducing the dimensionality of the feature vectors. The one I am going to use is a stochastic algorithm, t-distributed Stochastic Neighbor Embedding (t-SNE).

[t-SNE] is a statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map. It is based on Stochastic Neighbor Embedding originally developed by Geoffrey Hinton and Sam Roweis, where Laurens van der Maaten and Hinton proposed the t-distributed variant. It is a nonlinear dimensionality reduction technique for embedding high-dimensional data for visualization in a low-dimensional space of two or three dimensions.

Wikipedia

I will be using the scikit-learn implementation of the algorithm. The original paper was a little beyond anything I was prepared to tackle at this point in my life.
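For comparison, one of the other options is principal component analysis (PCA), which is linear and deterministic where t-SNE is nonlinear and stochastic. The following is a minimal NumPy sketch of a 2D PCA projection, not something used in this project: `pca_2d` is a hypothetical helper and the random array is just a stand-in for real encoder output.

```python
import numpy as np

def pca_2d(feats):
    """Project the rows of feats onto their first two principal components."""
    x = np.asarray(feats, dtype=np.float64)
    x = x - x.mean(axis=0)  # centre each feature on zero
    # rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:2].T

rng = np.random.default_rng(0)
fake_feats = rng.normal(size=(200, 16))  # stand-in for encoder output
proj = pca_2d(fake_feats)
print(proj.shape)
```

Because PCA can only rotate and project the data, clusters that are separated in a nonlinear way may overlap in its 2D output, which is one reason t-SNE is a popular choice for this kind of plot.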

I started by creating a new module in the same directory as the project, proj7\ae_utils.py. And added the first import and function code, as follows.

# ae_utils.py
# this module will provide functions/classes specific to this project,
# so I am not putting it in the shared modules folder

from sklearn import manifold

def calc_2D_proj_lspace(img_ncdg, rand_st=0):
  # Compute t-SNE embedding of latent space
  print("Computing t-SNE embedding...")
  tsne = manifold.TSNE(n_components=2, init='pca', random_state=rand_st)
  x_tsne = tsne.fit_transform(img_ncdg)
  return x_tsne
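
Before feeding it real encodings, a function like this can be sanity checked on synthetic data. The sketch below repeats the helper so it runs on its own and uses made-up blobs (the `centres` and `fake_feats` names are hypothetical). One thing to keep in mind: recent scikit-learn versions require `perplexity` (default 30) to be smaller than the number of samples, so a very small batch would need a lower value passed to `TSNE`.

```python
import numpy as np
from sklearn import manifold

def calc_2D_proj_lspace(img_ncdg, rand_st=0):
    # same helper as above, repeated so this snippet is self-contained
    tsne = manifold.TSNE(n_components=2, init='pca', random_state=rand_st)
    return tsne.fit_transform(img_ncdg)

# stand-in data: 3 well separated blobs of 20 points each in 16 dimensions
rng = np.random.default_rng(0)
centres = rng.normal(scale=10.0, size=(3, 16))
fake_feats = np.vstack([c + rng.normal(size=(20, 16)) for c in centres])

# 60 samples is comfortably above the default perplexity of 30
x_2d = calc_2D_proj_lspace(fake_feats, rand_st=0)
print(x_2d.shape)
```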

I decided to give it a go with a single batch of 50 images. I also decided to plot what I got back from the new utility function. It took a bit of time to sort out how to handle the labels, but the matplotlib docs gave me the answer.

Added an import or two, and the following to the main module after the feature checking code above.

  st_tm = time.perf_counter()
  x_tsne = ae_utils.calc_2D_proj_lspace(ls.cpu(), rand_st=cfg.pt_seed)
  nd_tm = time.perf_counter()
  print(f"encoding and dim reduction took: {nd_tm - st_tm}")
  print(x_tsne.shape, x_tsne[0].shape, type(x_tsne))
  for i in range(5):
    print(f"{tst_lbl[i]}: {x_tsne[i]}")

  fig, ax = plt.subplots(figsize=(8, 8))
  scatter = ax.scatter(x=x_tsne[:, 0], y=x_tsne[:, 1], s=20.0, 
              c=tst_lbl, cmap='tab10', alpha=0.9, zorder=2)
  # produce a legend with the unique colors from the scatter
  legend1 = ax.legend(*scatter.legend_elements(),
                      loc="upper right", title="Digits")
  ax.add_artist(legend1)
  ax.spines["right"].set_visible(False)
  ax.spines["top"].set_visible(False)  

  plt.show()

And here’s the t-SNE specific output in the terminal. I won’t bother with the image that was generated in this test. We will be generating one with much more data.

Computing t-SNE embedding...
(50, 2) (2,) <class 'numpy.ndarray'>
3: [-8.413566   1.0142158]
9: [-6.1550326 -1.0951712]
3: [-9.367345   2.0370538]
2: [-7.2053614  1.5933691]
0: [-9.911617    0.02265216]
encoding and dim reduction took: 0.660329099977389

Latent Space

Okay, let’s get on to plotting the latent space with a decent amount of data. I am going to refactor the code that encodes a single batch to encode the whole test dataset (10,000 images). Then get the 2D version of the latent space and plot it. That latter code will need a few minor refactorings as well. Primarily variable name changes.

Had some issues with sending a Python list to the sklearn.manifold.TSNE.fit_transform method. l_spc was a Python list of 1D tensors. So, I ended up stacking them into a single 2D tensor with torch.stack, which seemed to work just fine. And more refactoring than I initially expected. Here’s the complete refactored code for this particular if block.

I also decided to save labels, latent space and 2D reduced space in case I wanted to play around some more down the road. Figured it would save me from recreating all the data if I did so.

if tst_feats:
  # instantiate model for jitscript file
  fl_pth = cfg.sv_dir/"AE_Simple_jitscript_32_4.pt"
  aec = torch.jit.load(fl_pth).to(cfg.device)
  aec.eval()
  print("\n", aec)

  st_tm = time.perf_counter()
  l_spc, l_lbl = [], []
  for (img, lbl) in tqdm(trn_ldr, desc=f"test images"):
    # Reshaping the image to (-1, 784)
    img = img.reshape(-1, 28*28).to(cfg.device)
    # get output of autoencoder
    with torch.no_grad():
      # only need the encodings, no need to run the decoder at this time
      ls = aec.encoder(img)
    l_spc.extend(ls.cpu())
    l_lbl.extend(lbl)

  l_spc = torch.stack(l_spc, dim=0)
  print(len(l_spc), len(l_spc[0]), type(l_spc))
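  # note: lbl only holds the labels of the last batch, so these labels do not
  # line up with l_spc[i]; l_lbl[i] is the matching label
  for i in range(5):
    print(f"{lbl[i]}: {l_spc[i]}")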
  
  x_tsne = ae_utils.calc_2D_proj_lspace(l_spc, rand_st=cfg.pt_seed)
  nd_tm = time.perf_counter()
  print(len(x_tsne), len(x_tsne[0]), type(x_tsne))
  for i in range(5):
    print(f"{l_lbl[i]}: {x_tsne[i]}")
  print(f"encoding and dim reduction took: {nd_tm - st_tm}")

  # save data for later use without needing to regenerate
  fl_nm = f"lspc_{cfg.epoch}.pt"
  fl_pth = cfg.sv_dir/fl_nm
  torch.save({
    'run_nm': cfg.run_nm,
    'epoch': cfg.epoch,
    'labels': l_lbl,
    'latent_spc': l_spc,
    'tsne': x_tsne,
    }, fl_pth)

  fig, ax = plt.subplots(figsize=(8, 8))
  scatter = ax.scatter(x=x_tsne[:, 0], y=x_tsne[:, 1], s=20.0, 
              c=l_lbl, cmap='tab10', alpha=0.9, zorder=2)
  # produce a legend with the unique colors from the scatter
  legend1 = ax.legend(*scatter.legend_elements(),
                      loc="upper right", title="Digits")
  ax.add_artist(legend1)
  ax.spines["right"].set_visible(False)
  ax.spines["top"].set_visible(False)  

  plt.show()
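
Reading the saved data back in later might look something like the following sketch. It writes a tiny stand-in dictionary with the same keys to a temporary file and loads it again; the run name, file name and values are hypothetical. Since the dictionary holds a NumPy array as well as tensors, I am assuming `weights_only=False` is needed with newer PyTorch versions, which should only be done for files you created yourself.

```python
import tempfile
from pathlib import Path

import numpy as np
import torch

# stand-in data with the same keys as the dictionary saved above
saved = {
    'run_nm': 'demo',  # hypothetical run name
    'epoch': 5,
    'labels': [torch.tensor(3), torch.tensor(9)],
    'latent_spc': torch.zeros(2, 16),
    'tsne': np.zeros((2, 2), dtype=np.float32),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    fl_pth = Path(tmp_dir) / "lspc_demo.pt"  # hypothetical file name
    torch.save(saved, fl_pth)
    # weights_only=False because the dict contains a NumPy array;
    # this unpickles arbitrary objects, so use it on trusted files only
    data = torch.load(fl_pth, weights_only=False)

# the labels were saved as a list of 0-d tensors; stack them into one array
lbls = torch.stack(data['labels']).numpy()
print(lbls, data['latent_spc'].shape, data['tsne'].shape)
```

With the labels and 2D coordinates back in memory, the plotting code can be rerun without encoding the dataset or running t-SNE again.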

Running the refactored block output the following.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj7> python autoe.py -rn rek_2 -bs 50 -ep 5
 {'run_nm': 'rek_2', 'dataset_nm': 'no_nm', 'sv_img_cyc': 150, 'sv_chk_cyc': 50, 'resume': False, 'start_ep': 0, 'epochs': 5, 'batch_sz': 50, 'num_res_blks': 9, 'x_disc': 1, 'x_genr': 1, 'x_eps': 0, 'use_lrs': False, 'lrs_unit': 'batch', 'lrs_eps': 5, 'lrs_init': 0.01, 'lrs_steps': 25, 'lrs_wmup': 0}
image and checkpoint directories created: runs\rek_2_img & runs\rek_2_sv

 RecursiveScriptModule(
  original_name=AE_Simple
  (e_init): RecursiveScriptModule(original_name=Linear)
  (encoded): RecursiveScriptModule(original_name=Linear)
  (d_init): RecursiveScriptModule(original_name=Linear)
  (decode): RecursiveScriptModule(original_name=Linear)
)
test images: 100%|████████████████████████████████████████████████████████████████| 1200/1200 [00:05<00:00, 208.79it/s]
60000 16 <class 'torch.Tensor'>
7: tensor([  6.3097,   3.7142,  -4.3221,  -3.4887,  -3.4655,  -1.6347,   6.1345,
         -2.7876,   2.5459,  -7.9927,  -5.0332,  -5.0763,  -7.9325, -10.0664,
        -14.7241,  -0.7667])
3: tensor([ -8.2752,  -4.2803,  -4.4738,   6.9895,  -3.2540,   3.5098,   8.0703,
          2.3882,   1.0221,  -9.6667,   1.5837,  -1.6203,  -5.6337,  -0.9226,
        -12.8357,   4.1507])
5: tensor([13.8660, -8.6649, -1.6775, -5.2253, -1.0091, -2.6771,  6.5598,  1.3414,
         0.7885,  0.5504, -2.1562, -4.0960, -1.5405, -1.4043, -4.3700,  0.8274])
0: tensor([ 3.9373,  7.9250, -1.0446, -7.3469,  1.6585, -4.8047,  1.1448,  1.8634,
        -3.2952, -7.1976,  0.7006,  7.7023, -5.3275,  1.0801,  1.6673, -2.9311])
3: tensor([ 8.3994, -3.9456,  4.3399,  5.0077,  3.0452, -3.3244, 11.6737, 13.2087,
         0.1622, -7.1432, -1.5658, -6.4202,  7.8822,  1.6679,  1.2957,  8.1367])
Computing t-SNE embedding...
60000 2 <class 'numpy.ndarray'>
3: [32.380585   4.0102367]
2: [-31.243042 -78.767365]
3: [17.931267 17.96691 ]
7: [-84.87595  51.51109]
6: [ 48.989887 -45.33404 ]
encoding and dim reduction took: 141.5087877000915

Seemed a lot longer than ~142 seconds. CPU was running at 60-70% utilization for the whole time t-SNE was doing its thing. And here’s the resulting image.

plot of latent space of encoded training dataset

You may have noticed I had a bug somewhere. The above was actually using the training dataset (60,000 images), not the test dataset (10,000 images). I forgot to change the loader variable name after a copy and paste. So the training and test loaders were both using the same dataset.

When I actually use the test dataset, the terminal output is as follows.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj7> python autoe.py -rn rek_2 -bs 50 -ep 5
 {'run_nm': 'rek_2', 'dataset_nm': 'no_nm', 'sv_img_cyc': 150, 'sv_chk_cyc': 50, 'resume': False, 'start_ep': 0, 'epochs': 5, 'batch_sz': 50, 'num_res_blks': 9, 'x_disc': 1, 'x_genr': 1, 'x_eps': 0, 'use_lrs': False, 'lrs_unit': 'batch', 'lrs_eps': 5, 'lrs_init': 0.01, 'lrs_steps': 25, 'lrs_wmup': 0}
image and checkpoint directories created: runs\rek_2_img & runs\rek_2_sv

 RecursiveScriptModule(
  original_name=AE_Simple
  (e_init): RecursiveScriptModule(original_name=Linear)
  (encoded): RecursiveScriptModule(original_name=Linear)
  (d_init): RecursiveScriptModule(original_name=Linear)
  (decode): RecursiveScriptModule(original_name=Linear)
)
test images: 100%|██████████████████████████████████████████████████████████████████| 200/200 [00:01<00:00, 193.66it/s]
10000 16 <class 'torch.Tensor'>
4: tensor([  6.8437,  -0.2392,  -6.4106,  -6.5359,  -3.3511,  -0.6762,   2.3043,
          6.7140,  -1.8107,  -0.8209,   7.6290, -10.5628,  -0.5726,  -1.1218,
         -3.6993,  -1.3324])
1: tensor([ 1.6882, -1.2119,  6.3842, -5.3774, -8.3177,  0.1306, -0.3173,  2.0797,
        -3.3501, -1.9098, -5.2648, -3.2275, -1.6296, -4.1602, -3.6043,  2.2425])
3: tensor([ 4.1503,  0.0935,  4.7474, -5.0308, -0.8842, -0.9016,  6.7219, -2.8780,
        -3.0098,  2.5089, -2.1332, -1.8435,  2.4272, -6.6226, -8.6928, 17.3955])
9: tensor([  4.7525,   0.6668,   7.3750, -11.6276,   2.9127,  -4.9804,  -3.9553,
         -0.3760,  -2.4306,  -1.8331,   5.8237,   2.7078,  -1.3671,  10.7092,
         -1.7911,   7.9150])
6: tensor([ 2.2807, -5.7461,  3.5298,  2.3213,  3.1310,  0.7262, -1.3026,  2.1894,
         5.2384, -0.4590,  5.2855, -3.0508, -7.4724, -4.9697,  3.8221,  4.8504])
Computing t-SNE embedding...
10000 2 <class 'numpy.ndarray'>
8: [-7.259082 -2.751589]
3: [ 43.04027 -15.68169]
5: [ 27.363228 -22.855804]
7: [-44.232334  34.251896]
6: [52.72914   -3.4064693]
encoding and dim reduction took: 20.842351299943402

And the generated image is much sparser.

plot of latent space of encoded test dataset

But regardless of which image you look at, it is pretty clear that each digit has its own region in the latent space. Given the number of miscoloured areas in each of those regions, though, I expect my 5 epochs of training were really not quite enough. Of course a 4 looks a bit like a 9, a 7 like a 1, etc. So perhaps a few images in the wrong area of the latent space should be expected, especially given these are handwritten digits.
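One rough way to put a number on those miscoloured points, rather than eyeballing the plot, is to check how often a point's nearest neighbour in the 2D map carries the same label. This is only a sketch of that idea and not something run for this post: `nn_label_agreement` is a hypothetical helper, and because it builds the full pairwise distance matrix it assumes a subset of a few thousand points at most.

```python
import numpy as np

def nn_label_agreement(pts, labels):
    """Fraction of points whose nearest other point carries the same label."""
    pts = np.asarray(pts, dtype=np.float64)
    labels = np.asarray(labels)
    # full pairwise squared distances: fine for a few thousand points,
    # too memory hungry for tens of thousands at once
    diff = pts[:, None, :] - pts[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nearest = d2.argmin(axis=1)
    return float((labels[nearest] == labels).mean())

# toy check: two tight, well separated clusters
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(nn_label_agreement(pts, [0, 0, 1, 1]))  # 1.0
print(nn_label_agreement(pts, [0, 1, 0, 1]))  # 0.0
```

A value close to 1.0 would suggest clean digit regions, while lower values would point to more mixing between them. Keep in mind that t-SNE distorts distances, so this says more about the map than about the 16-dimensional latent space itself.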

That said, it is still a good visualization of what the autoencoder is doing during the encoder stage. The decoder must learn what digit a given value of the latent space must produce, along with some variations in the digit’s appearance: slope, completeness, finishing elements, etc.

Done

You know, I am feeling rather good about getting this bit of experimentation to work. To work as expected as well. As such, I think I am going to enjoy the moment and call this post done, done, done.

May your experimental efforts bring you as much joy.

Resources