Okay, I am going to move on to the next level. The dataset is the Glasses/No Glasses dataset. It contains colour images of people (male and female, of differing ethnicities) either wearing glasses or not wearing glasses. We are going to code a cGAN to produce images of people with or without glasses. These images are of a higher resolution than any of our previous ones, though we will be reducing their size (from 1024 x 1024) for the purposes of this project.

But that does present possible problems we haven’t seen with our GANs to date. GANs often suffer from issues such as mode collapse, vanishing or exploding gradients, and unstable training. Even with our previous models, had we run them for large numbers of epochs we might have run into the stability issue. Very likely, at high epoch counts the images would in fact start getting worse rather than better.

So we are going to use a new loss function, one that uses the Wasserstein-1 distance as its cost measure. This significantly smooths the gradient flow and produces more stability in training. However, it has issues of its own unless the critic is 1-Lipschitz continuous; that is, the gradient norm of the critic’s function must be at most 1 everywhere. There are different approaches to ensuring this constraint is met. The one I will be using is gradient penalty, which appears to be the currently preferred approach.
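As a preview, the gradient penalty term can be sketched roughly as follows. This is a minimal sketch assuming a PyTorch critic; the names gradient_penalty, critic, real, and fake are my own illustrations, not code from the final module.

```python
import torch

def gradient_penalty(critic, real, fake):
    # Interpolate randomly between real and fake images.
    b = real.size(0)
    eps = torch.rand(b, 1, 1, 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    # Gradient of the critic's scores w.r.t. the interpolated images.
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    # Penalize gradient norms that deviate from 1 (the Lipschitz constraint).
    grad_norm = grads.reshape(b, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()
```

This penalty gets added to the critic’s Wasserstein loss, scaled by a coefficient (commonly 10 in the literature).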

All of this is going to involve additional code over and above our previous cGAN code modules. And an algorithmic twist or two.

Pre-processing Data

The zip for the Glasses or No Glasses dataset is rather large (7 GB or so). It took something on the order of 15-20 minutes to download. And 2-3 minutes just to unzip. It consists of a directory of 5,000 images and two CSV files, train.csv and test.csv. We will only be dealing with the training set which has 4500 entries in the appropriate CSV file.

The interesting thing about this dataset is that it does not contain real images of people with or without glasses. All of the images were generated by a GAN.

Split Images Based on Presence of Glasses

I will be using the torchvision.datasets.ImageFolder class to load the dataset from disk. This class assumes that any sub-directories in the root data directory represent classes. When loading the dataset it will assign a numeric class value to each image based on the sub-directory the image was in.
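To illustrate, here is a little sketch that mimics how ImageFolder derives class indices (this is my own imitation of its behaviour, not the actual torchvision code): sub-directory names are sorted alphabetically and numbered from 0.

```python
from pathlib import Path
import tempfile

# Mimic torchvision.datasets.ImageFolder's class assignment:
# sorted sub-directory names map to indices 0, 1, ...
def class_to_idx(root):
    classes = sorted(d.name for d in Path(root).iterdir() if d.is_dir())
    return {c: i for i, c in enumerate(classes)}

root = tempfile.mkdtemp()
(Path(root) / "G").mkdir()
(Path(root) / "NoG").mkdir()
print(class_to_idx(root))  # {'G': 0, 'NoG': 1}
```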

To facilitate classifying the images, I will be copying them into a new directory, ./data/glasses. I will be moving each image to a further sub-directory depending on whether or not the person in the image is wearing glasses. I will label those directories G and NoG. Each image will be moved to the appropriate directory based on the glasses field in the CSV file. A value of 0 means no glasses and 1 the opposite.

So, I wrote some code to take care of that bit of pre-processing. All the pre-processing code is in a separate Python module; I didn’t want it cluttering up the module with the code for the WGAN. I will get into the need for the do_fixes variable shortly.

The CSV file only has an id value for each image, i.e. 1, 2, …, 4500. The file names are of the format face-{id}.png.

# g_ng_data.py
# Ver 0.1.0: 2024.05.19, rek, pre-process glasses/no glasses images
# - split into two directories to allow use for training cGAn
#   downloaded zip (10 min), unzipped (2-3 min), all images in one directory
#   only want to run this code once, so not including in cgan module
# - images with glasses will go in .\data\glasses\G and
#   without in .\data\glasses\NoG
# - only doing the training set, use train.csv to sort with and without

import pandas as pd
from pathlib import Path
import shutil

train = pd.read_csv('data/train.csv')

do_fst_mv = False
do_fixes = True

G = Path("data/glasses/G/")
NoG = Path("data/glasses/NoG/")
G.mkdir(parents=True, exist_ok=True)
NoG.mkdir(parents=True, exist_ok=True)

if do_fst_mv:
  train.set_index('id', inplace=True)
  n_items = len(train)
  print(f"number of entries in train.csv: {n_items}")

  src = Path("data/faces-spring-2020/faces-spring-2020/")
  for i in range(1, n_items + 1):
    fl_nm = Path(f"face-{i}.png")
    s_pth = src / fl_nm
    if train.loc[i]['glasses'] == 0:
      d_pth = NoG / fl_nm
    else:
      d_pth = G / fl_nm
    shutil.copy(s_pth, d_pth)

Needless to say, that took a minute or two. Since I am using an SSD on my machine learning PC, it was probably faster than it would have been on my other PC, which would have been working with an HDD.

Fix Issues with the Dataset

Unfortunately, the CSV file has errors in it. A fair number of the images have been misclassified. I was going to just manually transfer them to the correct directory, but in the end I created a dictionary of the misclassified images, g_ng_wrong, keyed on image id with a value indicating whether the person has glasses (1) or not (0).

Turns out there were also some images that only had bits and pieces of glasses in them. I decided not to use those. So another container, this one an actual list (not a dictionary), bad_img, of image ids.
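For reference, here is the shape of those two containers. The ids below are entirely made up for illustration; the real entries are the ones you get to build yourself.

```python
# Hypothetical entries only -- the actual ids are not given in this post.
g_ng_wrong = {12: 1, 345: 0}  # image id -> correct label: 1 glasses, 0 no glasses
bad_img = [678, 901]          # ids of images to discard entirely
```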

I will not be including those lists in the post. I don’t want to rob you of all the entertainment of creating them. Took me around two hours to produce them. But here’s the code that goes about moving and/or deleting the affected image files.

if do_fixes:
  nf_G = len(list(G.glob('*')))
  nf_NoG = len(list(NoG.glob('*')))
  print(f"\nnumber files: G {nf_G}, NoG {nf_NoG}, total {nf_G + nf_NoG}")

  cat_cnt = train.glasses.value_counts()
  mv_2_g = sum(g_ng_wrong.values())
  mv_2_ng = len(g_ng_wrong) - mv_2_g
  print(f"len train dataset: {len(train)}, len moves: {len(g_ng_wrong)}, len bad: {len(bad_img)}")
  print(f"mv g 2 ng: {mv_2_ng}, mv ng 2 g: {mv_2_g}")
  good_cnt = len(train) - len(bad_img)
  print(f"cnt glasses: {cat_cnt.loc[1] - mv_2_ng + mv_2_g - len(bad_img)}; cnt no-glasses: {cat_cnt.loc[0] - mv_2_g + mv_2_ng}; tot good img: {good_cnt}")
  print(f"batches: { good_cnt / 16}")

  for i in bad_img:
    fl_nm = Path(f"face-{i}.png")
    if train.loc[i]['glasses'] == 0:
      fl_pth = NoG / fl_nm
    else:
      fl_pth = G / fl_nm
    try:
        fl_pth.unlink()
    except OSError as e:
        print(f'Error: {fl_pth} : {e.strerror}')

  for i, g_ng in g_ng_wrong.items():
    fl_nm = Path(f"face-{i}.png")
    if g_ng == 0:
      s_pth = G / fl_nm
      d_pth = NoG / fl_nm
    else:  # g_ng == 1
      s_pth = NoG / fl_nm
      d_pth = G / fl_nm
    shutil.move(s_pth, d_pth)

And the terminal output was as follows.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap5> python g_ng_data.py

number files: G 2856, NoG 1644, total 4500
len train dataset: 4500, len moves: 496, len bad: 52
mv g 2 ng: 419, mv ng 2 g: 77
cnt glasses: 2462; cnt no-glasses: 1986; tot good img: 4448
batches: 278.0

I am planning on batches of 16 images. So, I actually deleted the last two files in the G directory to ensure I will get complete batches for all iterations.

That 16 number is a first kick at the can. I will see if I run into any GPU memory issues. These images are going to be a fair bit larger than any of the previous ones we have seen. If not, I may try to use larger batches. If memory does become an issue I will use smaller batches.
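The arithmetic for trimming to complete batches is simple enough; here is a small helper of my own, purely for illustration.

```python
def full_batch_count(n_imgs, batch_sz):
    # How many complete batches fit, and how many images are left over.
    return n_imgs // batch_sz, n_imgs % batch_sz

print(full_batch_count(4448, 16))  # (278, 0)
```

Alternatively, PyTorch’s DataLoader has a drop_last=True option that simply discards an incomplete final batch.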

Visualize Dataset

Okay, let’s have a look at a sample of images from each category (directory). As I do not have a dataloader set up to provide images, I can’t use one of the image grid methods available in PyTorch or Matplotlib. Instead, I will get a set of random file names from each directory, load the files as images, and use subplots to display them.

I am using Pillow to load the images, as the Matplotlib documentation says imread() should no longer be used for this.

Sorry, I didn’t allow for specifying different batch sizes, or a different number of subplots if the batch size is fairly large.

import random
import matplotlib.pyplot as plt
from PIL import Image

def img_sample(i_dir):
  # display a 4x4 grid of randomly selected images from the given directory
  imgs = list(i_dir.glob('*.png'))
  smpl = random.sample(imgs, 16)
  fig = plt.figure(dpi=200, figsize=(4, 4))
  for i in range(16):
    ax = plt.subplot(4, 4, i + 1)
    img = Image.open(smpl[i])
    ax.imshow(img)
    ax.set_xticks([])
    ax.set_yticks([])
  plt.subplots_adjust(wspace=0, hspace=0)
  plt.show()

img_sample(G)

img_sample(NoG)

And an example for each category.

sample of images from glasses directory
sample of images from no glasses directory

À Demain

I think that’s it for this one. A decent intro, with the rest of the post more or less on topic.

Next time we will move on to coding the WGAN-GP Python module(s).

Resources