Okay, let’s try to build a PyTorch model. In this case, I will train a basic image classification model. We’ll be using the Fashion-MNIST dataset provided by TorchVision (makes for easy download). It is a collection of 28×28-pixel grayscale images of clothing from ten categories.

I plan to start with binary classification using only two of the clothing categories, then move on to multi-class classification using all ten.

Data Preprocessing

After all the usual imports, we will set up a transformation to be applied to the dataset when we download it. The transformation first converts the image data to a tensor (ToTensor()). This takes each individual image and turns it into a float tensor, with the integer pixel values (0-255) scaled to floats in the range 0 to 1. Then we use Normalize() to zero-center and normalize the image content: from each value in each tensor we subtract 0.5, then divide by 0.5, leaving values in the range -1 to 1. For example, a pixel value of 255 becomes 1.0 after ToTensor(), and then (1.0 - 0.5) / 0.5 = 1.0 after Normalize(); a pixel value of 0 ends up as -1.0.

Note that we set a variable device to specify where the bulk of the data manipulation should be done. In this case device will be 'cuda', as CUDA is in fact available on my development PC. That variable will be used throughout the module’s code to ensure the model definition and the data all end up on the same, and best available, device.

# binary_class.py
#  - train binary classification model for Fashion-MNIST dataset
# Ver 0.1.0: 2024.03.17, rek, get started figuring this out

import time

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as trf

# set seed for reproducibility
torch.manual_seed(73)

# will need this later
device = "cuda" if torch.cuda.is_available() else "cpu"

# instantiate our data transform for the image dataset
# generate tensors (vals 0-1), centre and normalize tensors
transform = trf.Compose([
    trf.ToTensor(),
    trf.Normalize([0.5], [0.5])
])
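
As a quick sanity check (a hypothetical snippet, not part of the script), we can run the transform over a dummy 8-bit image and confirm the output range:

# hypothetical check: transform a dummy 8-bit image, expect values in [-1, 1]
from PIL import Image
demo = Image.fromarray(np.arange(0, 256, dtype=np.uint8).reshape(16, 16))
t = transform(demo)
print(t.min().item(), t.max().item())   # -1.0 1.0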

Now let’s get the data and transform it appropriately.

# Create datasets for training & testing, download if necessary
ds_train = torchvision.datasets.FashionMNIST(
    root="./data",
    train=True,
    download=True,
    transform=transform
)
ds_test = torchvision.datasets.FashionMNIST(
    root="./data",
    train=False,
    download=True,
    transform=transform
)

# Class labels; in the dataset they are digits 0-9
categories = ('t-shirt/top', 'trouser', 'pullover', 'dress', 'coat',
              'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot')

# Show split sizes
print(f"Training set has {len(ds_train)} instances")
print(f"Test set has {len(ds_test)} instances")

Fair bit of output.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap2> python binary_class.py
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data\FashionMNIST\raw\train-images-idx3-ubyte.gz
100.0%
Extracting ./data\FashionMNIST\raw\train-images-idx3-ubyte.gz to ./data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data\FashionMNIST\raw\train-labels-idx1-ubyte.gz
100.0%
Extracting ./data\FashionMNIST\raw\train-labels-idx1-ubyte.gz to ./data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz
100.0%
Extracting ./data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz to ./data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz
100.0%
Extracting ./data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz to ./data\FashionMNIST\raw

Training set has 60000 instances
Test set has 10000 instances
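
Just to confirm what a single instance looks like (a quick, hypothetical check), each dataset item is an (image, label) tuple:

# hypothetical check: each item is a (tensor, int) pair
img, label = ds_train[0]
print(img.shape, label)   # torch.Size([1, 28, 28]) and an integer 0-9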

Visualize

Because we can, let’s have a look at some of those images.

# Let's have a look at some of the images
plt.figure(figsize=(5, 4))
for i in range(15):
    ax = plt.subplot(3, 5, i + 1)
    img = ds_train[i][0]
    # reset to values in range 0-1, i.e. unnormalize
    img = (img * 0.5) + 0.5
    # let's get our original 28 x 28 pixel image
    img = img.reshape(28, 28)
    plt.imshow(img, cmap="Greys")
    plt.axis("off")
    # category id is the second element of each dataset tuple
    plt.title(categories[ds_train[i][1]], fontsize=8)

plt.show()
[plot showing 15 sample images from the training dataset]

That sample didn’t include all the clothing categories. But it does give us an idea of how things look.

Binary Classification

We will create binary classification datasets for training and testing, use the training set to generate batches for training, build a neural network and train it, and lastly test the model. I will use images of trousers and shirts (category indices 1 and 6).

Creating Batches

The Dataset and DataLoader classes encapsulate the process of pulling your data from storage and exposing it to your training loop in batches.

The Dataset is responsible for accessing and processing single instances of data.

The DataLoader pulls instances of data from the Dataset (either automatically or with a sampler that you define), collects them in batches, and returns them for consumption by your training loop. The DataLoader works with all kinds of datasets, regardless of the type of data they contain.

Training with PyTorch

# Binary data, trousers and shirts
bin_train = [x for x in ds_train if x[1] in [1, 6]]
bin_test = [x for x in ds_test if x[1] in [1, 6]]
# Create data loaders for our datasets; shuffle for training and testing
batch_sz = 64
training_loader = torch.utils.data.DataLoader(bin_train, batch_size=batch_sz, shuffle=True)
test_loader = torch.utils.data.DataLoader(bin_test, batch_size=batch_sz, shuffle=True)
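
A quick hypothetical check of what one batch holds: the loader stacks the images into a single tensor alongside their labels.

# hypothetical check: one batch from the training loader
imgs, labels = next(iter(training_loader))
print(imgs.shape, labels.shape)   # torch.Size([64, 1, 28, 28]) torch.Size([64])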

Creating Classifier Model

I am using the PyTorch Sequential container.

Modules will be added to it in the order they are passed in the constructor. Alternatively, an OrderedDict of modules can be passed in. The forward() method of Sequential accepts any input and forwards it to the first module it contains. It then “chains” outputs to inputs sequentially for each subsequent module, finally returning the output of the last module.

torch.nn.Sequential

I will apply a series of linear transformations to each image, each using fewer neurons than the last. The numbers selected are somewhat arbitrary, with the exception of the last output size: 1. We want a single value, the probability that the image being tested is a shirt. Before the first linear layer, each 2-D image must be flattened into a 1-D tensor of 784 values (a requirement of the linear layers); I do that with reshape() in the training loop below. Then we have the three hidden layers.
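
A hypothetical illustration of that flattening step:

# hypothetical illustration: flatten a batch of 2-D images for the linear layers
x = torch.randn(64, 1, 28, 28)
print(x.reshape(-1, 28*28).shape)   # torch.Size([64, 784])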

Between the linear layers we will apply the ReLU (rectified linear unit) activation function, which passes positive values through unchanged and zeroes out negative ones, effectively deciding which neurons fire. There are other activation functions, but I understand this is the go-to choice for classifiers.
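
A one-liner shows the behaviour (hypothetical illustration):

# hypothetical illustration: ReLU zeroes negatives and keeps positives
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(nn.ReLU()(x))   # negatives become 0.0, the 1.5 passes through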

I will also apply a Dropout between the hidden layers.

During training, randomly zeroes some of the elements of the input tensor with probability p.

torch.nn.Dropout

On the final output, we use the Sigmoid function to convert it to a value in the range 0-1. We will interpret this as the probability that the piece of clothing is a shirt. The complementary probability is that the object is a pair of trousers.
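
A hypothetical illustration of that squashing:

# hypothetical illustration: sigmoid maps any real value into (0, 1)
z = torch.tensor([-3.0, 0.0, 3.0])
print(torch.sigmoid(z))   # roughly 0.047, 0.500, 0.953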

I will use a learning rate of 0.001 (the default for many PyTorch optimizers) and the Adam optimizer, a variant of the gradient descent algorithm. Adam takes into consideration gradients from previous iterations, not just those in the current iteration. I will use the binary cross-entropy loss function (nn.BCELoss); training adjusts the model’s parameters so as to minimize this loss. As the name implies, this particular loss function is suited to binary classification.
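
For intuition (a hypothetical illustration), BCELoss averages -[y*log(p) + (1-y)*log(1-p)] over the batch:

# hypothetical illustration: binary cross-entropy on two predictions
p = torch.tensor([0.9, 0.2])   # predicted probabilities
y = torch.tensor([1.0, 0.0])   # true labels
print(nn.BCELoss()(p, y))      # tensor(0.1643), the mean of -log(0.9) and -log(0.8)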

# build the Sequential model
bc_model = nn.Sequential(
  nn.Linear(28*28, 256),
  nn.ReLU(),
  nn.Linear(256, 128),
  nn.ReLU(),
  # dropout on a hidden layer; placed on the single output it would
  # zero the prediction itself a quarter of the time
  nn.Dropout(p=0.25),
  nn.Linear(128, 32),
  nn.ReLU(),
  nn.Linear(32, 1),
  nn.Sigmoid()
).to(device)

# a few more values to set: learning rate, optimizer and loss function
lr = 0.001
optimizer = torch.optim.Adam(bc_model.parameters(), lr=lr)
loss_fn = nn.BCELoss()

Training and Testing

I will train the model for 50 epochs. There is no validation set, so no early stopping. Coding and training the model is certainly more complex than my limited involvement with Scikit-learn; considerably more hands-on knowledge is required.

Added code to time the training and testing of the model.
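
One detail worth calling out: the loader yields the original category ids (1 for trousers, 6 for shirts), which the loop below remaps to 0/1 targets. A hypothetical illustration of that remapping:

# hypothetical illustration: map category ids (1=trouser, 6=shirt) to 0/1 targets
labels = torch.tensor([1, 6, 6, 1])
print(torch.FloatTensor([0 if x == 1 else 1 for x in labels]))   # tensor([0., 1., 1., 0.])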

# train the classification model
st_tm = time.perf_counter()
bc_model.train()
for i in range(50):
  tot_loss = 0
  for imgs, labels in training_loader:
    # flatten each 1x28x28 image to a 784-value tensor
    imgs = imgs.reshape(-1, 28*28)
    imgs = imgs.to(device)
    # remap category ids to binary targets: trouser (1) -> 0, shirt (6) -> 1
    labels = torch.FloatTensor([0 if x==1 else 1 for x in labels])
    labels = labels.reshape(-1, 1).to(device)
    preds = bc_model(imgs)
    loss = loss_fn(preds, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # use .item() so we accumulate a float, not a graph-holding tensor
    tot_loss += loss.item()
nd_tm = time.perf_counter()
print(f"\ntime to train model: {nd_tm - st_tm}")

# test the model
st_tm = time.perf_counter()
bc_model.eval()   # disable dropout for evaluation
results = []
with torch.no_grad():
  for imgs, labels in test_loader:
    imgs = imgs.reshape(-1, 28*28).to(device)
    labels = torch.FloatTensor([0 if x==1 else 1 for x in labels])
    labels = labels.reshape(-1, 1).to(device)
    preds = bc_model(imgs)
    # threshold the probabilities at 0.5 to get class predictions
    pred_l = torch.where(preds>0.5, 1, 0)
    correct = (pred_l == labels)
    results.append(correct.cpu().numpy().mean())
accuracy = np.array(results).mean()
print(f"\nprediction accuracy is {accuracy}")
nd_tm = time.perf_counter()
print(f"\ntime to test model: {nd_tm - st_tm}")

And, the result. Training finished in what I considered a rather short time, based on past experience doing this kind of thing using only the CPU. No idea how long it would have taken CPU-only, but I am also not willing to test it out.

(mclp-3.12) PS F:\learn\mcl_pytorch\chap2> python binary_class.py

time to train model: 20.605526900006225

prediction accuracy is 0.85986328125

time to test model: 0.046268299993244
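
Once trained, the model can also score a single image; a hypothetical usage sketch (the model is already in eval mode from the test loop):

# hypothetical usage: probability that one test image is a shirt
img, label = bin_test[0]
with torch.no_grad():
  p = bc_model(img.reshape(1, 28*28).to(device))
print(f"probability of shirt: {p.item():.3f}")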

Done

I think that’s it for this post. Next time we will look at classification against all ten clothing categories.

Until the next time, I do hope your GPU is CUDA enabled.

Resources