Well, calling this Part III is likely a touch misleading. I am only doing so because I am going to use the simple model we looked at in the previous post. But rather than looking at gradients and such again, I am going to try training that rather simple model on a dataset.

I am also going to look at using validation loss as a potential trigger to stop training early, i.e. before we process all the specified epochs of training. The idea being, if the training loss keeps going down but the validation loss starts going up, we are likely looking at the model overfitting.

Not that I think this simple model will start overfitting. But one never knows. Curiosity is winning out here. We have seen the use of early stopping before. Still—practise, practise, practise.

I am going to copy my Python module from the previous post, then refactor it to do a full training session. Perhaps with an early stop.

I will start with the same simple, small model and see what happens. Then if I feel we need to up the game a little, I may look at having more nodes in the hidden layer. And perhaps a different set of initial parameters. It seems to me that the ones I was using are somewhat large for a starting point.

Training and Validation Data

I am going to use \(y = 2x^2 + 2x + 1\) to generate the training data, which is valid for the input data I used in the previous post. The training set will have 40 items and the validation set will have 6 items (torch.arange(-3.0, 3.0) yields the integers from -3 to 2). This is mainly about using small enough datasets for things to go quickly, but hopefully big enough to get something useful done.

Pretty straightforward code. But, for me at least, a new class: TensorDataset.

import torch
from torch.utils.data import TensorDataset, DataLoader

# create data for training and validation
X_trn = torch.arange(-3, 3, 0.15)
y_trn = ((((X_trn)**2) * 2) + (2 * X_trn) + 1)
ds_trn = TensorDataset(X_trn, y_trn)
dl_trn = DataLoader(ds_trn, batch_size=1, shuffle=True)

# need floats, make sure to specify floats for the start and end
X_val = torch.arange(-3.0, 3.0)
y_val = ((((X_val)**2) * 2) + (2 * X_val) + 1)
ds_val = TensorDataset(X_val, y_val)
dl_val = DataLoader(ds_val, batch_size=1, shuffle=True)
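
As a quick sanity check on the generating function, here is a minimal plain-Python sketch (no PyTorch needed). The helper name quad is mine, not something from the module.

```python
def quad(x):
    """Hypothetical helper: y = 2x^2 + 2x + 1, the function used to generate the data."""
    return 2 * x**2 + 2 * x + 1


# integer inputs across the range used in this post
for x in range(-3, 3):
    print(f"x = {x:2d} -> y = {quad(x)}")
```

The parabola's minimum sits at x = -0.5, where y = 0.5, so every target value is positive.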

Training Loop/Function

I am going to use a single function to run the complete training loop for a specified number of epochs. The function will return lists of the average training and validation losses for each epoch. We’ve seen most of this code before, so without further ado here it is.

Do note, I seem to be fond of dropping most of the vowels from function and variable names.

def trn_mdl(mdl, l_fn, opt, t_ldr, v_ldr, epochs=5):
  """Wrapper function for training the model"""
  t_loss, v_loss = [], []
  for ep in range(epochs):
    l_t_btch, l_v_btch = 0, 0

    # put model in training mode, run training epoch
    mdl.train()
    for X, y in t_ldr:
      opt.zero_grad()         # Zero the gradients wrt parameters, will accumulate otherwise
      pred = mdl(X).flatten() # forward pass
      loss = l_fn(pred, y)    # calc loss based on predictions against training values
      loss.backward()         # calc gradients wrt parameters
      opt.step()              # update parameters
      l_t_btch += loss.item() # update total loss to-date as a plain float
    t_loss.append(l_t_btch / len(t_ldr))

    # put model in evaluation mode, calc loss on validation set
    mdl.eval()
    # evaluation mode does not stop pytorch from tracking gradients, so disable that explicitly
    with torch.no_grad():
      for X, y in v_ldr:
        pred = mdl(X).flatten()   # forward pass
        loss = l_fn(pred, y)      # calc loss based on predictions/outputs
        l_v_btch += loss.item()   # update total validation loss to-date
    v_loss.append(l_v_btch / len(v_ldr))

    if ep < 10 or ep > epochs - 11:
      print(f"epoch: {ep + 1}, t_loss: {t_loss[-1]}, v_loss: {v_loss[-1]}")
      print(f"{mdl.state_dict()}")
    if ep == 10:
      print("... ...")
  return t_loss, v_loss
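
Because both loaders use batch_size=1, each batch loss is just the squared error of one sample, so dividing the running total by the number of batches gives the mean squared error over the whole set. A minimal plain-Python sketch of that bookkeeping, with made-up predictions and targets:

```python
# made-up (prediction, target) pairs standing in for one epoch of single-sample batches
pairs = [(12.0, 13.0), (4.0, 5.0), (2.0, 1.0), (3.0, 1.0)]

running = 0.0
for pred, y in pairs:
    batch_loss = (pred - y) ** 2   # MSE of a single-sample batch
    running += batch_loss          # running total for the epoch
epoch_avg = running / len(pairs)   # same role as l_t_btch / len(t_ldr)

print(epoch_avg)
```

With larger batches of unequal size (say, a short final batch), the mean of the batch means would no longer match the dataset mean exactly.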

Training Parameters and Execution

Okay, here are some of the values and objects we are going to use in training the model. I am not including the timing code.

loss_fn = nn.MSELoss()      # loss function for model
max_ep = 200                # maximum number of epochs for training
mod_opt = torch.optim.SGD   # default optimizer
m_lr = 0.01                 # learning rate

... ...

opt = mod_opt(simple.parameters(), lr=m_lr)
st_tm = time.perf_counter()
t_loss, v_loss = trn_mdl(simple, loss_fn, opt, dl_trn, dl_val, epochs=max_ep)
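
The timing code is left out above. A minimal sketch of how it might look, assuming the elapsed time is simply the difference between two time.perf_counter() readings (which are in fractional seconds). The stand-in workload and the name en_tm are my own.

```python
import time

st_tm = time.perf_counter()                    # start reading, in fractional seconds
total = sum(i * i for i in range(100_000))     # stand-in for the call to trn_mdl(...)
en_tm = time.perf_counter()                    # hypothetical name for the end reading

elapsed = en_tm - st_tm
print(f"training took: {elapsed}")
```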

I also added a function to print out the state in a somewhat nicer fashion. If we let Python and PyTorch do their thing, it would look like the following:

OrderedDict({'hide.weight': tensor([[ 1.2622],
        [-0.0511]]), 'hide.bias': tensor([1.4749, 3.2164]), 'outp.weight': tensor([[1.8614, 2.4943]]), 'outp.bias': tensor([3.6162])})
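
The nicer formatting function is not shown. Here is a sketch of one way to get output like that in the session below, assuming the state dict has first been converted to plain nested lists (for example with {k: v.tolist() for k, v in mdl.state_dict().items()}). The function name fmt_state and the exact indentation are my guesses.

```python
def fmt_state(state):
    """Hypothetical formatter: two-line summary of the model's parameters."""
    def vec(vals):
        return "[" + ", ".join(f"{v:.5f}" for v in vals) + "]"

    # hide.weight has shape (2, 1): one row per hidden node, one entry per row
    h_w = [row[0] for row in state["hide.weight"]]
    h_b = state["hide.bias"]
    # outp.weight has shape (1, 2): a single row with one weight per hidden node
    o_w = state["outp.weight"][0]
    o_b = state["outp.bias"][0]
    return (
        f"        hide layer: weights -> {vec(h_w)}, biases -> {vec(h_b)},\n"
        f"        outp layer: weights -> {vec(o_w)}, bias -> {o_b:.5f}"
    )


# the values from the raw OrderedDict printout above
state = {
    "hide.weight": [[1.2622], [-0.0511]],
    "hide.bias": [1.4749, 3.2164],
    "outp.weight": [[1.8614, 2.4943]],
    "outp.bias": [3.6162],
}
print(fmt_state(state))
```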

Here’s a sample of executing a training session. I am not setting a random seed so it changes on every execution. Which for now is just fine.

(mclp-3.12) PS F:\learn\mcl_pytorch\rek_1> python backprop_3.py

current model parameters:
        hide layer: weights -> [1.00000, -1.00000], biases -> [2.00000, 3.00000],
        outp layer: weights -> [2.00000, 3.00000], bias -> 4.00000

epoch: 1, t_loss: 39.635746002197266, v_loss: 158.4282684326172
        hide layer: weights -> [1.42225, 0.02712], biases -> [1.33918, 3.20663],
        outp layer: weights -> [1.86161, 2.34073], bias -> 3.47129
epoch: 2, t_loss: 36.748905181884766, v_loss: 155.05189514160156
        hide layer: weights -> [1.74920, 0.17339], biases -> [0.68782, 3.19107],
        outp layer: weights -> [2.25845, 2.42231], bias -> 3.55030
epoch: 3, t_loss: 34.53436279296875, v_loss: 146.8522491455078
        hide layer: weights -> [1.98927, 0.20658], biases -> [-0.16565, 3.15967],
        outp layer: weights -> [2.80930, 2.42678], bias -> 3.54562
epoch: 4, t_loss: 31.2856388092041, v_loss: 133.23475646972656
        hide layer: weights -> [1.95221, 0.18732], biases -> [-1.16568, 3.09326],
        outp layer: weights -> [3.34888, 2.08730], bias -> 3.18314
epoch: 5, t_loss: 28.621517181396484, v_loss: 125.73409271240234
        hide layer: weights -> [2.07740, 0.23140], biases -> [-1.78957, 3.07307],
        outp layer: weights -> [4.18618, 2.11453], bias -> 3.20527
epoch: 6, t_loss: 25.60274887084961, v_loss: 123.23886108398438
        hide layer: weights -> [2.37619, 0.22969], biases -> [-2.18357, 3.08124],
        outp layer: weights -> [5.10459, 2.34399], bias -> 3.44202
epoch: 7, t_loss: 23.42209243774414, v_loss: 113.79572296142578
        hide layer: weights -> [2.40484, 0.17739], biases -> [-2.77696, 3.04460],
        outp layer: weights -> [5.81429, 2.09515], bias -> 3.17964
epoch: 8, t_loss: 20.847476959228516, v_loss: 112.3241195678711
        hide layer: weights -> [2.59433, 0.12501], biases -> [-3.13536, 3.06025],
        outp layer: weights -> [6.61342, 2.30416], bias -> 3.39940
epoch: 9, t_loss: 19.62688636779785, v_loss: 109.43563079833984
        hide layer: weights -> [2.85726, 0.15003], biases -> [-3.41786, 3.04190],
        outp layer: weights -> [7.31861, 2.18061], bias -> 3.27003
epoch: 10, t_loss: 18.08064079284668, v_loss: 105.6255874633789
        hide layer: weights -> [2.97063, 0.14277], biases -> [-3.73140, 3.02485],
        outp layer: weights -> [7.94551, 2.03623], bias -> 3.11987
... ...
epoch: 191, t_loss: 1.055456280708313, v_loss: 7.413187026977539
        hide layer: weights -> [2.52284, -3.16314], biases -> [-4.31415, -7.26954],
        outp layer: weights -> [20.17461, 11.33002], bias -> 1.31839
epoch: 192, t_loss: 1.5615453720092773, v_loss: 5.241555690765381
        hide layer: weights -> [2.32598, -3.13312], biases -> [-4.35002, -7.28754],
        outp layer: weights -> [20.16085, 11.34428], bias -> 1.32551
epoch: 193, t_loss: 1.508317232131958, v_loss: 7.366010665893555
        hide layer: weights -> [2.21653, -3.23511], biases -> [-4.44066, -7.24201],
        outp layer: weights -> [20.18409, 11.36378], bias -> 1.36216
epoch: 194, t_loss: 0.9910529851913452, v_loss: 14.886592864990234
        hide layer: weights -> [2.61538, -3.12001], biases -> [-4.19782, -7.29799],
        outp layer: weights -> [20.25611, 11.37091], bias -> 1.42762
epoch: 195, t_loss: 0.9141284227371216, v_loss: 13.305169105529785
        hide layer: weights -> [2.61682, -3.25508], biases -> [-4.21379, -7.24352],
        outp layer: weights -> [20.29510, 11.39635], bias -> 1.43750
epoch: 196, t_loss: 1.3059723377227783, v_loss: 5.900460720062256
        hide layer: weights -> [2.47348, -3.22631], biases -> [-4.27772, -7.24628],
        outp layer: weights -> [20.27734, 11.39526], bias -> 1.38223
epoch: 197, t_loss: 1.0986205339431763, v_loss: 14.353645324707031
        hide layer: weights -> [2.13842, -3.16963], biases -> [-4.52667, -7.26947],
        outp layer: weights -> [20.30333, 11.39852], bias -> 1.32058
epoch: 198, t_loss: 1.1309565305709839, v_loss: 12.602222442626953
        hide layer: weights -> [2.62325, -3.20332], biases -> [-4.26686, -7.25301],
        outp layer: weights -> [20.36650, 11.41273], bias -> 1.39311
epoch: 199, t_loss: 1.1410342454910278, v_loss: 3.533965826034546
        hide layer: weights -> [2.37794, -3.26832], biases -> [-4.40729, -7.22032],
        outp layer: weights -> [20.34473, 11.43019], bias -> 1.33991
epoch: 200, t_loss: 1.6005836725234985, v_loss: 12.675270080566406
        hide layer: weights -> [2.61408, -3.20022], biases -> [-4.24443, -7.25664],
        outp layer: weights -> [20.37325, 11.43966], bias -> 1.38844

training took: 4.7619268000125885

Only using the CPU, as the model and datasets are rather small. Still, less than 5 seconds to run 200 epochs. I may try the GPU at some point to see how things compare.

Test the Model

Okay, before looking at early stop, let’s test the model in its current state. I will generate a new set of real data. Have the model make its predictions. Then plot the real and predicted values for comparison.

The plot function is pretty straightforward, so I am not bothering to show it. The last argument, True, is to tell the function to save a copy of the image to a file. Needed it for this post.

# let's create a new set of data and see how model does
X_t2 = torch.arange(-3, 3, 0.15)
y_t2 = ((((X_t2)**2) * 2) + (2 * X_t2) + 1)

ds_t2 = TensorDataset(X_t2, y_t2)
dl_t2 = DataLoader(ds_t2, batch_size=1, shuffle=True)

preds = []
for X, y in dl_t2:
  # let's save a bit of computation, time and memory
  with torch.no_grad():
    pred = simple(X).flatten()
    preds.append((X.item(), pred.item(), y.item()))

# sort real and predicted data on x values
preds.sort()
x_d = [p[0] for p in preds]
y_p = [p[1] for p in preds]
y_d = [p[2] for p in preds]
plt_tst_img(x_d, y_d, y_p, max_ep, 1, True)
plt.show()

And here’s a sample of the generated image.

[Image: sample of the plot generated when testing the model]

Not too shabby for such a simple model.

Early Stop

Okay, I will increase the maximum number of epochs of training from its current value of 200 to 500. Then add a class to implement an early stop, and check whether we actually get an early stop when training with it.

Here’s my code for the early stop class.

# early stop class
class EarlyStop:
  """Stop training if validation loss increases 'patience' times in a row while training loss is decreasing"""
  def __init__(self, patience=3):
    self.t_loss = 0   # previous training loss
    self.v_loss = 0   # previous validation loss
    self.vl_inc = 0   # number of consecutive validation loss increases
    self.patience = patience   # number of patience steps before early stop
    print(f"\nEarlyStop(patience={self.patience})")
  
  def is_stop(self, t_loss, v_loss):
    # compare against the previous epoch's losses before overwriting them
    if (t_loss <= self.t_loss) and (v_loss >= self.v_loss):
      self.vl_inc += 1
    else:
      self.vl_inc = 0
    self.t_loss = t_loss
    self.v_loss = v_loss

    return (self.vl_inc >= self.patience)

I then added code to instantiate the class and call its is_stop method in the trn_mdl function.

def trn_mdl(mdl, l_fn, opt, t_ldr, v_ldr, epochs=5):
  """Wrapper function for training the model"""
  e_stop = EarlyStop(3)
  t_loss, v_loss = [], []
... ...
    if e_stop.is_stop(t_loss[-1], v_loss[-1]):
      print(f"\nEarly stop at epoch {ep + 1}.")
      break
    elif (ep == (epochs - 1)):
      print(f"\nNo early stop!")
      
  return t_loss, v_loss

And even when upping the maximum training epochs to 1,000 I never experienced an early stop. Not even when I reduced the patience to 2.
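
Given how much the validation loss bounces around from epoch to epoch in the log above, a run of consecutive increases is a hard condition to meet. A common alternative, which is not what the class above does, is to track the best validation loss seen so far and stop once it has not improved for patience epochs. A minimal sketch, with the class name BestLossStop, the min_delta parameter and the loss values all made up for illustration:

```python
class BestLossStop:
    """Hypothetical variant: stop when validation loss has not improved for 'patience' epochs."""
    def __init__(self, patience=3, min_delta=0.0):
        self.best = float("inf")    # lowest validation loss seen so far
        self.stale = 0              # epochs since the last improvement
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as an improvement

    def is_stop(self, v_loss):
        if v_loss < self.best - self.min_delta:
            self.best = v_loss
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


# made-up, noisy validation losses: improvement stalls after the third value
v_losses = [10.0, 6.0, 5.0, 7.0, 5.5, 9.0, 6.5]
stopper = BestLossStop(patience=3)
stop_ep = None
for ep, v in enumerate(v_losses):
    if stopper.is_stop(v):
        stop_ep = ep + 1
        break
print(f"stopped at epoch {stop_ep}, best v_loss {stopper.best}")
```

The same sequence never rises more than once in a row, so a consecutive-increase rule would not fire on it, while the best-loss rule stops at epoch 6.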

Done

Believe that is it for this one. Unless I decide to try a more substantial model in order to test early stopping. But that seems a touch iffy at this point.

Until next time, do occasionally take the time to play around. It has been known to help firm up one’s understanding of the material.

Resources