Had started drafting what I thought would be the first post in this series on coding a CycleGAN. But as I went along I decided to put all the global variables into the config.py module. I currently had some in config.py and the majority in the main module, cyc_gan.py. Though they seemed to be slowly migrating from the main module into the config module as I needed their values in the other modules. Figured it only made sense to have them all in the config module and use a namespace to access them wherever I needed any of them.

I also decided to add a few new ones to allow me to more easily control the names used for the files to which I save images or model checkpoints. I will add a function to deal with command line parameters. It will set some of those name variables (likely required parameters) and allow for changing the value of other global variables if I wish to play around with some fine tuning. Though, at this point, I think the latter is unlikely.

However, a note of caution. I tried something like this once before and ran into a number of issues. Let’s hope they don’t come up or that I can work around them if they do.

While reviewing this draft post prior to publishing it, I recalled the problem I was previously having. I was modifying the config module variables from one module and then attempting to use those updated values in another module that had already imported the config module. At the time that didn’t work. For this project that does not appear to be an issue so far.
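For the curious, the gotcha boils down to how the config values get imported. A minimal sketch, using hypothetical module names:

# config.py
run_nm = "no_nm"

# module_a.py
import config
config.run_nm = "h2z"       # rebinds the attribute on the shared module object

# module_b.py: the broken pattern
from config import run_nm   # binds the value to a local name at import time
print(run_nm)               # later rebinds of config.run_nm are never seen here

# module_c.py: the pattern used in this project
import config
print(config.run_nm)        # "h2z", sees module_a's update

Going through the module namespace, import config then config.whatever, is what keeps updates visible everywhere.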

That said, I am hopeful this exercise will be of value for all future projects, as well as perhaps this one. Time will tell.

config.py

I am going to start by adding most of the global variables used in previous projects. They may or may not be used in this one, but figured it best to have a good pool of variables documented and available as needed for future projects or if I refactor past ones. I expect the list will change as I continue coding the CycleGAN. But, for now, I need a starting point. And here it is.

Initial List of Global Variables

# ../proj6/config.py
# Ver 0.1.0: 2024.07.31, rek, 

from pathlib import Path
import argparse
import numpy as np
import torch
import torchvision.transforms as trf

# seed for torch random generator
pt_seed = 73

# random number generator
rng = np.random.default_rng()

# image transforms
trfs = [trf.ToTensor(), trf.Normalize((0.5,0.5,0.5), (0.5,0.5,0.5))]

# command line args
have_cl_arg = False         # have retrieved command line args
have_updt_globals = False   # have updated globals based on command line args

# data (path)
data_dir = Path("./data")    # primary data directory
ds_A = data_dir / "trainA"   # sub directory for horse images
ds_B = data_dir / "trainB"   # sub directory for zebra images

# data (img)
img_ht = 256            # spatial size of transformed images
img_wd = 256
channels = 3            # number of image channels
i_sz = img_ht * img_wd  # image total pixels

run_nm = "no_nm"        # name of current training run, used to create sub-directories
dataset_nm = "no_nm"    # don't really need this at present
resume = False          # resume training?

# sub-directory for all training runs
run_dir = Path("./runs")
run_dir.mkdir(exist_ok=True)
# default values for project subdirectories
img_dir = run_dir / Path(f"{run_nm}_img")
sv_dir = run_dir / Path(f"{run_nm}_sv")

# training and hyperparameters
epochs = 5                  # number of epochs for training
max_ep = 100                # maximum number of epochs for training
init_features = 16          # critic initial layer output features
batch_sz = 16               # batch size during training
mod_opt = torch.optim.Adam  # default optimizer
betas = (0.0, 0.9)          # optimizer betas
c_lr_slp = 0.2              # critic LeakyReLU negative_slope
c_lr = 0.0001               # critic learning rate
nbr_classes = 2             # g or no-g
g_lr_slp = 0.2              # generator LeakyReLU negative_slope
g_lr = 0.0001               # generator learning rate (higher than before)
nz = 100                    # dimension for noise tensor
ngf = 16                    # feature map size in generator
decay_epoch = 3             # suggested default : 100 (suggested 'n_epochs' is 200)
                            # epoch from which to start lr decay
# these are taken from one of the tutorials on the web
# I may implement some of the features I saw in their code
sv_img_cyc = 50             # how many batches to skip between saving sample images to file, 0 value means never
sv_chk_cyc = 50             # how many batches to skip between model checkpoints to file, 0 value means never
num_res_blks = 9            # number of residual blocks to use in networks

# allow for different float sizes in tensors, may speed up training
g_scaler = torch.cuda.amp.GradScaler()
d_scaler = torch.cuda.amp.GradScaler()

# get command line parameters
# ????

# project sub-directories
# the following will eventually be moved to a function
img_dir = run_dir / Path(f"{run_nm}_img")
img_dir.mkdir(exist_ok=True)
sv_dir = run_dir / Path(f"{run_nm}_sv")
sv_dir.mkdir(exist_ok=True)
print(f"image and checkpoint directories creeated: {img_dir} & {sv_dir}")

I did a little test by setting run_nm = "h2z" and running the module. Things seemed to work as expected.

Command Line Parameters

I will start by writing a function to get and return any command line parameters. Not sure what they will all be, other than the run name parameter, which I am going to make required. The remainder will all likely be optional. The function itself is pretty straightforward.

I wasn’t going to add any help strings as I don’t expect anyone else to use this project’s modules. Though, as you can see below, I changed my mind, figuring I would eventually have problems remembering what each parameter represents.

After some thought I also decided I would need a couple more functions. One to take the command line arguments and use them to update the global variables as appropriate. Messy, but such is life. And I also need to move the sub-directory related globals and generation into a function. I don’t want that code running until the run name has been properly set. As a safety measure I also added two new globals, have_cl_arg and have_updt_globals, which default to False. They are set to True in the appropriate function if it executes successfully.

That meant I also had to decide where I would call these functions. I figured the correct place was the main project module, cyc_gan.py. I will eventually get to that.

Get Command Line Arguments

Here’s the initial attempt at the command line argument function. For now just a few possible parameters to allow altering some of the global variables and/or model hyperparameters.

# get command line arguments/parameters and update appropriate global variables
def get_cl_args():
  global have_cl_arg
  parser = argparse.ArgumentParser(description="Arguments and hyperparameters for training the current model.")

  parser.add_argument("-rn", "--run_nm", type=str, required=True, help="Run name too use for saving files and such. Required!")
  parser.add_argument("-ds", "--dataset_nm", type=str, required=False, help="Dataset to use for training model.")
  parser.add_argument("-si", "--sv_img_cyc", type=int, required=False, help="Frequency for saving input and generated images.")
  parser.add_argument("-sc", "--sv_chk_cyc", type=int, required=False, help="Frequency for saving network checkpoints.")
  parser.add_argument("-rs", "--resume", type=str, required=False, help="Resume training from previous checkpoint?")
  parser.add_argument("-ep", "--epochs", type=int, required=False, help="Number of epochs of training to use for current training session.")
  parser.add_argument("-bs", "--batch_sz", type=int, required=False, help="Batch size to use for the current training session.")
  parser.add_argument("-is", "--image_sz", type=int, required=False, help="How big should resized images be?")
  parser.add_argument("-rb", "--num_res_blks", type=int, required=False, help="Number of residual blocks to use when buidding networks.")
  # convert to dictionary
  args = vars(parser.parse_args())
  have_cl_arg = True
  return args

And in the __main__ block, a little test code. I have for now commented out the directory generating code.

if __name__ == "__main__":
  cl_args = get_cl_args()
  print(cl_args)

And a couple simple tests.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python config.py
usage: config.py [-h] -rn RUN_NM [-ds DATASET_NM] [-sv SAVE_INT] [-rs RESUME] [-ep EPOCHS] [-bs BATCH_SZ]
                 [-is IMAGE_SZ] [-rb NUM_RES_BLK]
config.py: error: the following arguments are required: -rn/--run_nm

(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python config.py -rn h2z
{'run_nm': 'h2z', 'dataset_nm': None, 'save_int': None, 'resume': None, 'epochs': None, 'batch_sz': None, 'image_sz': None, 'num_res_blk': None}

And that seems to work.

Use Command Line Parameters to Update the Global Variables

I tried using exec() to cut down on the code. But it did not initially work for me. So I did the updates longhand. Pretty straightforward code, so no explanations/comments.

# need to update globals based on received command line args
# bit of a pain but...
def updt_cl_args(args):
  global run_nm, dataset_nm, sv_img_cyc, sv_chk_cyc, resume, epochs, batch_sz, img_ht, img_wd, i_sz, num_res_blks, have_updt_globals
  for cla, clav in args.items():
    if clav is not None:
      if cla == "run_nm":
        run_nm = clav
      elif cla == "dataset_nm":
        dataset_nm = clav
      elif cla == "sv_img_cyc":
        sv_img_cyc = clav
      elif cla == "sv_chk_cyc":
        sv_chk_cyc = clav
      elif cla == "resume":
        resume = clav
      elif cla == "epochs":
        epochs = clav
      elif cla == "batch_sz":
        batch_sz = clav
      elif cla == "image_sz":
        img_ht, img_wd, i_sz = clav, clav, clav * clav
      elif cla == "num_res_blks":
        num_res_blks = clav
  have_updt_globals = True

Added the following to my test code.

  print(f"before updt -> run_nm : {run_nm}, epochs: {epochs}")
  updt_cl_args(cl_args)
  print(f"after updt ->  run_nm : {run_nm}, epochs: {epochs}")

And a quick test shows things appear to work.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python config.py --run_nm rek1 -ep 25
{'run_nm': 'rek1', 'dataset_nm': None, 'sv_img_cyc': None, 'resume': None, 'epochs': 25, 'batch_sz': None, 'image_sz': None, 'num_res_blks': None}
before updt -> run_nm : no_nm, epochs: 5
after updt ->  run_nm : rek1, epochs: 25

But I re-read the documentation for the exec() function and noticed something I missed the first time: you can pass a namespace, such as globals(), as its second argument, and the executed code then runs in that namespace. So, I tried again with more success, which resulted in a simpler and smaller function. Though I did have to refactor things to ensure the global variable names matched the long version of the command line argument names (saved creating a lookup dictionary). The code above has been modified to do so.

def updt_cl_args(args):
  global img_ht, img_wd, i_sz, have_updt_globals
  for cla, clav in args.items():
    if clav is not None:
      if cla == "image_sz":
        img_ht, img_wd, i_sz = clav, clav, clav * clav
      else:
        if isinstance(clav, int):
          g_cmd = f"{cla} = {clav}"
        else:
          g_cmd = f"{cla} = '{clav}'"
        # print(g_cmd)
        exec(g_cmd, globals())
  have_updt_globals = True

And the earlier test repeated exactly as expected/desired.
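In miniature, the detail I’d missed works like this (a standalone sketch, not project code):

x = 1

def set_x_wrong():
  exec("x = 2")             # assignment lands in a throwaway local namespace

def set_x_right():
  exec("x = 2", globals())  # assignment executes in the module's global namespace

set_x_wrong()
print(x)   # 1, unchanged
set_x_right()
print(x)   # 2, updated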

Create Suitably Named Run Subdirectories

Again pretty straightforward code. Though I did, as mentioned, add a bit of a safety measure.

I keep thinking I should exit module execution if the function is called without the globals being updated by command line arguments. But, for now, I am the only one using the code, so… (A sketch of what that exit might look like follows the test below.)

# project sub-directories, in function so can call once command line args are retrieved
def mk_dirs():
  global img_dir, sv_dir
  if have_cl_arg and have_updt_globals:
    img_dir = run_dir / Path(f"{run_nm}_img")
    img_dir.mkdir(exist_ok=True)
    sv_dir = run_dir / Path(f"{run_nm}_sv")
    sv_dir.mkdir(exist_ok=True)
    print(f"image and checkpoint directories created: {img_dir} & {sv_dir}")
    return True
  else:
    print(f"image and checkpoint directories not created, command line args not yet retrieved")
    return False

And, adding a couple calls to the test code, before and after updating the globals, generates the following.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python config.py --run_nm rek1 -ep 25
{'run_nm': 'rek1', 'dataset_nm': None, 'sv_img_cyc': None, 'resume': None, 'epochs': 25, 'batch_sz': None, 'image_sz': None, 'num_res_blks': None}
before updt -> run_nm : no_nm, epochs: 5
image and checkpoint directories not created, command line args not yet retrieved
after updt ->  run_nm : rek1, epochs: 25
image and checkpoint directories created: runs\rek1_img & runs\rek1_sv

So far so good.
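As for that hard-exit idea, if I ever implement it, it would likely be a small guard at the top of mk_dirs, something along these lines (hypothetical, not in the current code):

import sys

# possible guard at the top of mk_dirs()
if not (have_cl_arg and have_updt_globals):
  sys.exit("mk_dirs() called before command line args were retrieved and applied")

sys.exit with a string prints the message to stderr and exits with a non-zero status, which would stop the run cold rather than quietly returning False.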

Refactor Main Module to Call the New Functions

I figured I could just put the relevant function calls (pretty much what was in the config module’s test code) right after the call to set_detect_anomaly, before any training or production code. You of course have not yet seen that code, so the preceding may not be all that meaningful to you. The next post will get you up to speed (I hope).

... ...
# debugging
# explicitly raise an error with a stack trace
torch.autograd.set_detect_anomaly(True)

# get command line parameters, update globals, create project sub-directories
cl_args = cfg.get_cl_args()
if tst_clp:
  print(cl_args)
  print(f"before updt -> run_nm : {cfg.run_nm}, epochs: {cfg.epochs}")
cfg.updt_cl_args(cl_args)
if tst_clp:
  print(f"after updt ->  run_nm : {cfg.run_nm}, epochs: {cfg.epochs}")
cfg.mk_dirs()
... ...

And a quick test.

(mclp-3.12) PS F:\learn\mcl_pytorch\proj6> python cyc_gan.py -rn new_tst -ep 50
{'run_nm': 'new_tst', 'dataset_nm': None, 'sv_img_cyc': None, 'sv_chk_cyc': None, 'resume': None, 'epochs': 50, 'batch_sz': None, 'image_sz': None, 'num_res_blks': None}
before updt -> run_nm : no_nm, epochs: 5
after updt ->  run_nm : new_tst, epochs: 50
image and checkpoint directories created: runs\new_tst_img & runs\new_tst_sv

Done

Well, that little bit of coding pretty much wore me out. So, going to call this post done.

Next time I will get into some of the initial code for the main project module, cyc_gan.py. And, likely, some of the other modules I will be employing to keep my code perhaps a little tidier.

Until then, do play around with your options. It often makes things a touch nicer. But then, easy for me as I am not on any schedule.

Resources