Move reused code into Python script files

Jupyter is a great place to test ideas and get code working, and it can also be a good place to run and document experiments. Once the code works, however, it’s usually a good idea to move the reusable parts into Python script files. You can then copy the notebook for a new experiment without duplicating the logic for loading data, defining models, and running training. In the end, the notebook should simply document the experiment that you performed.

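By default, Python imports a module only once per session, so re-running an import cell in Jupyter does not pick up changes to a file you have edited. If you are editing local .py files, it’s a good idea to use the autoreload extension, which automatically reloads them before your code runs.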

# use autoreload because, by default, Python will not re-import modules
%load_ext autoreload
# mode 2: reload all imported modules before each cell is executed
%autoreload 2
import os
import torch
from torchvision import transforms
import matplotlib.pyplot as plt
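
To see why reloading is needed, the following minimal sketch reproduces the caching behaviour using only the standard library. The module name demo_settings and its VALUE attribute are made up for this illustration. importlib.reload is roughly the manual counterpart of what autoreload does for you.

```python
import importlib
import pathlib
import sys
import tempfile

# Keep the demo free of cached bytecode so the edit below is always picked up.
sys.dont_write_bytecode = True

# Write a tiny, hypothetical module into a temporary directory.
tmp_dir = tempfile.mkdtemp()
module_file = pathlib.Path(tmp_dir) / "demo_settings.py"
module_file.write_text("VALUE = 1\n")
sys.path.insert(0, tmp_dir)
sys.modules.pop("demo_settings", None)  # start from a clean slate
importlib.invalidate_caches()

import demo_settings
first = demo_settings.VALUE

# Edit the file on disk, as you would in an editor next to the notebook.
module_file.write_text("VALUE = 22\n")

import demo_settings  # no effect: the module is already cached in sys.modules
after_reimport = demo_settings.VALUE

importlib.reload(demo_settings)  # re-executes the module source
after_reload = demo_settings.VALUE

print(first, after_reimport, after_reload)  # 1 1 22
```

The second import statement leaves the stale value in place, and only the explicit reload picks up the edit.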

Settings

These parameters are the inputs to the fitting process. We leave them in the notebook because we might change them from one experiment to the next.

data_dir = f"/scratch/{os.environ['USER']}/data"
model_path = f"/scratch/{os.environ['USER']}/model.pt"

# Model and training
epochs = 5  # number of training epochs
batch_size = 128  # input batch size for training (default: 64)
test_batch_size = 1000  # input batch size for testing (default: 1000)
num_workers = 10  # parallel data loading to speed things up
lr = 1.0  # learning rate (default: 1.0)
gamma = 0.7  # learning rate step gamma (default: 0.7)
no_cuda = False  # disables CUDA training (default: False)
seed = 42  # random seed (default: 42)
log_interval = 10  # how many batches to wait before logging training status (default: 10)
save_model = False  # save the trained model (default: False)

# additional derived settings
use_cuda = not no_cuda and torch.cuda.is_available()
torch.manual_seed(seed)
device = torch.device("cuda" if use_cuda else "cpu")

print("Device:", device)
Device: cuda

Dataset

The logic for loading data will be repeated across several experiments. To avoid code duplication, we move code out of the notebook and into a separate .py file. This also reduces the number of import statements needed in the notebook itself.

from utils import data

# transforms (we may wish to experiment with these so leave as inputs)
train_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
test_transforms = train_transforms

train_loader = data.get_train_dataloader(data_dir, train_transforms, batch_size, num_workers)
test_loader = data.get_test_dataloader(data_dir, test_transforms, test_batch_size, num_workers)

# save a test batch for later testing
image_gen = iter(test_loader)
test_img, test_trg = next(image_gen)
print("Training dataset:", train_loader.dataset)
print("Testing dataset:", test_loader.dataset)
Training dataset: Dataset EMNIST
    Number of datapoints: 112800
    Root location: /scratch/dane2/data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.1307,), std=(0.3081,))
           )
Testing dataset: Dataset EMNIST
    Number of datapoints: 18800
    Root location: /scratch/dane2/data
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.1307,), std=(0.3081,))
           )

Model definition

We move the model definitions to a models.py file. This file also contains test code for developing the model. In the future we may place several different model definitions into this file, so that we can compare different architecture choices.

from utils import models

# Create the model
model = models.Classifier()
# let's make sure we can run a batch of data through the model
with torch.no_grad():
    x, y = next(iter(train_loader))
    y_hat = model(x)
    
y_hat.shape, y_hat, y_hat.sum(axis=-1)
(torch.Size([128, 47]),
 tensor([[-0.3239,  0.1389,  0.3656,  ...,  0.2142,  1.0247, -0.8195],
         [-1.3441, -0.2557,  1.0103,  ...,  0.7670, -0.9166, -1.3727],
         [-0.3066,  0.2172,  0.4220,  ...,  0.5481, -0.0596, -0.7649],
         ...,
         [ 0.1117,  0.7060,  0.1698,  ...,  0.4631,  0.1400, -0.7286],
         [-0.2643,  0.9149,  0.2053,  ...,  0.4230, -0.2839, -0.8991],
         [-0.5132,  0.2362,  0.7122,  ...,  1.0606, -0.4373, -0.7467]]),
 tensor([ 3.7758,  1.4720,  0.4472,  4.0164,  3.0636,  0.6827, -4.7233,  4.0408,
          3.3448,  0.8800,  2.2962,  2.0166,  1.5692,  2.6157, -3.6126,  3.4548,
         -3.2266, -3.5242,  1.9012,  4.3836,  4.4628, -4.4008,  6.9980,  0.8163,
          6.2580, -3.6039,  5.2336,  8.5941, -2.0488,  3.4737, 10.3036,  5.3345,
          1.3189,  4.4136, -1.4858, -1.7438,  2.6710,  2.8249,  4.4213,  1.2985,
         -0.1400,  6.8650,  3.6911,  4.5677,  5.3359, 13.0531,  0.9230,  3.6649,
          7.2508,  1.8181,  0.4747,  3.3813,  3.8251,  4.7933,  2.5776, -1.1570,
         -0.6321,  4.0535, 10.5322,  1.1302,  0.5512,  5.9773,  3.6637,  3.9035,
          4.0800,  4.8139, -6.2729, -0.0227,  4.5901,  2.1803,  3.5959,  6.9278,
          3.4500,  4.8080,  4.7392,  1.4991,  1.0737,  0.4058,  6.5380,  4.1924,
         -1.0276,  4.0100, -0.2901,  0.8628,  6.3655,  1.6981,  2.8238,  6.5025,
          0.2267,  3.7050,  4.5160,  0.6726,  2.4058,  5.0493,  0.4174,  2.0125,
         -1.2594,  2.9454,  5.6681,  6.8932,  1.1167,  2.8730,  0.4145,  0.6635,
          5.3423,  2.3892,  2.5508,  6.8482,  0.1281, -1.1060,  4.7469, -2.3345,
          2.9895,  3.1585, 10.0027,  2.3747,  5.1475,  1.9999,  3.2433,  3.3553,
          1.5476,  0.4538,  0.7793,  2.0648, -0.5512,  4.0126,  4.1972,  4.4231]))
model
Classifier(
  (feature_extractor): Sequential(
    (0): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (1): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (4): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (7): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU()
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Flatten(start_dim=1, end_dim=-1)
  )
  (classifier): Linear(in_features=484, out_features=47, bias=True)
)
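
The source of utils/models.py is not reproduced in this notebook. As a rough sketch, a module along these lines would print the structure shown above. The layer sequence is read off the printed model, while the use of lazy layers (suggested by the printout in the training section), the num_classes argument, and the body of num_params are assumptions.

```python
import torch
from torch import nn


class Classifier(nn.Module):
    def __init__(self, num_classes=47):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, bias=False),
            nn.LazyBatchNorm2d(),  # channel count inferred on first forward
            nn.ReLU(),
            nn.Conv2d(4, 4, kernel_size=3, bias=False),
            nn.LazyBatchNorm2d(),
            nn.ReLU(),
            nn.Conv2d(4, 4, kernel_size=3, bias=False),
            nn.LazyBatchNorm2d(),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
        )
        # in_features is inferred from the first batch that passes through
        self.classifier = nn.LazyLinear(num_classes)

    def forward(self, x):
        return self.classifier(self.feature_extractor(x))

    def num_params(self):
        # Call only after a forward pass: lazy parameters have no size before.
        return sum(p.numel() for p in self.parameters() if p.requires_grad)


model = Classifier()
with torch.no_grad():
    y_hat = model(torch.randn(8, 1, 28, 28))  # dummy batch of 28x28 images
print(y_hat.shape, model.num_params())
```

With lazy layers, the flattened feature size does not have to be worked out by hand, at the cost of needing one forward pass before the parameters exist.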
print("Number of parameters:", model.num_params())
Number of parameters: 23143
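
As a sanity check, the reported count can be reproduced by hand from the printed layer shapes. This short calculation assumes 28x28 single-channel input images (the EMNIST image size) and counts only trainable weights, not the batch-norm running statistics.

```python
def conv_out(size, kernel=3):
    # Output side length of a stride-1 convolution without padding.
    return size - kernel + 1


side = 28
for _ in range(3):        # three 3x3 convolutions: 28 -> 26 -> 24 -> 22
    side = conv_out(side)
side //= 2                # 2x2 max pooling: 22 -> 11
in_features = 4 * side * side  # 4 channels * 11 * 11

conv_params = 1 * 4 * 9 + 4 * 4 * 9 + 4 * 4 * 9  # no bias terms
bn_params = 3 * (4 + 4)                          # scale and shift per channel
linear_params = in_features * 47 + 47            # weights plus biases
total = conv_params + bn_params + linear_params

print(in_features, total)  # 484 23143
```

Almost all of the parameters sit in the final linear layer, which is typical for a small network that flattens its feature maps.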

Training and testing

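We also move our training logic into its own .py file. Note that the freshly created model below prints LazyBatchNorm2d and LazyLinear layers with sizes of 0, unlike the printout in the previous section: lazy layers infer their input sizes during the first forward pass, and this new instance has not yet seen any data.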

from utils import training
model = models.Classifier().to(device)
model
Classifier(
  (feature_extractor): Sequential(
    (0): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (1): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (4): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (7): LazyBatchNorm2d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU()
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Flatten(start_dim=1, end_dim=-1)
  )
  (classifier): LazyLinear(in_features=0, out_features=47, bias=True)
)
training.train_and_test(model, train_loader, test_loader, epochs, lr, gamma, device)
Test epoch 1: Average loss: 0.6373, Accuracy: 15039/18800 (79.99%)
Test epoch 2: Average loss: 0.5882, Accuracy: 15299/18800 (81.38%)
Test epoch 3: Average loss: 0.5484, Accuracy: 15611/18800 (83.04%)
Test epoch 4: Average loss: 0.5124, Accuracy: 15805/18800 (84.07%)
Test epoch 5: Average loss: 0.5047, Accuracy: 15840/18800 (84.26%)