Model fine-tuning#

Model fine-tuning is the process of taking a model that has already been pre-trained on some large, diverse task and adapting it to the task of interest. This can yield large performance gains for a given sample size. The closer the target task is to the pre-training task, the better the transfer; however, one usually sees benefits even when the tasks are quite different (e.g. ImageNet -> medical ultrasound).

To demonstrate, we will start with the previous notebook and swap in a pre-trained model.

# use autoreload because, by default, Python will not re-import modules that have changed
%load_ext autoreload
%autoreload 2
import os
import torch
from torchvision import transforms
import matplotlib.pyplot as plt

Settings#

data_dir = f"/scratch/{os.environ['USER']}/data"
model_path = f"/scratch/{os.environ['USER']}/model.pt"

# Model and Training
epochs = 5              # number of training epochs
batch_size = 128        # input batch size for training (default: 64)
test_batch_size = 1000  # input batch size for testing (default: 1000)
num_workers = 10        # parallel data loading to speed things up
lr = 0.1                # learning rate (default: 0.1)
gamma = 0.7             # learning rate step gamma (default: 0.7)
no_cuda = False         # disables CUDA training (default: False)
seed = 42               # random seed (default: 42)
log_interval = 10       # how many batches to wait before logging training status (default: 10)
save_model = False      # save the trained model (default: False)

# additional derived settings
use_cuda = not no_cuda and torch.cuda.is_available()
torch.manual_seed(seed)
device = torch.device("cuda" if use_cuda else "cpu")

print("Device:", device)
Device: cuda

Dataset#

from utils import data

# transforms (we may wish to experiment with these, so leave them as inputs)
train_transforms = transforms.Compose([
    transforms.ToTensor(),
    # the canonical MNIST mean/std, commonly reused for EMNIST
    transforms.Normalize((0.1307,), (0.3081,))
])
test_transforms = train_transforms

train_loader = data.get_train_dataloader(data_dir, train_transforms, batch_size, num_workers)
test_loader = data.get_test_dataloader(data_dir, test_transforms, test_batch_size, num_workers)

# save a test batch for later testing
image_gen = iter(test_loader)
test_img, test_trg = next(image_gen)
print("Training dataset:", train_loader.dataset)
print("Testing dataset:", test_loader.dataset)
Training dataset: Dataset EMNIST
    Number of datapoints: 112800
    Root location: /scratch/dane2/data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.1307,), std=(0.3081,))
           )
Testing dataset: Dataset EMNIST
    Number of datapoints: 18800
    Root location: /scratch/dane2/data
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.1307,), std=(0.3081,))
           )
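
The data helpers above wrap torchvision's EMNIST dataset. Their implementation isn't shown in this notebook, but a minimal sketch consistent with the output above (the "balanced" split, with 112,800 training and 18,800 test images across 47 classes) might look like this:

# possible utils/data.py (a sketch; the real helpers may differ)
from torch.utils.data import DataLoader
from torchvision.datasets import EMNIST

def get_train_dataloader(data_dir, train_transforms, batch_size, num_workers):
    ds = EMNIST(data_dir, split="balanced", train=True,
                download=True, transform=train_transforms)
    return DataLoader(ds, batch_size=batch_size,
                      num_workers=num_workers, shuffle=True)

def get_test_dataloader(data_dir, test_transforms, batch_size, num_workers):
    ds = EMNIST(data_dir, split="balanced", train=False,
                download=True, transform=test_transforms)
    return DataLoader(ds, batch_size=batch_size,
                      num_workers=num_workers, shuffle=False)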

Model definition#

The torchvision library provides many pre-defined model architectures and trained model weights. Many more models can be downloaded using PyTorch Image Models (timm) and the Hugging Face libraries.
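
For instance, timm can load a pretrained backbone in one call and can even adapt the input channels and classifier head for us (a sketch, assuming timm is installed; in_chans and num_classes are timm arguments, not torchvision's):

import timm

# one-line alternative (not used below): ResNet18 with ImageNet weights,
# adapted to 1-channel inputs and a 47-class head
model_timm = timm.create_model("resnet18", pretrained=True,
                               in_chans=1, num_classes=47)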

from torchvision.models import resnet18, ResNet18_Weights

# pretrained weights, with an advertised top-1 accuracy of 69.758% on the ImageNet validation set
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
# note: the input convolution expects 3 channels. Why is this a problem?
model
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
# let's make sure we can run a batch of data through the model
with torch.no_grad():
    x, y = next(iter(train_loader))
    
    try:
        y_hat = model(x)
        print(y_hat.shape, y_hat)
    except RuntimeError as e: 
        print("RuntimeError:", e)
    
# it fails because the pretrained stem expects 3 color channels, but EMNIST images have only 1
RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[128, 1, 28, 28] to have 3 channels, but got 1 channels instead

To solve this issue, let’s swap out the initial convolution layer for one that expects a single channel. This new convolution will be trained from scratch.

model.conv1
Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
model.conv1 = torch.nn.Conv2d(
    in_channels=1,  # we changed this
    out_channels=model.conv1.out_channels,
    kernel_size=model.conv1.kernel_size,
    stride=model.conv1.stride,
    padding=model.conv1.padding,
    bias=model.conv1.bias is not None  # Conv2d expects a bool here, not a tensor or None
)
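
Note that replacing the stem throws away its pretrained weights. An alternative worth knowing about (not applied here) is to keep them by collapsing the three pretrained RGB filters into one channel, which often transfers a bit better:

# optional (a sketch, not used in this notebook): initialize the 1-channel
# stem from the pretrained 3-channel weights by summing over the color
# dimension, preserving the learned filters
pretrained_stem = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1).conv1
with torch.no_grad():
    model.conv1.weight.copy_(pretrained_stem.weight.sum(dim=1, keepdim=True))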

model
ResNet(
  (conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
print("Number of parameters:", sum(p.numel() for p in model.parameters()))
Number of parameters: 11683240
# let's try again
with torch.no_grad():
    x, y = next(iter(train_loader))
    y_hat = model(x)
    
y_hat.shape, y_hat
# The output size is wrong: 1000 ImageNet classes instead of EMNIST's 47!
(torch.Size([128, 1000]),
 tensor([[ 2.8179,  1.0689, -1.9409,  ...,  5.1819, -0.3923, -2.8032],
         [-1.2234, -1.5119, -2.3858,  ..., -1.5502,  3.3604, -0.6424],
         [ 0.6216,  2.9419,  2.5593,  ...,  1.7307,  1.4611,  0.2213],
         ...,
         [ 0.9893, -1.8194, -2.8329,  ...,  1.8187,  1.2180,  3.1364],
         [ 7.4379,  0.9004,  0.0970,  ..., -3.9618,  4.8143, -1.9797],
         [-1.3743,  0.2013, -0.7154,  ..., -1.4267, -0.3651, -0.3076]]))
# let's install a fresh classification head sized for EMNIST's 47 classes
model.fc = torch.nn.Linear(512, 47, bias=True)
# let's try again
with torch.no_grad():
    x, y = next(iter(train_loader))
    y_hat = model(x)
    
y_hat.shape, y_hat
(torch.Size([128, 47]),
 tensor([[-0.2310,  2.5642, -1.1058,  ..., -1.1087, -1.2465,  1.3283],
         [-0.5069,  0.3748,  0.3644,  ...,  0.5988,  0.6146,  0.0753],
         [ 0.0221,  0.8552,  0.5377,  ..., -0.3202, -0.4839,  0.7633],
         ...,
         [-0.8688,  1.5012,  0.1901,  ..., -0.5180,  0.5192,  0.0882],
         [-0.2565,  1.0510, -0.1019,  ..., -0.7193,  1.3646,  0.1110],
         [ 0.5696,  0.2535, -0.4605,  ...,  1.0174, -0.0918,  2.3605]]))

Looks good! Our model is ready for training.
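
Here we fine-tune every weight in the network. Another common option (not used in this notebook) is feature extraction: freeze the pretrained backbone and train only the layers we replaced, which is cheaper and can help when data is scarce. A sketch:

# feature extraction variant (hypothetical; the rest of this notebook
# fine-tunes all weights instead)
for p in model.parameters():
    p.requires_grad = False          # freeze everything...
for p in model.conv1.parameters():
    p.requires_grad = True           # ...except the new stem
for p in model.fc.parameters():
    p.requires_grad = True           # ...and the new head

print("Trainable parameters:",
      sum(p.numel() for p in model.parameters() if p.requires_grad))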

Scriptify model creation#

Now that we’ve got this working, it would be a good idea to move this logic into our models.py script. When doing so, we might want to pass the weights in as an argument. This will allow us to load different pretrained weights, or none at all for random initialization.
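
A possible implementation (a sketch; the actual contents of utils/models.py are not shown in this notebook):

# sketch of utils/models.py (hypothetical; the real file may differ)
import torch
from torchvision.models import resnet18

def make_resnet18_model(weights=None, num_classes=47):
    model = resnet18(weights=weights)
    # 1-channel stem, trained from scratch
    model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                  padding=3, bias=False)
    # fresh classification head
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes, bias=True)
    return model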

from utils import models
model_pretrained = models.make_resnet18_model(weights=ResNet18_Weights.IMAGENET1K_V1)
model_random = models.make_resnet18_model(weights=None)

Training and testing#

We can re-use our training code. Note: training will take much longer than before because ResNet18 is a much larger model.

from utils import training
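
For reference, here is a minimal sketch of what training.train_and_test might look like (hypothetical; the real utils/training.py is not shown, and the optimizer and logging details are assumptions chosen to match the output below):

import torch.nn.functional as F
from torch.optim.lr_scheduler import StepLR

def train_and_test(model, train_loader, test_loader, epochs, lr, gamma, device):
    optimizer = torch.optim.Adadelta(model.parameters(), lr=lr)
    scheduler = StepLR(optimizer, step_size=1, gamma=gamma)
    for epoch in range(1, epochs + 1):
        # training pass
        model.train()
        total_loss = 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"[TRAIN] epoch {epoch}: Average loss: {total_loss / len(train_loader.dataset):.4f}")
        # evaluation pass
        model.eval()
        test_loss, correct = 0.0, 0
        with torch.no_grad():
            for x, y in test_loader:
                x, y = x.to(device), y.to(device)
                out = model(x)
                test_loss += F.cross_entropy(out, y, reduction="sum").item()
                correct += (out.argmax(dim=1) == y).sum().item()
        n = len(test_loader.dataset)
        print(f"[TEST] epoch {epoch}: Average loss: {test_loss / n:.4f}, "
              f"Accuracy: {correct}/{n} ({100.0 * correct / n:.2f}%)")
        scheduler.step()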

Random weight model#

_ = model_random.to(device)
training.train_and_test(model_random, train_loader, test_loader, epochs, lr, gamma, device)
[TRAIN] epoch 1: Average loss: 0.0047                                                                 
[TEST] epoch 1: Average loss: 0.4532, Accuracy: 15790/18800 (83.99%)
[TRAIN] epoch 2: Average loss: 0.0025                                                                 
[TEST] epoch 2: Average loss: 0.3840, Accuracy: 16197/18800 (86.15%)
[TRAIN] epoch 3: Average loss: 0.0020                                                                 
[TEST] epoch 3: Average loss: 0.3500, Accuracy: 16482/18800 (87.67%)
[TRAIN] epoch 4: Average loss: 0.0017                                                                 
[TEST] epoch 4: Average loss: 0.3358, Accuracy: 16568/18800 (88.13%)
[TRAIN] epoch 5: Average loss: 0.0014                                                                 
[TEST] epoch 5: Average loss: 0.3333, Accuracy: 16624/18800 (88.43%)

Pretrained model#

_ = model_pretrained.to(device)
training.train_and_test(model_pretrained, train_loader, test_loader, epochs, lr, gamma, device)
[TRAIN] epoch 1: Average loss: 0.0055                                                                 
[TEST] epoch 1: Average loss: 0.5303, Accuracy: 15412/18800 (81.98%)
[TRAIN] epoch 2: Average loss: 0.0028                                                                 
[TEST] epoch 2: Average loss: 0.3655, Accuracy: 16403/18800 (87.25%)
[TRAIN] epoch 3: Average loss: 0.0022                                                                 
[TEST] epoch 3: Average loss: 0.3439, Accuracy: 16555/18800 (88.06%)
[TRAIN] epoch 4: Average loss: 0.0020                                                                 
[TEST] epoch 4: Average loss: 0.3309, Accuracy: 16597/18800 (88.28%)
[TRAIN] epoch 5: Average loss: 0.0017                                                                 
[TEST] epoch 5: Average loss: 0.3347, Accuracy: 16634/18800 (88.48%)

Conclusions: both versions outperform our custom architecture, though there is not a large benefit from fine-tuning here. This is probably because ImageNet images are quite different from EMNIST’s handwritten characters, so the pretrained features transfer only weakly.