PyTorch Lightning#

PyTorch Lightning wraps your PyTorch code and implements many common workflows. For instance, the training and testing loops always look very similar, so Lightning saves you from rewriting this boilerplate code for every project. The best way to understand it is to implement a PyTorch Lightning model yourself, so let’s take our previous EMNIST code and refactor it as a Lightning model.

# use autoreload because, by default, python will not re-import modules
%load_ext autoreload
%autoreload 2
import os
import torch
from torchvision import transforms
from utils import models
from torchvision.models import resnet18, ResNet18_Weights

Settings#

We don’t specify anything about the device here; PyTorch Lightning will automatically detect and use our GPU.
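If you want to see what Lightning will detect, or pin the device explicitly, a quick check looks like the sketch below. The accelerator and devices arguments shown in the comments are standard Trainer options; we rely on the defaults later in this notebook.

# quick check of what Lightning will detect
print("CUDA available:", torch.cuda.is_available())

# to pin the device explicitly, you could later construct the Trainer as, e.g.:
#   Trainer(accelerator="gpu", devices=1)   # force a single GPU
#   Trainer(accelerator="cpu")              # force CPU
# the defaults (accelerator="auto", devices="auto") are what we use below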

data_dir = f"/scratch/{os.environ['USER']}/data"
model_path = f"/scratch/{os.environ['USER']}/model.pt"

# Model and Training
epochs = 5              # number of training epochs
batch_size = 128        # input batch size for training (default: 64)
test_batch_size = 1000  # input batch size for testing (default: 1000)
num_workers = 10        # parallel data loading to speed things up
lr = 0.1                # learning rate (default: 0.1)
gamma = 0.7             # learning rate step gamma (default: 0.7)
seed = 42               # random seed (default: 42)

Dataset#

from utils import data

# transforms (we may wish to experiment with these, so we pass them in as inputs)
train_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
test_transforms = train_transforms

train_loader = data.get_train_dataloader(data_dir, train_transforms, batch_size, num_workers)
test_loader = data.get_test_dataloader(data_dir, test_transforms, test_batch_size, num_workers)

# save a test batch for later testing
image_gen = iter(test_loader)
test_img, test_trg = next(image_gen)
print("Training dataset:", train_loader.dataset)
print("Testing dataset:", test_loader.dataset)
Training dataset: Dataset EMNIST
    Number of datapoints: 112800
    Root location: /scratch/dane2/data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.1307,), std=(0.3081,))
           )
Testing dataset: Dataset EMNIST
    Number of datapoints: 18800
    Root location: /scratch/dane2/data
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.1307,), std=(0.3081,))
           )
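The get_train_dataloader and get_test_dataloader helpers come from this course’s utils.data module and are not reproduced here. A minimal sketch of what such helpers might look like, assuming the EMNIST “balanced” split (47 classes, which matches the dataset sizes printed above):

from torch.utils.data import DataLoader
from torchvision import datasets

# hypothetical stand-ins for utils.data.get_train_dataloader / get_test_dataloader
def get_train_dataloader_sketch(data_dir, transform, batch_size, num_workers):
    dataset = datasets.EMNIST(data_dir, split="balanced", train=True,
                              download=True, transform=transform)
    # shuffle the training data each epoch
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers, pin_memory=True)

def get_test_dataloader_sketch(data_dir, transform, batch_size, num_workers):
    dataset = datasets.EMNIST(data_dir, split="balanced", train=False,
                              download=True, transform=transform)
    # no shuffling needed for evaluation
    return DataLoader(dataset, batch_size=batch_size, shuffle=False,
                      num_workers=num_workers, pin_memory=True)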

The Lightning Model#

We implement Lightning models like normal PyTorch models: we define the architecture and a forward method for passing data through the model. In addition, we implement methods that define the training step, the validation step, and the optimizer configuration.

import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
import torch.nn.functional as F
import lightning.pytorch as pl
import torchmetrics

# define the LightningModule
class LitModel(pl.LightningModule):
    def __init__(self, pytorch_model, lr, gamma):
        super().__init__()
        self.model = pytorch_model
        self.lr = lr
        self.gamma = gamma
        
        # metrics
        self.train_acc = torchmetrics.Accuracy(task="multiclass", num_classes=47)
        self.test_acc = torchmetrics.Accuracy(task="multiclass", num_classes=47)
        
    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop.
        # lightning automatically puts the model in train mode
        # gradient updates etc. are handled automatically
        # but can be customized if desired
        data, target = batch
        output = self.model(data)
        
        loss = F.cross_entropy(output, target)
        self.log("train_loss", loss)
        
        self.train_acc(output, target)
        self.log("train_acc", self.train_acc, on_step=True, on_epoch=False)
        
        return loss
    
    def validation_step(self, batch, batch_idx):
        # lightning automatically puts the model in eval mode
        # and turns off gradient tracking
        data, target = batch
        output = self.model(data)
        
        loss = F.cross_entropy(output, target)
        self.log("val_loss", loss)   
        
        self.test_acc(output, target)
        self.log("test_acc", self.test_acc, on_step=True, on_epoch=True)

    def configure_optimizers(self):
        optimizer = optim.Adadelta(self.parameters(), lr=self.lr)
        scheduler = StepLR(optimizer, step_size=1, gamma=self.gamma)
        return {'optimizer': optimizer, 'lr_scheduler': scheduler}

# init the model
pt_model = models.Classifier()  # or: models.make_resnet18_model(weights=ResNet18_Weights.IMAGENET1K_V1)
model = LitModel(pt_model, lr, gamma)

# we use a PyTorch Lightning model just like a normal model
with torch.no_grad():
    x, y = next(iter(train_loader))
    y_hat = model(x)
    
y_hat.shape, y_hat
(torch.Size([128, 47]),
 tensor([[ 0.9744, -0.4397,  1.3150,  ..., -0.5819, -0.3168,  0.3866],
         [ 1.0668, -0.0310,  1.1868,  ..., -1.4330, -0.3286,  0.6900],
         [ 0.5532,  0.3253,  0.9537,  ..., -0.8252,  0.2363,  0.4263],
         ...,
         [ 1.1231, -0.2741,  1.0353,  ...,  0.0049, -0.0596,  1.1235],
         [ 0.7819,  0.5460,  1.5333,  ..., -0.0923,  0.2269, -0.1376],
         [ 1.5202, -0.3273,  1.6033,  ..., -0.5875,  0.6917,  0.9609]]))

Looks good! Our model is ready for training.
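The values above are raw logits over the 47 EMNIST classes. As one more sanity check, we can turn them into class predictions with an argmax and compare against the test batch saved earlier; an untrained model should score near chance (about 1/47):

with torch.no_grad():
    logits = model(test_img)        # the test batch saved earlier
    preds = logits.argmax(dim=1)    # most likely class per image

print("Predicted:", preds[:10].tolist())
print("Actual:   ", test_trg[:10].tolist())
print("Untrained accuracy:", (preds == test_trg).float().mean().item())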

Training and testing#

We no longer need our training/testing functions. Lightning constructs the appropriate training loop based on the definitions in our LightningModule.

from lightning.pytorch import loggers as pl_loggers
from lightning.pytorch import Trainer

# a logger to save results
csv_logger = pl_loggers.CSVLogger(save_dir="logs/")

# the Trainer class has about a million arguments. For now, the defaults will suffice.
trainer = Trainer(max_epochs=epochs, logger=csv_logger)
trainer.fit(model, train_loader, test_loader)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type               | Params
-------------------------------------------------
0 | model     | Classifier         | 23.1 K
1 | train_acc | MulticlassAccuracy | 0     
2 | test_acc  | MulticlassAccuracy | 0     
-------------------------------------------------
23.1 K    Trainable params
0         Non-trainable params
23.1 K    Total params
0.093     Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=5` reached.

See logs/lightning_logs for results.
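The CSVLogger writes its metrics to a metrics.csv file inside a versioned run directory, which csv_logger.log_dir points at. A quick way to inspect the logged values (a sketch; the column names follow the keys we passed to self.log):

import os
import pandas as pd

# metrics.csv lives in the run directory created by the CSVLogger
metrics = pd.read_csv(os.path.join(csv_logger.log_dir, "metrics.csv"))
print(metrics.columns.tolist())

# train and validation rows are logged at different steps, so drop NaNs per column
print(metrics[["step", "train_loss"]].dropna().tail())
print(metrics[["epoch", "val_loss"]].dropna().tail())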

Next session, we will look at some of the advanced features that we can access now that we have our model set up in Lightning:

  • Multi-GPU training

  • Automatic mixed precision

  • Advanced logging and dashboards

  • Performance profiling

  • Hyperparameter tuning