Building the network#
The torch.nn subpackage in PyTorch contains many neural network building blocks called “modules”, all of which derive from the nn.Module base class. We can compose these in arbitrary ways to build network architectures tailored to a given problem.
import torch
import torch.nn as nn
# do everything on gpu unless we explicitly say otherwise
torch.set_default_device('cuda')
The basics#
We saw examples like this in earlier notebooks:
model = nn.Sequential(
    nn.Linear(10, 10),
    nn.Tanh(),
    nn.Linear(10, 10),
    nn.Tanh(),
    nn.Linear(10, 3),
    nn.Sigmoid()
)
# printing the model shows the layers
model
Sequential(
(0): Linear(in_features=10, out_features=10, bias=True)
(1): Tanh()
(2): Linear(in_features=10, out_features=10, bias=True)
(3): Tanh()
(4): Linear(in_features=10, out_features=3, bias=True)
(5): Sigmoid()
)
nn.Sequential, nn.Linear, nn.Tanh, and nn.Sigmoid are all examples of modules. There are many more. You can see a full list here: https://pytorch.org/docs/stable/nn.html
Callable. All modules are callable, meaning they can be evaluated like a function:
layer = nn.Linear(4,5)
x = torch.randn(7, 4)
layer(x)
tensor([[ 0.0598, 0.0131, -0.0854, -0.6607, -0.1993],
[-0.0425, 0.5190, 0.6656, -1.0550, -0.3481],
[ 1.0251, -0.7363, -0.9741, -0.8997, 0.0310],
[-0.2934, -0.0984, -0.3092, -0.1875, -0.1605],
[-0.5643, 0.8014, 0.1436, -0.5571, -0.5451],
[-0.5590, 0.1040, -0.3402, 0.0422, -0.2416],
[-0.2919, -0.4171, -0.1272, -0.3652, -0.0406]], device='cuda:0',
grad_fn=<AddmmBackward0>)
layer = nn.Tanh()
layer(x)
tensor([[ 0.0698, -0.0402, -0.2436, -0.7002],
[-0.8310, -0.4923, 0.5896, -0.4685],
[ 0.5444, 0.7774, -0.9923, -0.9824],
[ 0.6785, -0.3635, 0.1641, -0.4440],
[ 0.0650, 0.8741, 0.3331, 0.2424],
[ 0.6767, -0.4797, 0.3434, 0.0332],
[ 0.9344, 0.2534, 0.7204, -0.7594]], device='cuda:0')
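To make the "callable" idea concrete, here is a small sketch showing that calling nn.Linear applies the affine map x @ W.T + b using the layer's own weight and bias. The tensors are pinned to the CPU so the snippet runs without a GPU; the variable names are ours, chosen for illustration.

```python
import torch
import torch.nn as nn

# Pin everything to the CPU so the example runs on any machine.
layer = nn.Linear(4, 5, device="cpu")
x = torch.randn(7, 4, device="cpu")

# Calling the module runs its forward method.
out = layer(x)

# The same computation written out by hand with the layer's parameters.
manual = x @ layer.weight.T + layer.bias

print(out.shape)                                # torch.Size([7, 5])
print(torch.allclose(out, manual, atol=1e-6))
```

The weight has shape (out_features, in_features), which is why the transpose appears in the manual version.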
Changing device. Modules can be moved between devices. Unlike with tensors, where .to() returns a copy on the new device, this operation modifies the module in place.
layer = nn.Linear(4,5)
print("Before:", layer.weight.device)
layer.to('cpu')
print("After:", layer.weight.device)
Before: cuda:0
After: cpu
All nested modules also move:
model = nn.Sequential(
    nn.Linear(10, 10),
    nn.Tanh(),
    nn.Linear(10, 3)
)
print("Before:", model[0].weight.device)
model.to('cpu')
print("After:", model[0].weight.device)
# back on gpu for later
model.to('cuda')
Before: cuda:0
After: cpu
Sequential(
(0): Linear(in_features=10, out_features=10, bias=True)
(1): Tanh()
(2): Linear(in_features=10, out_features=3, bias=True)
)
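The difference between modules and tensors can be checked directly. The sketch below picks a device at runtime (an assumption on our part, so that it also runs on machines without a GPU) and compares what .to() returns in each case.

```python
import torch
import torch.nn as nn

# Fall back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(10, 10), nn.Tanh(), nn.Linear(10, 3))

# For modules, .to() moves the parameters in place and returns the same object.
returned = model.to(device)
print(returned is model)  # True

# For tensors, .to() returns a new tensor when something has to change;
# the original tensor is left untouched.
t = torch.zeros(3, device="cpu")
t64 = t.to(torch.float64)
print(t64 is t, t.dtype)

# Every parameter of the model now lives on the chosen device.
param_devices = {p.device.type for p in model.parameters()}
print(param_devices)
```

Because module.to() returns the module itself, writing model = model.to(device) is harmless, while for tensors the assignment is required.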
from utils import create_answer_box
create_answer_box("Recollection check! What's the purpose of the `tanh` layer in the above model? Why is it important to include this or something like it here?", "06-01")
Recollection check! What’s the purpose of the tanh layer in the above model? Why is it important to include this or something like it here?
Saving/loading. Model weights can be saved to and loaded from disk. There are a few ways to do this. The recommended way is to save only the weights using the “state dict” object:
for k, v in model.state_dict().items():
    print(k, v.shape)
0.weight torch.Size([10, 10])
0.bias torch.Size([10])
2.weight torch.Size([3, 10])
2.bias torch.Size([3])
torch.save(model.state_dict(), 'model_weights.pt')
# PyTorch writes a zip archive (note the "PK" header) containing pickled data
!head -n 3 model_weights.pt
PKmodel_weights/data.pklFZZZZZZZZ�ccollections OrderedDict q)Rq(0.weightqctorch._utils
# some time later...
model2 = nn.Sequential(
    nn.Linear(10, 10),
    nn.Tanh(),
    nn.Linear(10, 3)
)
model2.load_state_dict(torch.load('model_weights.pt', weights_only=True))
<All keys matched successfully>
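As a sanity check on the state dict workflow, the following sketch saves the weights of one model, loads them into a freshly built copy, and confirms that both produce the same outputs. The helper make_model and the temporary directory are our own additions for illustration; everything is kept on the CPU so it runs anywhere.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    # Hypothetical helper: rebuilds the same architecture each time.
    return nn.Sequential(nn.Linear(10, 10), nn.Tanh(), nn.Linear(10, 3)).to("cpu")


original = make_model()
restored = make_model()  # same structure, different random weights

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_weights.pt")
    torch.save(original.state_dict(), path)
    # map_location controls where the loaded tensors are placed.
    state = torch.load(path, map_location="cpu", weights_only=True)
    result = restored.load_state_dict(state)

x = torch.randn(4, 10, device="cpu")
same = torch.allclose(original(x), restored(x))
print(result.missing_keys, result.unexpected_keys, same)
```

load_state_dict reports any missing or unexpected keys, which is a quick way to notice that the saved weights do not match the architecture you instantiated.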
Using the state dict required that we instantiate the model class first. We can also save the model structure together with the weights:
torch.save(model, 'model.pt')
# Loading a full model unpickles arbitrary Python objects, so weights_only=False
# must be passed explicitly (recent PyTorch releases default to weights_only=True).
# Only do this for files you trust: unpickling can execute arbitrary code.
model2 = torch.load('model.pt', weights_only=False)
Using model.state_dict() to save weights offers greater flexibility and compatibility, as it separates the model’s parameters from its architecture, making it easy to update the model class or share weights. This approach typically results in smaller files and better portability across environments or versions of PyTorch. Saving the entire model (torch.save(model, ...)) is simpler but less adaptable to changes, because the pickled file refers to the model class by its import path and fails to load if that code is moved or renamed.
eval/train modes. Some layers need to behave differently at training time and evaluation time. These can all be toggled with the train() and eval() methods:
layer = nn.Dropout(0.5)
# the default mode is "training"
x = torch.randn(3, 5)
print(x)
layer(x)
tensor([[ 0.9855, -1.9097, 0.7585, 0.4166, -2.3734],
[ 0.5442, -0.9407, 0.7984, -0.1559, -1.0020],
[-0.5164, 0.0046, 0.1693, 1.7144, 1.6055]], device='cuda:0')
tensor([[ 0.0000, -3.8194, 0.0000, 0.8332, -4.7467],
[ 1.0885, -1.8815, 0.0000, -0.3118, -2.0039],
[-1.0328, 0.0093, 0.0000, 3.4289, 3.2110]], device='cuda:0')
# switch to eval:
layer.eval()
layer(x)
tensor([[ 0.9855, -1.9097, 0.7585, 0.4166, -2.3734],
[ 0.5442, -0.9407, 0.7984, -0.1559, -1.0020],
[-0.5164, 0.0046, 0.1693, 1.7144, 1.6055]], device='cuda:0')
# switch back to train
layer.train()
layer(x)
tensor([[ 0.0000, -3.8194, 1.5171, 0.0000, -0.0000],
[ 1.0885, -1.8815, 0.0000, -0.3118, -2.0039],
[-1.0328, 0.0093, 0.0000, 3.4289, 3.2110]], device='cuda:0')
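Two details of the outputs above are worth checking in code. First, train() and eval() set a training flag that propagates to all nested modules. Second, in training mode nn.Dropout(p) zeroes entries at random and scales the survivors by 1/(1-p), which is why the kept values above are doubled for p = 0.5. A minimal sketch, kept on the CPU and using an all-ones input of our choosing to make the scaling easy to see:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 5), nn.Dropout(0.5))
print(model.training, model[1].training)  # True True
model.eval()                              # propagates to every submodule
print(model.training, model[1].training)  # False False

drop = nn.Dropout(0.5)
x = torch.ones(1000, device="cpu")

# Training mode: each entry is either dropped (0) or scaled by 1/(1 - 0.5) = 2.
y = drop(x)
values = set(y.unique().tolist())
print(values)

# Eval mode: dropout is the identity.
drop.eval()
identity = torch.equal(drop(x), x)
print(identity)  # True
```

The rescaling keeps the expected value of each activation the same in both modes, so no extra correction is needed at evaluation time.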
Writing custom modules#
You can make your own modules. To do so, subclass nn.Module and define the __init__ and forward methods. These modules can be used just like any other module.
class NeuralNetwork(nn.Module):
    def __init__(self):
        """
        The __init__ method defines all of the modules/parameters that will
        appear in the model.
        """
        super().__init__()
        self.flatten = nn.Flatten()
        self.encoder = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU()
        )
        self.classifier = nn.Sequential(
            nn.Linear(256, 1)
        )

    def forward(self, x):
        """
        Define how to get from the input to the output.
        You can use arbitrary python code here so long as the
        tensor operations are differentiable.
        """
        x = self.flatten(x)
        h = self.encoder(x)
        y = self.classifier(h)
        return y
model = NeuralNetwork()
model
NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(encoder): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=256, bias=True)
(5): ReLU()
)
(classifier): Sequential(
(0): Linear(in_features=256, out_features=1, bias=True)
)
)
create_answer_box("The `forward` method is very flexible, but there are limits. Can you think of an example of code that would NOT work in the `forward` method? Why would it not work?", "06-02")
The forward method is very flexible, but there are limits. Can you think of an example of code that would NOT work in the forward method? Why would it not work?
# simulate a batch of grayscale images:
x = torch.randn(5, 1, 28, 28)
model(x)
tensor([[ 0.0322],
[ 0.0130],
[ 0.0116],
[-0.0103],
[-0.0058]], device='cuda:0', grad_fn=<AddmmBackward0>)
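The first step of the forward pass is the one that makes image batches compatible with nn.Linear. As a small illustration (on the CPU, with the same batch shape as above), nn.Flatten keeps the batch dimension and collapses the rest, turning each 1x28x28 image into a vector of 784 values:

```python
import torch
import torch.nn as nn

flatten = nn.Flatten()  # defaults: start_dim=1, end_dim=-1
x = torch.randn(5, 1, 28, 28, device="cpu")

flat = flatten(x)
print(flat.shape)  # torch.Size([5, 784])

# Equivalent to reshaping everything after the batch dimension.
matches_reshape = torch.equal(flat, x.reshape(5, -1))
print(matches_reshape)  # True
```

This is why the first linear layer of the encoder has in_features equal to 28*28.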
You can customize your network however you see fit. For example, say we had a problem where the network took two images as input and made some decision about them. We could do something like this:
class PairNetwork(nn.Module):
    def __init__(self):
        """
        The __init__ method defines all of the modules/parameters that will
        appear in the model.
        """
        super().__init__()
        self.flatten = nn.Flatten()
        self.encoder = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU()
        )
        self.classifier = nn.Sequential(
            nn.Linear(2*256, 1)  # double the representation size
        )

    def forward(self, x1, x2):
        """
        Define how to get from the input to the output.
        You can use arbitrary python code here so long as the
        tensor operations are differentiable.
        """
        x1 = self.flatten(x1)
        h1 = self.encoder(x1)
        x2 = self.flatten(x2)
        h2 = self.encoder(x2)
        # fuse the representations
        h = torch.cat([h1, h2], dim=-1)
        y = self.classifier(h)
        return y
pair_model = PairNetwork()
pair_model
PairNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(encoder): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=256, bias=True)
(5): ReLU()
)
(classifier): Sequential(
(0): Linear(in_features=512, out_features=1, bias=True)
)
)
# simulate a batch of grayscale images:
x1 = torch.randn(5, 1, 28, 28)
x2 = torch.randn(5, 1, 28, 28)
pair_model(x1, x2)
tensor([[ 0.0026],
[ 0.0202],
[-0.0074],
[-0.0416],
[ 0.0159]], device='cuda:0', grad_fn=<AddmmBackward0>)
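The fusion step is what forces the classifier to accept 2*256 input features. The sketch below isolates it, using random stand-ins for the two encoder outputs (these tensors are placeholders of ours, not real encodings) to show how concatenating along the last dimension changes the shape:

```python
import torch

# Stand-ins for the two encoded inputs: a batch of 5, 256 features each.
h1 = torch.randn(5, 256, device="cpu")
h2 = torch.randn(5, 256, device="cpu")

# Concatenate along the feature dimension; the batch dimension is unchanged.
h = torch.cat([h1, h2], dim=-1)
print(h.shape)  # torch.Size([5, 512])

# The first 256 columns come from h1, the remaining 256 from h2.
first_half_is_h1 = torch.equal(h[:, :256], h1)
second_half_is_h2 = torch.equal(h[:, 256:], h2)
print(first_half_is_h1, second_half_is_h2)  # True True
```

Note that both inputs pass through the same self.encoder in the network above, so the two branches share weights; only the classifier sees the combined representation.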
create_answer_box("This network takes two images and concatenates their representations. Can you think of real-world applications where you'd want to compare or combine multiple inputs like this?", "06-03")
This network takes two images and concatenates their representations. Can you think of real-world applications where you’d want to compare or combine multiple inputs like this?
Tracking parameters. PyTorch automatically tracks all of the parameters that appear in your custom model. This allows PyTorch to optimize the network during training. It also lets you get diagnostic information such as the number of parameters in your model:
num_pars = sum([p.numel() for p in model.parameters()])
print("Number of parameters:", num_pars)
Number of parameters: 796161
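The total can be broken down per layer. In the sketch below we rebuild the same stack of linear layers as a plain nn.Sequential (a simplification of ours, so the parameter names differ from those in NeuralNetwork) and compare the count from named_parameters() with the hand computation in_features * out_features + out_features for each linear layer.

```python
import torch.nn as nn

# Same layer sizes as the encoder and classifier above, flattened into one stack.
model = nn.Sequential(
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)

# named_parameters() yields (name, tensor) pairs for every tracked parameter.
counts = {name: p.numel() for name, p in model.named_parameters()}
for name, n in counts.items():
    print(f"{name:10s} {n}")

total = sum(counts.values())

# Hand computation: weights plus biases for each linear layer.
expected = (784 * 512 + 512) + (512 * 512 + 512) + (512 * 256 + 256) + (256 * 1 + 1)
print(total, expected)  # 796161 796161

# Counting only parameters that will receive gradients.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Layers without parameters, such as nn.ReLU, contribute nothing to the count.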