Imports

These are the imports for everything we'll be using in this notebook.

from torch import nn

from fastai.vision.all import *
from fastai.callback.hook import summary
from fastai.callback.schedule import fit_one_cycle, lr_find 
from fastai.callback.progress import ProgressCallback

from fastai.data.core import Datasets, DataLoaders, show_at
from fastai.data.external import untar_data, URLs
from fastai.data.transforms import Categorize, GrandparentSplitter, parent_label, ToTensor, IntToFloatTensor, Normalize

from fastai.layers import Flatten
from fastai.learner import Learner

from fastai.metrics import accuracy, CrossEntropyLossFlat

from fastai.vision.augment import CropPad, RandomCrop, PadMode
from fastai.vision.core import PILImageBW
from fastai.vision.utils import get_image_files
import matplotlib.pyplot as plt
plt.style.use('dark_background')

This article is also a Jupyter Notebook available to be run from the top down. There will be code snippets that you can then run in any environment.

Below are the versions of fastai, fastcore, and wwf currently running at the time of writing this:

  • fastai : 2.2.5
  • fastcore : 1.3.12
  • wwf : 0.0.16

Grabbing our data

path = untar_data(URLs.MNIST)

Working with the data

items = get_image_files(path)
items[0]
Path('/home/jmtzt/.fastai/data/mnist_png/testing/9/2934.png')
im = PILImageBW.create(items[0]); im.show()
<AxesSubplot:>

We split our data with GrandparentSplitter, which looks at the grandparent folder of each file (here training and testing) to decide whether it belongs to the training or validation set.

splits = GrandparentSplitter(train_name='training', valid_name='testing')
splits = splits(items)
splits[0][:5], splits[1][:5]
([10000, 10001, 10002, 10003, 10004], [0, 1, 2, 3, 4])
  • Make a Datasets

  • Datasets expects our items, transform pipelines describing our problem, and a splitting method

dsrc = Datasets(items, tfms=[[PILImageBW.create], [parent_label, Categorize]],
                  splits = splits)
show_at(dsrc.train, 3)
<AxesSubplot:title={'center':'9'}>

Next we need to give ourselves some transforms on the data! These will need to:

  1. Ensure our images are all the same size
  2. Make sure our outputs are the tensors our model expects
  3. Apply some image augmentation
tfms = [ToTensor(), CropPad(size=34, pad_mode=PadMode.Zeros), RandomCrop(size=28)]
  • ToTensor: Converts to tensor
  • CropPad and RandomCrop: Resizing transforms
  • Applied on the CPU via after_item
gpu_tfms = [IntToFloatTensor(), Normalize()]
  • IntToFloatTensor: Converts our integer pixel values to floats and divides by 255 (see the quick sketch below)
  • Normalize: Normalizes the data
  • Applied in batches on the GPU via after_batch
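If you want to see what IntToFloatTensor does on its own, here is a quick sketch (the random image tensor is made up purely for illustration): integer pixel values become floats between 0 and 1.

import torch
# TensorImage comes in with the fastai star import; IntToFloatTensor only acts on image-typed tensors
img = TensorImage(torch.randint(0, 256, (1, 28, 28), dtype=torch.uint8))
IntToFloatTensor()(img).max()   # at most 1.0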
dls = dsrc.dataloaders(bs=128, after_item=tfms, after_batch=gpu_tfms)
dls.show_batch()
xb, yb = dls.one_batch()
xb.shape, yb.shape
(torch.Size([128, 1, 28, 28]), torch.Size([128]))
dls.c
10

So our input shape will be [128 x 1 x 28 x 28], and our labels are a [128] tensor; the model needs to turn each input into a prediction over 10 classes.
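To make those shapes concrete, here is a small sketch with made-up tensors: the model will output [128, 10] activations (one score per class per image), and CrossEntropyLossFlat compares those against the [128] tensor of integer labels.

import torch
fake_preds = torch.randn(128, 10)           # stand-in for model output
fake_targs = torch.randint(0, 10, (128,))   # stand-in for our labels
CrossEntropyLossFlat()(fake_preds, fake_targs)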

Model definition

  • This model will have 5 convolutional layers
  • We'll use nn.Sequential
  • Filters will go 1 -> 8 -> 16 -> 32 -> 16 -> 10
def conv(ni, nf): return nn.Conv2d(ni, nf, kernel_size=3, stride=2, padding=1)

Here ni is the number of input channels (the depth of the incoming feature map) and nf is the number of filters we apply, which becomes the number of output channels. With kernel_size=3, stride=2, and padding=1, each of these convolutions also roughly halves the height and width of the feature map.
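A quick sketch with a random tensor shaped like our batch shows both effects, the channel change and the halved spatial size:

import torch
x = torch.randn(128, 1, 28, 28)   # same shape as a batch of our images
conv(1, 8)(x).shape               # torch.Size([128, 8, 14, 14])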

Batch Normalization

As we send our tensors through the model, it is important to keep the activations normalized throughout the network. Doing so can give a significant improvement in training speed, and it lets each layer learn more independently, since each layer's outputs are re-normalized before being passed on.

def bn(nf): return nn.BatchNorm2d(nf)

nf should match the number of filters output by the previous convolutional layer.
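Here is a small sketch (random data, shaped like our first conv's output) of what BatchNorm2d does during training: each channel is normalized to roughly zero mean and unit variance across the batch, before its learnable scale and shift are applied.

import torch
x = torch.randn(128, 8, 14, 14) * 5 + 3     # deliberately shifted and scaled
out = bn(8)(x)
out.mean().item(), out.std().item()          # roughly 0 and 1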

Activation functions

They give our models non-linearity, and they work with the weights we mentioned earlier (along with a bias) through a process called back-propagation. Activation functions allow our models to learn and perform more complex tasks because they can choose to fire, or activate, particular neurons. In a simple sense, let's look at the ReLU activation function: it turns any negative value into zero, as the small sketch below shows.
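A tiny sketch (the values are made up) shows the behaviour: negatives become zero, everything else passes through unchanged.

import torch
from torch import nn

t = torch.tensor([-2., -0.5, 0., 1.5, 3.])
nn.ReLU()(t)   # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])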

def ReLU(): return nn.ReLU(inplace=False)
model = nn.Sequential(
    conv(1, 8),
    bn(8),
    ReLU(),
    conv(8, 16),
    bn(16),
    ReLU(),
    conv(16,32),
    bn(32),
    ReLU(),
    conv(32, 16),
    bn(16),
    ReLU(),
    conv(16, 10),
    bn(10),
    Flatten()
)
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.summary()
Sequential (Input shape: 128)
============================================================================
Layer (type)         Output Shape         Param #    Trainable 
============================================================================
                     128 x 8 x 14 x 14   
Conv2d                                    80         True      
BatchNorm2d                               16         True      
ReLU                                                           
____________________________________________________________________________
                     128 x 16 x 7 x 7    
Conv2d                                    1168       True      
BatchNorm2d                               32         True      
ReLU                                                           
____________________________________________________________________________
                     128 x 32 x 4 x 4    
Conv2d                                    4640       True      
BatchNorm2d                               64         True      
ReLU                                                           
____________________________________________________________________________
                     128 x 16 x 2 x 2    
Conv2d                                    4624       True      
BatchNorm2d                               32         True      
ReLU                                                           
____________________________________________________________________________
                     128 x 10 x 1 x 1    
Conv2d                                    1450       True      
BatchNorm2d                               20         True      
____________________________________________________________________________
                     []                  
Flatten                                                        
____________________________________________________________________________

Total params: 12,126
Total trainable params: 12,126
Total non-trainable params: 0

Optimizer used: <function Adam at 0x7f56a8917790>
Loss function: FlattenedLoss of CrossEntropyLoss()

Callbacks:
  - TrainEvalCallback
  - Recorder
  - ProgressCallback

learn.summary() also tells us:

  • Total parameters
  • Trainable parameters
  • Optimizer
  • Loss function
  • Applied Callbacks
learn.lr_find()
SuggestedLRs(lr_min=0.33113112449646, lr_steep=0.7585775852203369)
learn.fit_one_cycle(3, lr_max=1e-1)
epoch train_loss valid_loss accuracy time
0 0.210623 0.194198 0.939300 00:09
1 0.139447 0.079532 0.975500 00:09
2 0.068283 0.037102 0.987500 00:09

Simplifying our model

  • We'll try to make the model more like a ResNet.
  • ConvLayer contains a Conv2d, a BatchNorm2d, and an activation function (the quick check after the definition below confirms this)
def conv2(ni, nf): return ConvLayer(ni, nf, stride=2)
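As a quick sanity check (nothing we need for training), printing one of these layers shows the three pieces ConvLayer bundles for us:

conv2(1, 8)   # a Conv2d, a BatchNorm2d, and a ReLU wrapped in one layer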
net = nn.Sequential(
    conv2(1,8),
    conv2(8,16),
    conv2(16,32),
    conv2(32,16),
    conv2(16,10),
    Flatten()
)
learn = Learner(dls, net, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(3, lr_max=1e-1)
epoch train_loss valid_loss accuracy time
0 0.202230 0.219875 0.931000 00:09
1 0.130472 0.079829 0.972500 00:09
2 0.076784 0.038371 0.986900 00:09

ResNet (kind of)

The ResNet architecture is built with what are known as ResBlocks. Each of these blocks consists of two of the ConvLayers we just used, where the number of filters does not change, plus a skip connection that adds the block's input back onto its output. Let's build one.

class ResBlock(Module):
    def __init__(self, nf):
        self.conv1 = ConvLayer(nf, nf)
        self.conv2 = ConvLayer(nf, nf)

    def forward(self, x): return x + self.conv2(self.conv1(x))
  • Class notation: we inherit from fastai's Module, so we don't need to call super().__init__() ourselves
  • __init__ builds the two ConvLayers, each keeping nf filters
  • forward adds the block's input to the output of the two convolutions (the skip connection); a quick shape check follows below
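A short sketch with a random tensor (shaped like the output of our first conv2 layer) shows that a ResBlock leaves the shape of its input unchanged, which is exactly what makes the addition in forward possible:

import torch
x = torch.randn(128, 8, 14, 14)   # e.g. the output of conv2(1, 8)
ResBlock(8)(x).shape              # torch.Size([128, 8, 14, 14])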
net = nn.Sequential(
    conv2(1,8),
    ResBlock(8),
    conv2(8,16),
    ResBlock(16),
    conv2(16,32),
    ResBlock(32),
    conv2(32,16),
    ResBlock(16),
    conv2(16,10),
    Flatten()
)

Awesome! We're building a pretty substantial model here. Let's try to make it even simpler. We know we call a stride-2 convolutional layer before each ResBlock, and the ResBlock keeps the same number of filters that the layer outputs, so let's combine the two into one helper!

def conv_and_res(ni, nf): return nn.Sequential(conv2(ni, nf), ResBlock(nf))
net = nn.Sequential(
    conv_and_res(1,8),
    conv_and_res(8,16),
    conv_and_res(16,32),
    conv_and_res(32,16),
    conv2(16,10),
    Flatten()
)

And now we have something that resembles a ResNet! Let's see how it performs.

learn = Learner(dls, net, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.lr_find()
SuggestedLRs(lr_min=0.15848932266235352, lr_steep=0.2089296132326126)
learn.fit_one_cycle(3, lr_max=1e-1)
epoch train_loss valid_loss accuracy time
0 0.154220 0.295265 0.907000 00:10
1 0.087304 0.072216 0.976400 00:10
2 0.041664 0.023510 0.992200 00:10
learn.path = Path('')
learn.export(fname='export.pkl')
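As a final hedged sketch, the exported file can be loaded back later for inference with load_learner; the filename and working directory here assume the export call above.

# load_learner is brought in by the fastai star imports
learn_inf = load_learner('export.pkl')
learn_inf.predict(items[0])   # (decoded label, label index, class probabilities)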