Experiments with State-of-the-Art techniques
This notebook contains a series of experiments using different approaches in the training process of a Convolutional Neural Network on a subsection of the MAMMOSET dataset:
- Imports
- DataBlock/DataLoaders
- Baseline run
- Normalization
- Progressive Resizing
- Test Time Augmentation (TTA)
- Mixup Technique
- Label Smoothing
from fastai2.vision.all import *
from utils import *
path = Path('./DDSM_NOBARS/MASS/'); path.ls()
dblock = DataBlock(blocks=(ImageBlock(), CategoryBlock()),
                   get_items=get_image_files,
                   get_y=parent_label,
                   splitter=RandomSplitter(seed=42),
                   item_tfms=Resize(460),
                   batch_tfms=aug_transforms(size=224, min_scale=0.75))
#dblock.summary(path)
dls = dblock.dataloaders(path, bs=64)
model = resnet34(num_classes=dls.c)  # size the head for our 2 classes
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=[accuracy, error_rate]).to_fp16()
learn.fit_one_cycle(5, 3e-3)
Testing whether the data is normalized, i.e. has a mean of 0 and a standard deviation of 1.
x,y = dls.one_batch()
x.shape,y.shape
x.mean(dim=[0,2,3]),x.std(dim=[0,2,3])
The mean and standard deviation are not close to the desired values, therefore we need to normalize the data by adding to the DataBlock the Normalize transform, which uses the ImageNet mean and std.
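As a sketch of what the Normalize transform does, here is a minimal NumPy illustration (the sample values are made up, and we use the batch's own statistics here, whereas Normalize.from_stats uses the fixed ImageNet per-channel statistics):

```python
import numpy as np

# Hypothetical pixel values in [0, 1] for a single channel
x = np.array([0.2, 0.5, 0.8, 0.4, 0.6])

# Normalization subtracts a mean and divides by a standard deviation
x_norm = (x - x.mean()) / x.std()

print(x_norm.mean())  # ~0
print(x_norm.std())   # ~1
```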
def get_dls(bs, size):
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_items=get_image_files,
                       get_y=parent_label,
                       splitter=RandomSplitter(seed=42),
                       item_tfms=Resize(460),
                       batch_tfms=[*aug_transforms(size=size, min_scale=0.75),
                                   Normalize.from_stats(*imagenet_stats)])
    return dblock.dataloaders(path, bs=bs)
dls = get_dls(64, 224)
x,y = dls.one_batch()
x.mean(dim=[0,2,3]),x.std(dim=[0,2,3])
Checking whether normalization affects our model:
model = resnet34(num_classes=dls.c)  # size the head for our 2 classes
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=[accuracy, error_rate]).to_fp16()
learn.fit_one_cycle(5, 3e-3)
We can see some improvement in our accuracy, but not a whole lot, because we are training this model from scratch. If we were training with a transfer learning approach, we would have to pay even more attention to this, in order to match the statistics used to normalize the data the pre-trained model was trained with.
Progressive resizing consists of gradually using larger and larger images as you train your model.
dls = get_dls(128, 128)
learn = Learner(dls, resnet34(num_classes=dls.c), loss_func=CrossEntropyLossFlat(),
                metrics=[accuracy, error_rate]).to_fp16()
learn.fit_one_cycle(4, 3e-3)
We then increase the image resolution, decrease the batch size, and fine-tune the model:
learn.dls = get_dls(64, 224)
learn.fine_tune(5, 1e-3)
As we can see, the accuracy improved when using this technique.
Up until now we have been using random cropping as data augmentation, which may lead to problems such as critical features being cropped out of the image. One technique that might help mitigate this is test time augmentation (TTA): select a number of areas to crop from the original rectangular image, pass each of them through the model, and average the predictions. In other words, we apply a form of augmentation to the validation set as well.
preds,targs = learn.tta()
accuracy(preds, targs).item()
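Under the hood, `learn.tta()` averages the predictions over several augmented versions of each validation image. The averaging step can be sketched in plain NumPy (the softmax outputs below are invented for illustration, and the column order assumes benign comes first):

```python
import numpy as np

# Hypothetical softmax outputs for one image under 4 random crops
# plus the plain center crop (columns: benign, malign)
crop_preds = np.array([[0.60, 0.40],
                       [0.55, 0.45],
                       [0.70, 0.30],
                       [0.48, 0.52],
                       [0.62, 0.38]])

# TTA's final prediction is the mean over the augmented copies
tta_pred = crop_preds.mean(axis=0)
print(tta_pred)           # averaged class probabilities
print(tta_pred.argmax())  # index of the predicted class
```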
mixup: Beyond Empirical Risk Minimization: https://arxiv.org/abs/1710.09412
Mixup works as follows. For each image:
- Select another image from your dataset at random.
- Pick a weight at random.
- Take a weighted average (using the weight from step 2) of the selected image with your image; this will be your independent variable.
- Take a weighted average (with the same weight) of this image's labels with your image's labels; this will be your dependent variable.
ben = PILImage.create(get_image_files_sorted(path/'benigna')[0])
mal = PILImage.create(get_image_files_sorted(path/'maligna')[0])
ben = ben.resize((256,256))
mal = mal.resize((256,256))
tben = tensor(ben).float() / 255.
tmal = tensor(mal).float() / 255.
_,axs = plt.subplots(1, 3, figsize=(12,4))
show_image(tben, ax=axs[0]);
show_image(tmal, ax=axs[1]);
show_image((0.3*tben + 0.7*tmal), ax=axs[2]);
The third image is 30% the first one and 70% the second one, so the model should predict 30% benign and 70% malign. The one-hot-encoded representations of the targets are (in this dataset, which has 2 classes):
[1,0] and [0, 1]
But we are aiming for this type of prediction:
[0.3, 0.7]
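The mixing of the targets can be sketched directly (a minimal NumPy illustration with a fixed weight; fastai's MixUp callback samples the weight from a Beta distribution instead):

```python
import numpy as np

lam = 0.3  # mixing weight, fixed here for illustration

# One-hot targets for the two images being mixed
y_benign = np.array([1.0, 0.0])
y_malign = np.array([0.0, 1.0])

# The mixed target is the same weighted average applied to the pixels
y_mixed = lam * y_benign + (1 - lam) * y_malign
print(y_mixed)  # [0.3 0.7]
```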
model = resnet34(num_classes=dls.c)  # size the head for our 2 classes
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(),
metrics=[accuracy, error_rate], cbs=MixUp)
learn.fit_one_cycle(5, 3e-3)
Training in this way, without explicitly telling the model that the labels should be bigger than 0 but smaller than 1, makes our activations more extreme as we train for more epochs. That is why we will use label smoothing to deal with this.
Rethinking the Inception Architecture for Computer Vision: https://arxiv.org/abs/1512.00567
Instead of using regular one-hot-encoded vectors for the targets, we should use targets in the following format:
[0.1, 0.9]
This way we do not encourage the model to make overconfident predictions.
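Concretely, label smoothing replaces each one-hot target y with (1 - ε) · y + ε / K, where K is the number of classes and ε is a small constant (fastai's LabelSmoothingCrossEntropy uses ε = 0.1 by default; here ε = 0.2 is chosen so that the two-class example matches the target shown above). A minimal sketch:

```python
import numpy as np

def smooth_labels(onehot, eps):
    """Label smoothing: 1s become 1 - eps + eps/K, 0s become eps/K."""
    K = len(onehot)
    return (1 - eps) * onehot + eps / K

y = np.array([0.0, 1.0])          # one-hot target for "malign"
print(smooth_labels(y, eps=0.2))  # [0.1 0.9]
```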
model = resnet34(num_classes=dls.c)  # size the head for our 2 classes
learn = Learner(dls, model, loss_func=LabelSmoothingCrossEntropy(),
metrics=[accuracy, error_rate], cbs=MixUp)
learn.fit_one_cycle(5, 3e-3)
Normally we see improvements from both the MixUp and the label smoothing techniques when we train the model for more epochs.