Imports


This article is also a Jupyter Notebook available to be run from the top down. There will be code snippets that you can then run in any environment.

Below are the versions of fastai, fastcore, and wwf currently running at the time of writing this:

  • fastai : 2.2.5
  • fastcore : 1.3.12
  • wwf : 0.0.16

import matplotlib.pyplot as plt
plt.style.use('dark_background')

from fastai.basics import *
from fastai.torch_core import tensor

import numpy as np
import torch
from torch import nn

Linear Regression

  • Fit a line on 100 points
n = 100
x = torch.ones(n, 2)
len(x), x[:5]
(100,
 tensor([[1., 1.],
         [1., 1.],
         [1., 1.],
         [1., 1.],
         [1., 1.]]))

Randomize the first column with a uniform distribution from -1 to 1

x[:,0].uniform_(-1., 1)
x[:5], x.shape
(tensor([[0.6555, 1.0000],
         [0.0426, 1.0000],
         [0.2065, 1.0000],
         [0.4251, 1.0000],
         [0.9636, 1.0000]]),
 torch.Size([100, 2]))
  • Any linear model is y=mx+b
  • m, x, and b are tensors
  • We have x
m = tensor(3.,2); m, m.shape
(tensor([3., 2.]), torch.Size([2]))
  • b is a random bias added to each point
b = torch.rand(n); b[:5], b.shape
(tensor([0.4444, 0.2204, 0.3399, 0.5224, 0.1004]), torch.Size([100]))

Now we can make our y

  • Matrix multiplication is denoted with @
y = x@m + b

We'll know we got a size wrong if we see an error like this:

m@x + b
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-ac53957f9814> in <module>
----> 1 m@x + b

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x2 and 100x2)

Plot our results

plt.scatter(x[:,0], y)
<matplotlib.collections.PathCollection at 0x7f102872cd90>

Our goal is to find weights that minimize the distance between the points and our line.

  • mean squared error: take the difference between the predictions and y, square it, then average
def mse(y_hat, y): return ((y_hat-y)**2).mean()

When we train our model, we are trying to estimate m with a set of parameters a.

For example, say a = (0.5, 0.75).

  • Make a prediction
  • Calculate the error
a = tensor(.5, .75)

Make prediction

y_pred = x@a

Calculate error

mse(y_pred, y)
tensor(5.2860)
plt.scatter(x[:,0],y)
plt.scatter(x[:,0],y_pred)
<matplotlib.collections.PathCollection at 0x7f0f4da4a430>

The model doesn't seem to quite fit. What's next? Optimization

Walking down Gradient Descent

  • Goal: Minimize the loss function (mse)
  • Gradient Descent:
    • Starts with parameters
    • Moves towards new parameters to minimize the function
    • Takes steps in the negative direction of the gradient
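In other words, each step replaces the parameters a with a - lr * a.grad, which is exactly what the update function below does.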
a = nn.Parameter(a); a
Parameter containing:
tensor([0.5000, 0.7500], requires_grad=True)

Next let's create an update function that makes a prediction with the current a, measures the loss, and nudges a closer to the true values.

We'll print the loss every 10 iterations to see how we are doing.

def update():
    y_hat = x@a                  # make a prediction with the current a
    loss = mse(y, y_hat)         # measure how far off we are
    if i % 10 == 0: print(loss)  # report progress every 10 iterations
    loss.backward()              # compute the gradients
    with torch.no_grad():
        a.sub_(lr * a.grad)      # step a in the negative gradient direction
        a.grad.zero_()           # reset the gradients for the next step
  • torch.no_grad: disables gradient tracking (we don't want the update step itself to be backpropagated through)
  • sub_: subtracts a value in place (here lr * our gradient)
  • grad.zero_: zeros out our gradients for the next step
lr = 1e-1
for i in range(100): update()
tensor(5.2860, grad_fn=<MeanBackward0>)
tensor(0.5746, grad_fn=<MeanBackward0>)
tensor(0.1852, grad_fn=<MeanBackward0>)
tensor(0.0990, grad_fn=<MeanBackward0>)
tensor(0.0781, grad_fn=<MeanBackward0>)
tensor(0.0730, grad_fn=<MeanBackward0>)
tensor(0.0717, grad_fn=<MeanBackward0>)
tensor(0.0714, grad_fn=<MeanBackward0>)
tensor(0.0713, grad_fn=<MeanBackward0>)
tensor(0.0713, grad_fn=<MeanBackward0>)

Now let's see how this new a compares.

  • Detach removes the tensor from the computation graph so we can plot it
plt.scatter(x[:,0],y)
plt.scatter(x[:,0], (x@a).detach())
plt.scatter(x[:,0],y_pred)
<matplotlib.collections.PathCollection at 0x7f0f4da29c40>

We fit our line much better here.

Animate the process

from matplotlib import animation, rc
rc('animation', html='jshtml')
a = nn.Parameter(tensor(0.5, 0.75)); a
Parameter containing:
tensor([0.5000, 0.7500], requires_grad=True)
def animate(i):
    update()
    line.set_ydata((x@a).detach())
    return line,
fig = plt.figure()
plt.scatter(x[:,0], y, c='orange')
line, = plt.plot(x[:,0], (x@a).detach())
plt.close()
animation.FuncAnimation(fig, animate, np.arange(0,100), interval=20)

Ideally we split the data up into batches to fit on, and then work through all of those batches (otherwise we'd run out of memory!).
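Here's a minimal sketch of that idea, reusing x, y, n, lr, and mse from above; the batch size bs, the epoch count, and the variable names are illustrative choices, not values from the lesson:

bs = 25  # an arbitrary batch size for illustration
a = nn.Parameter(tensor(0.5, 0.75))

for epoch in range(20):
    for j in range(0, n, bs):
        xb, yb = x[j:j+bs], y[j:j+bs]  # one mini-batch of the data
        loss = mse(xb@a, yb)           # loss on just this batch
        loss.backward()
        with torch.no_grad():
            a.sub_(lr * a.grad)        # step using this batch's gradient
            a.grad.zero_()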

If this were a classification problem, we would want to use Cross Entropy Loss, which penalizes confident incorrect predictions along with unconfident correct predictions. It's also called negative log likelihood.
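As a quick sketch of how that looks in PyTorch (the logits and labels below are made up purely for illustration), nn.CrossEntropyLoss combines a softmax with the negative log likelihood:

preds = torch.tensor([[ 2.0, -1.0],   # confident and correct
                      [ 0.1,  0.2],   # unconfident
                      [ 3.0, -2.0]])  # confident but wrong
targs = torch.tensor([0, 1, 1])       # the true class labels

loss_func = nn.CrossEntropyLoss()
loss_func(preds, targs)  # dominated by the confident wrong prediction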