The building blocks of DL, part 1
Gradient Descent
Deep learning is so simple that it is hard to believe, especially given how magical it seems at first. At its core is a piece of code that, one step at a time, brings us just a bit closer to the solution. You can think of it as walking down a mountain to find the lowest point of the valley.
Vocabulary:
- step - a single step of the optimization algorithm that makes the parameters of a function just a bit better
- gradient - the slope of the loss function that we want to minimize
- learning rate - how much we adjust the parameters based on the gradient
Let's start with a simple function that we want to minimize:

def f(x):
    return 1.2 * x**2 + 4
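For this simple parabola we can check the slope by hand: $f'(x) = 2.4x$, so at $x = -3$ the gradient is $-7.2$. The negative slope tells us the function decreases as $x$ grows, so we should step to the right, toward the minimum at $x = 0$.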
To compute the gradient at the point where we are now, we will use a bit of PyTorch magic:
import torch

# start at x = -3 and tell PyTorch to track gradients for this tensor
x = torch.tensor(-3.).requires_grad_()
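Before wrapping it in a function, here is a quick peek at what autograd gives us (a minimal sketch; if you run it, clear the gradient afterwards so the loop below starts fresh):

y = f(x)
y.backward()     # fills x.grad with df/dx evaluated at x = -3
print(x.grad)    # tensor(-7.2000) - matches the hand-computed 2.4 * (-3)
x.grad = None    # clear the gradient so it does not accumulate into later steps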
To walk along the function, we will change $x$ by a fraction of the gradient (gradient * learning_rate):
learning_rate = 0.2
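To see what a single update does, plug in the numbers: starting from $x = -3$ with gradient $-7.2$, the new value is $x - 0.2 \cdot (-7.2) = -1.56$, noticeably closer to the minimum.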
And this is the gradient descent algorithm in code:
def step(x, f):
    # compute y at the given value of x
    y = f(x)
    # the backward function of a PyTorch tensor computes gradients
    y.backward()
    # change the value of x in the direction where the function decreases
    # (don't worry about x.data and x.grad = None - it is just necessary boilerplate:
    #  we update x outside of autograd's tracking and clear the old gradient)
    x.data = x - learning_rate * x.grad.data
    x.grad = None
    return x
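A single call looks like this (note that it moves x in place, so re-create x = torch.tensor(-3.).requires_grad_() if you want the loop below to start from -3):

x = step(x, f)
print(x)    # tensor(-1.5600, requires_grad=True) - matches the arithmetic above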
Let's gather the x's and y's for 10 iterations:
xypairs = []
for i in range(10):
    # record the current position and its function value
    xypairs.append([x.data, f(x)])
    print(f"f({x:.2f}) = {f(x):.2f}")
    x = step(x, f)
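The xypairs list is handy for visualizing the descent. A minimal plotting sketch, assuming matplotlib is installed:

import matplotlib.pyplot as plt

# the parabola itself
xs = torch.linspace(-3.5, 3.5, 100)
plt.plot(xs, f(xs))
# the positions visited by gradient descent
px = [float(p[0]) for p in xypairs]
py = [float(p[1]) for p in xypairs]
plt.scatter(px, py, color="red")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.show()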
And believe it or not, this is the building block of every neural net - the gradient descent algorithm. Stay tuned for the next post, where we will use it on the MedicalNist dataset.