Deep learning is so simple that it is hard to believe, especially given how magical it seems at first. At its core is a piece of code that, one step at a time, gets just a bit closer to the solution. You can think of it as finding your way down a mountain:



  • step - a single step of the optimization algorithm that makes the parameters of a function just a bit better
  • gradient - the slope of the loss function that we want to minimize
  • learning rate - how fast we adjust the parameters based on the gradient
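These three pieces fit together in a single update rule: new $x$ = $x$ - learning rate * gradient. A minimal sketch in plain Python (the toy loss $x^2$ and the starting values here are chosen just for illustration):

```python
def grad(x):
    # derivative of the toy loss x**2 is 2*x
    return 2 * x

learning_rate = 0.1
x = 5.0
for _ in range(3):
    # one optimization step: move against the gradient
    x = x - learning_rate * grad(x)
    print(x)  # x shrinks towards the minimum at 0
```

Each print shows $x$ a bit closer to 0, where the toy loss is smallest.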

Task: Find the minimum of $f(x) = a x^2 + b$ with Gradient Descent

We have a single variable here: $x$. With Gradient Descent we'll find the value where $f$ is minimal.

In short, gradient descent is an algorithm that walks down the slope - no more, no less.

An example function $f$ in Python looks like this:

def f(x):
    return 1.2 * x**2 + 4
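For this $f$ the gradient can also be derived by hand: $f'(x) = 2 \cdot 1.2 \cdot x = 2.4x$ (the constant $4$ vanishes). A quick sanity check against a central finite difference (the helper names below are just for this sketch):

```python
def f(x):
    return 1.2 * x**2 + 4

def df(x):
    # hand-derived gradient: d/dx (1.2*x**2 + 4) = 2.4*x
    return 2.4 * x

h = 1e-6
x = -3.0
# central difference approximation of the derivative at x
numeric = (f(x + h) - f(x - h)) / (2 * h)
print(numeric, df(x))  # both close to -7.2
```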

To compute the gradient at the point where we are now, we will use PyTorch magic:

import torch

x = torch.tensor(-3.).requires_grad_()
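The magic works like this: calling `backward()` on $y = f(x)$ fills `x.grad` with the derivative of $f$ at the current $x$. A small check (reusing the $f$ from above) that it matches the hand-derived $2.4x$:

```python
import torch

def f(x):
    return 1.2 * x**2 + 4

x = torch.tensor(-3.).requires_grad_()
y = f(x)
y.backward()          # populates x.grad with dy/dx
print(x.grad)         # dy/dx = 2.4 * x, i.e. close to -7.2 at x = -3
```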

To walk along the function, we will change $x$ by a fraction of the gradient (gradient * learning_rate):

learning_rate = 0.2
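The learning rate matters more than it looks: for our $f$ each update multiplies $x$ by $(1 - 2.4 \cdot \text{learning rate})$, so rates above roughly $0.83$ make $x$ grow instead of shrink. A small sketch (the second rate is picked just to show the contrast):

```python
def update(x, lr, steps=10):
    for _ in range(steps):
        # gradient of 1.2*x**2 + 4 is 2.4*x
        x = x - lr * 2.4 * x
    return x

print(update(-3.0, 0.2))  # shrinks towards 0
print(update(-3.0, 1.0))  # overshoots and blows up
```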

and this is the gradient descent algorithm in code:

def step(x, f):
    # compute y at a given value of x
    y = f(x)
    # the backward function of PyTorch tensor computes gradients
    y.backward()
    # change value of x in a direction where function decreases
    with torch.no_grad():
        x = x - learning_rate * x.grad
    # (don't bother about requires_grad_() and x.grad = None - it is just necessary boilerplate)
    x.requires_grad_()
    x.grad = None
    return x

Now let's gather x-es and y-s for 10 iterations:

xypairs = []
for i in range(10):
    xypairs.append([x.item(), f(x).item()])
    print(f"f({x:.2}) = {f(x):.2}")
    x = step(x, f)
f(-3.0) = 1.5e+01
f(-1.6) = 6.9
f(-0.81) = 4.8
f(-0.42) = 4.2
f(-0.22) = 4.1
f(-0.11) = 4.0
f(-0.059) = 4.0
f(-0.031) = 4.0
f(-0.016) = 4.0
f(-0.0083) = 4.0
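For comparison, here is a sketch of the same loop written with PyTorch's built-in `torch.optim.SGD`, which performs exactly our `x = x - learning_rate * x.grad` update for us (this is not part of the post's code, just a preview of the idiomatic form):

```python
import torch

def f(x):
    return 1.2 * x**2 + 4

x = torch.tensor(-3.).requires_grad_()
optimizer = torch.optim.SGD([x], lr=0.2)

for i in range(10):
    optimizer.zero_grad()   # same role as x.grad = None
    y = f(x)
    y.backward()            # compute x.grad
    optimizer.step()        # x -= lr * x.grad

print(x.item())  # close to 0, where f is minimal
```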

And believe it or not, this gradient descent algorithm is the building block of every neural net. Stay tuned for the next post, where we will use it on the MedicalNist dataset.