In the previous post I've explained what is the most important concept in neural networks - technique that allows us to incrementally find minimum of a function. This is called Gradient Descent algorithm!

In this post I will build on this concept and show you how to create a basic linear model to predict what is on a medical image!

Download data

As in the first post showing how to build medical image recognition with pure statistics, we need to download data first. For some basic description of how the data looks like, see that first post first ;)

! git clone
! rm -rf ./medical_mnist
! mv Medical-MNIST-Classification/resized/ ./medical_mnist
! rm -rf Medical-MNIST-Classification
Cloning into 'Medical-MNIST-Classification'...
remote: Enumerating objects: 58532, done.
remote: Total 58532 (delta 0), reused 0 (delta 0), pack-reused 58532
Receiving objects: 100% (58532/58532), 77.86 MiB | 17.62 MiB/s, done.
Resolving deltas: 100% (506/506), done.
Checking connectivity... done.
Checking out files: 100% (58959/58959), done.
from pathlib import Path

PATH = Path("medical_mnist/")

We have much more powerful tools now, so we will deal with all 6 classes now, but first need to prepare data:

  • all data needs to be numerical
  • it needs to be in arrays
  • it needs to be labeled
classes = [ for cls in PATH.iterdir()]
['AbdomenCT', 'BreastMRI', 'CXR', 'ChestCT', 'Hand', 'HeadCT']

Prepare data

The plan is to

images = {}
for cls in classes:
    images[cls] = list((PATH/cls).iterdir())
from PIL import Image
import torch
from torchvision.transforms import ToTensor

image_tensors = {}

for cls in classes:
    image_tensors[cls] = torch.stack( # converts iterable of tensors into higher dimention single tensor
            ToTensor()( # converts images to tensors
            ).view(-1, 64 * 64).squeeze().float()/255 # reshape tensor from 64x64 to vector tensor of size 4096 and convert values
            for path in (PATH/cls).iterdir()]

so let's see what we got there

for cls in classes:
    class_shape = image_tensors[cls].shape
    print(f"{cls} has {class_shape[0]} images of a size {class_shape[1:]}")
AbdomenCT has 10000 images of a size torch.Size([4096])
BreastMRI has 8954 images of a size torch.Size([4096])
CXR has 10000 images of a size torch.Size([4096])
ChestCT has 10000 images of a size torch.Size([4096])
Hand has 10000 images of a size torch.Size([4096])
HeadCT has 10000 images of a size torch.Size([4096])
x_train =[image_tensors[cls] for cls in classes], dim=0)
y_train =[torch.tensor([index] * image_tensors[cls].shape[0]) for index, cls in enumerate(classes)])

shuffle the dataset. This is important as if we don't do this, images from the classes that we train first will be effectively not as "fresh" in the memmory of the network

permutations = torch.randperm(x_train.shape[0])
x_train = x_train[permutations]
y_train = y_train[permutations]

create validation set that is 20% of the training set - this is important to asses the performance of the model. Validation set doesn't take part in training, so model is not biased towards those images (it cannot remember those exact images from training). This is essential to see how well model generalizes on examples it has not seen.

valid_pct = 0.2
valid_index = int(x_train.shape[0] * valid_pct)
x_valid = x_train[:valid_index]
y_valid = y_train[:valid_index]

x_train = x_train[valid_index:]
y_train = y_train[valid_index:]
x_train.shape, y_train.shape, x_valid.shape, y_valid.shape
(torch.Size([47164, 4096]),
 torch.Size([11790, 4096]),


Now we need to build the model. Let's use the most basic building blocks of the neural network - linear layer (linear_layer function) and nonlinearity (softmax function).

def softmax(x):
    return x - x.exp().sum(-1).unsqueeze(-1)
def linear_layer(x):
    return x @ weights + bias

the model is a simple function composition (results of linear layer and fed into nonlinear layer (softmax)

def model(x):
    return softmax(linear_layer(x))

For the Gradient Descent to work we also need to specify the loss function - this is crucial, as this is the function on which we compute gradients for our parameters. Just a quick recap: Gradient Descent algorithm finds out the values to change function parameters so the function values decreese.

In your case we minimize loss_func. Parameters of this function (passed in a form of preds) are in the model: wegiths and bias. Gradient Descent will give us values to change each of those parameters so we minimize the loss_func

def loss_func(preds, targets):
    return -preds[range(targets.shape[0]), targets].mean()

def accuracy(preds, targets):
    return (torch.argmax(preds, dim=-1) == targets).float().mean()

And here is the Gradient Descent loop - see comments in the code for details


# number of training examples
n  = x_train.shape[0] 
# batch size - this is necessary as we won't be able to fit all
# the examples into the memmory, so we need to do the computations in batches
bs = 64 
# how many epochs to train for
epochs = 15  
weights = torch.zeros((64 * 64, 10), requires_grad=True) # define weights matrix
bias = torch.zeros(10, requires_grad=True) # and bias term

# in each of those epochs algorithm sees all the images. So in this case
# we see all the images 15 times
for epoch in range(epochs): 
    # here is the loop for batches: in each batch we:
    #  - see 64 images
    #  - compute predictions based on the model
    #  - compute the loss
    #  - compute gradients and update parameters (wegiths and bias)
    for i in range((n - 1) // bs + 1):
        # select images for this batch
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        # compute predictions
        preds = model(xb)
        # compute loss
        loss = loss_func(preds, yb)

        # compute gradients (this is done for us by PyTorch with this backwards function!)
        # this block is necessary, so computations we do below, are not taken into account when
        # computing next gradients
        with torch.no_grad():
            # update parameters
            weights -= weights.grad
            bias -= bias.grad
            # zero out the gradients so they are ready for the next batch (otherwise they accumulate values)
    # eventually after each epoch (seeing all the images) we print out how we did
    print(f"Epoch {epoch} accuracy: {accuracy(model(x_valid),y_valid)}%, loss: {loss_func(model(x_valid), y_valid)}")
Epoch 0 accuracy: 0.7695504426956177%, loss: 2.5434305667877197
Epoch 1 accuracy: 0.787277340888977%, loss: 2.3623108863830566
Epoch 2 accuracy: 0.7912637591362%, loss: 2.232290267944336
Epoch 3 accuracy: 0.8318066000938416%, loss: 2.134671926498413
Epoch 4 accuracy: 0.9314673542976379%, loss: 2.058687448501587
Epoch 5 accuracy: 0.9561492800712585%, loss: 1.9977604150772095
Epoch 6 accuracy: 0.9601356983184814%, loss: 1.9477065801620483
Epoch 7 accuracy: 0.9603053331375122%, loss: 1.9057585000991821
Epoch 8 accuracy: 0.9602205157279968%, loss: 1.8700224161148071
Epoch 9 accuracy: 0.9598812460899353%, loss: 1.8391577005386353
Epoch 10 accuracy: 0.9594571590423584%, loss: 1.8121910095214844
Epoch 11 accuracy: 0.9597116112709045%, loss: 1.788395881652832
Epoch 12 accuracy: 0.9596267938613892%, loss: 1.7672215700149536
Epoch 13 accuracy: 0.9598812460899353%, loss: 1.7482411861419678
Epoch 14 accuracy: 0.9598812460899353%, loss: 1.7311174869537354
CPU times: user 1min 5s, sys: 173 ms, total: 1min 5s
Wall time: 9.22 s

What did we achieve?

With this simplest neural network we got to almost 96% accuracy - this is pretty good.