Convolutional Princess Classifier

Training a Neural Network to Classify Disney Princesses with PyTorch

The inspiration for this project came from my daughter, who was curious whether a computer could name a movie character from a picture. Image classification is a typical use case for convolutional neural networks such as VGG19, and by training a classifier on images from various films, the machine should be able to correctly label the images it is presented with.

I recently finished Udacity's excellent Intro to Machine Learning with PyTorch class, and I used this project as a way of practicing some of the skills I learned in that course.

In [1]:
# Package Imports
import json
import os
import random
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torchvision
import torchvision.models as models
from PIL import Image
from torch import nn, optim
from torch.nn import NLLLoss
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Dataset
from torchvision import transforms
from torchvision.datasets import ImageFolder


use_gpu = True

Gathering Data

Creating a corpus of princess images from a video is a binary classification problem. A frame either contains the princess, or it does not. A classifier can therefore be trained on a small dataset to sort a larger one.

The basic sequence of this experiment is as follows:

Step 1: Extract a frame of video every 5 seconds. The Little Mermaid (1989) is 85 minutes long or 5100 seconds of footage.

Step 2: Train a classifier on images that either contain Ariel or do not. Images will be randomly selected from the pool and added to the training and validation sets, where they will be classified as "Ariel" or "not Ariel". Once the classifier has been trained on this dataset to estimate the probability that an image contains Ariel, it will be used to filter the rest of the images.

Step 3: Repeat this process for all films to generate a corpus of images for all princesses.

Step 4: Train a multi-class classifier on the new corpus of all princesses. The data will be randomly split into training and validation sets.

Step 5: Given an image of a princess, generate a list of the top-k probabilities and the names of the princesses identified.

How does one extract frames from a video? I used the VLC command line interface:

vlc --rate 5.0 "video.mp4" --video-filter=scene --vout=dummy --scene-ratio=120 --scene-path="frame_dir" vlc://quit

This opens the video at path "video.mp4" and runs it at 5x speed, outputting frames to the folder "frame_dir." The scene ratio is set to 120, which saves one frame every 5 seconds assuming the video plays back at 24 frames per second. If you just want to watch the movie, leave the rate and vout options out. Warning: if you set the rate too high, you may get tearing of frames.
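
Since there are a dozen films to extract frames from, the command can also be scripted. Here is a rough sketch (the videos/<film>.mp4 paths and the short film list are placeholders, not the actual files used); it simply shells out to the same VLC invocation shown above:

import os
import subprocess

films_to_extract = ["the_little_mermaid", "beauty_and_the_beast"]  # placeholder list

for film in films_to_extract:
    frame_dir = os.path.join("data", film)
    os.makedirs(frame_dir, exist_ok=True)                # frames land in data/<film>
    subprocess.run([
        "vlc", "--rate", "5.0",
        os.path.join("videos", film + ".mp4"),           # placeholder video path
        "--video-filter=scene", "--vout=dummy",
        "--scene-ratio=120",                             # one frame every 5 seconds at 24 fps
        "--scene-path=" + frame_dir,
        "vlc://quit",
    ])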

Splitting the data into training and validation sets

The first task was to write a script that would randomly place the extracted frames into the training, validation, and test folders.

The idea here is to train the classifier to remove images that are not of Ariel, so we don't have to hand-pick all of our images. Therefore, only 1/4 of the original data will be used for training and 1/4 for validation. These will be hand-sorted into two folders, which establish our ground truth of 'Ariel' or 'Not Ariel.' Once the classifier can confidently sort images, an acceptable confidence cutoff will be chosen, the classifier will be run on the remaining test set to sort it into Ariel and Not Ariel, and the results will be analyzed. For this experiment, false positives (incorrectly classifying images that are 'Not Ariel' as 'Ariel') are less desirable than false negatives (incorrectly classifying images that are 'Ariel' as 'Not Ariel'). If a few images of Ariel do not end up in our finished corpus, that is better than training the model on mislabelled examples.

For the purposes of classifying these images, I included every image that Ariel appears in, even if she wasn't full frame or even facing the 'camera.' If this failed to produce a successful model, I planned to prune the dataset further to eliminate frames where Ariel is less visible.

The split yielded 66 frames with Ariel and 155 without in the training set, and 61 frames with Ariel and 135 without in the validation set.

In [2]:
%matplotlib inline
l = plt.imread("data/the_little_mermaid/valid/1/scene60481.png")
r = plt.imread("data/the_little_mermaid/train/0/scene81721.png")
fig, ax = plt.subplots(1,2,figsize=(20,10))

ax[0].imshow(l)
ax[0].axis("off")
ax[1].imshow(r)
ax[1].axis("off")
Out[2]:
(-0.5, 1919.5, 1089.5, -0.5)

Worth noting is that there are two cases of transformation that could confound the classifier:

Ariel transforms from mermaid to human, which alters her appearance.

Ursula transforms into Ariel's 'evil twin' (albeit with darker hair). This could also throw off the model's ability to classify images correctly.

In [3]:
film = "the_little_mermaid"
film_dir = os.path.join("data", film)

def train_test_split(data_dir):
    train_dir = data_dir + '/train'
    valid_dir = data_dir + '/valid'
    test_dir = data_dir + '/test'
    
    files = np.asarray(os.listdir(data_dir))
    if not os.path.isdir(train_dir):
        os.mkdir(train_dir)

    if not os.path.isdir(valid_dir):
        os.mkdir(valid_dir)

    if not os.path.isdir(test_dir):
        os.mkdir(test_dir)

    for file in files:
        if file.endswith(".png"):
            # roll a four-sided die: 1 -> train, 2 -> valid, 3 or 4 -> test
            roll = random.randint(1, 4)
            if roll == 1:
                # put it in the training folder
                os.rename(os.path.join(data_dir, file), os.path.join(train_dir, file))
            elif roll == 2:
                # put it in the validation folder
                os.rename(os.path.join(data_dir, file), os.path.join(valid_dir, file))
            else:
                # put the remaining half in the test folder
                os.rename(os.path.join(data_dir, file), os.path.join(test_dir, file))
In [4]:
train_test_split(film_dir)

Loading the data

The training data will be randomly modified to improve the model's ability to generalize.

In [2]:
def load_data(film_dir):
    train_dir = film_dir + '/train'
    valid_dir = film_dir +'/valid'

    xforms = {
        'train': transforms.Compose([
            transforms.RandomRotation(30),
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406],
                                 [0.229, 0.224, 0.225])
        ]),
        'validate': transforms.Compose([
            transforms.Resize(255),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406],
                                 [0.229, 0.224, 0.225])
        ])
    }

    # Data loading

    image_datasets = {
        'train': ImageFolder(train_dir, transform=xforms['train']),
        'validate': ImageFolder(valid_dir, transform=xforms['validate']),
    }


    # Data batching
    dataloaders = {
        'train': DataLoader(image_datasets['train'], batch_size=64, shuffle=True),
        'validate': DataLoader(image_datasets['validate'], batch_size=64, shuffle=True),
    }
    
    return dataloaders, image_datasets
In [6]:
dataloaders, image_datasets = load_data(film_dir)

Label mapping

Data will be sorted into two folders, numbered 0 and 1. Folder 0 will contain 'Not Ariel' and Folder 1 will contain 'Ariel.' This information will be stored in a json file called 'class_dict.json'.
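
For the binary case this file is tiny. Here is a minimal sketch of its assumed contents, and one way to create it (the label strings are my assumption; any names work as long as the keys match the folder names):

labels = {"0": "Not Ariel", "1": "Ariel"}   # assumed contents of class_dict.json

with open(os.path.join(film_dir, 'class_dict.json'), 'w') as f:
    json.dump(labels, f)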

In [3]:
def map_labels(data_dir):

    with open(os.path.join(data_dir, 'class_dict.json'), 'r') as f:
        class_dict = json.load(f)

    return class_dict
In [8]:
class_dict = map_labels(film_dir)
In [9]:
num_of_classes = len(next(os.walk(film_dir + '/train'))[1])

num_of_classes
Out[9]:
2

Transfer learning

In this experiment, I used a pretrained model called VGG19. VGG19 is a convolutional neural network trained on a very large set of images. I kept this model's features static, but replaced its classifier and trained it to recognize images of Ariel. Using a pretrained model has the advantage of being able to recognize features with a much smaller set of data. This is based on the idea that images contain many similar types of features that can be 'transferred' to a new model. One potential downside of this approach is that VGG19 was trained on photographs, not hand-drawn animation.

Classifier structure

The classifier has 25088 in_features, 4096 hidden units, and 1000 out_features, because it was built to distinguish the 1000 classes of ImageNet. For our purposes, we want the classifier to have 2 out_features: one for Ariel and one for not Ariel. Eventually, we will have a number of out_features equal to the number of characters we want to classify.
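
For reference, printing the pretrained model's classifier shows that stock layout (a quick sanity check; the exact repr depends on the torchvision release):

print(models.vgg19(pretrained=True).classifier)
# Sequential(
#   (0): Linear(in_features=25088, out_features=4096, bias=True)
#   (1): ReLU(inplace=True)
#   (2): Dropout(p=0.5, inplace=False)
#   (3): Linear(in_features=4096, out_features=4096, bias=True)
#   (4): ReLU(inplace=True)
#   (5): Dropout(p=0.5, inplace=False)
#   (6): Linear(in_features=4096, out_features=1000, bias=True)
# )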

A few other things to note: the classifier has a Rectified Linear Unit (ReLU) layer and a Dropout layer between its linear layers. In our custom classifier, we will also add a LogSoftmax layer to this basic structure, because we will be using the negative log likelihood loss (NLLLoss) as our criterion. A classifier's output layer and its criterion must be considered as a package: the output layer has to match the criterion's expected input, or the loss function will not work correctly.
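
To make that pairing concrete, here is a minimal sketch (with made-up scores) showing that LogSoftmax followed by NLLLoss computes the same loss as CrossEntropyLoss applied to the raw scores:

scores = torch.tensor([[2.0, -1.0]])   # raw classifier output for one image, two classes
target = torch.tensor([0])             # the true class index

log_probs = nn.LogSoftmax(dim=1)(scores)
loss_a = nn.NLLLoss()(log_probs, target)           # what this notebook uses
loss_b = nn.CrossEntropyLoss()(scores, target)     # equivalent on raw scores
# loss_a == loss_b, which is why a LogSoftmax output layer must be paired with NLLLoss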

In [5]:
model = models.vgg19(pretrained=True)

# this freezes the model's features so we are only training the classifier
for param in model.parameters():
    param.requires_grad = False

input_features = 25088
hidden_units = 4096
out_features = 2

model.classifier = nn.Sequential(
    nn.Linear(input_features, hidden_units),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5, inplace=False),
    nn.Linear(hidden_units, hidden_units),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5, inplace=False),
    nn.Linear(hidden_units, out_features),
    nn.LogSoftmax(dim=1)
)

# learning rate
alpha = 0.0001

optimizer = optim.Adam(model.classifier.parameters(), lr=alpha)

criterion = NLLLoss()

train_losses = []
validate_losses = []

def train(epochs):
    
    model.train()
    # move the model to the GPU
    if use_gpu:
        model.cuda()

    steps = 0
    t0 = time.time()

    for e in range (epochs):

        running_loss = 0
        
        for X, y in dataloaders['train']:
            steps += 1
            
            # move features to gpu
            if use_gpu:
                X, y = X.to('cuda'), y.to('cuda')

            optimizer.zero_grad()

            output = model.forward(X)
            loss = criterion(output, y)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()

            # progress timer
            t2 = time.time()
            percent_complete = steps/ (len(dataloaders['train']) * epochs)
            time_elapsed = t2-t0
            estimated_time_left = time_elapsed/percent_complete - time_elapsed

            print("{}% complete. {}m{}s elapsed. {}m{}s left.".format(round(percent_complete * 100), 
                                                              *divmod(int(time_elapsed), 60), 
                                                              *divmod(int(estimated_time_left), 60)), end='\r')
        # for/else: this block runs once per epoch, after all training batches are done
        else:
            # Validation Loss and Accuracy
            validate_loss = 0
            
            global accuracy
            accuracy = 0

            with torch.no_grad():
                model.eval()
                for X, y in dataloaders['validate']:

                    if use_gpu:
                        X, y = X.to('cuda'), y.to('cuda')

                    log_probabilities = model.forward(X)
                    validate_loss += criterion(log_probabilities, y).item()

                    # we can convert the log probabilities to probabilities using torch.exp
                    # torch.exp is e^x
                    probabilities = torch.exp(log_probabilities)

                    # True where the label matches the model's highest-probability class
                    equality = (y.data == probabilities.max(dim=1)[1])

                    # cast the boolean tensor to floats so .mean() gives the batch accuracy
                    accuracy += equality.type(torch.FloatTensor).mean()

            model.train()

            train_losses.append(running_loss/len(dataloaders['train']))
            validate_losses.append(validate_loss/len(dataloaders['validate']))

            print("Epoch: {}/{}..".format(e+1, epochs),
                  "Training Loss: {:.3f}.. ".format(running_loss/len(dataloaders['train'])),
                  "Validate Loss: {:.3f}.. ".format(validate_loss/len(dataloaders['validate'])),
                  "Validate Accuracy: {:.3f}".format(accuracy/len(dataloaders['validate'])))
        
In [12]:
train(10)
Epoch: 1/10.. Training Loss: 0.720..  Validate Loss: 0.566..  Validate Accuracy: 0.762
Epoch: 2/10.. Training Loss: 0.611..  Validate Loss: 0.545..  Validate Accuracy: 0.781
Epoch: 3/10.. Training Loss: 0.551..  Validate Loss: 0.498..  Validate Accuracy: 0.777
Epoch: 4/10.. Training Loss: 0.521..  Validate Loss: 0.542..  Validate Accuracy: 0.723
Epoch: 5/10.. Training Loss: 0.449..  Validate Loss: 0.740..  Validate Accuracy: 0.660
Epoch: 6/10.. Training Loss: 0.486..  Validate Loss: 0.518..  Validate Accuracy: 0.746
Epoch: 7/10.. Training Loss: 0.498..  Validate Loss: 0.666..  Validate Accuracy: 0.734
Epoch: 8/10.. Training Loss: 0.452..  Validate Loss: 0.729..  Validate Accuracy: 0.680
Epoch: 9/10.. Training Loss: 0.502..  Validate Loss: 0.544..  Validate Accuracy: 0.789
Epoch: 10/10.. Training Loss: 0.383..  Validate Loss: 0.587..  Validate Accuracy: 0.762
In [13]:
%matplotlib inline
fig, ax = plt.subplots(facecolor=('1'))
ax.set_facecolor('1')
ax.set_title('Loss vs time', color='0')
ax.set_xlabel('epochs', color='0')
ax.set_ylabel('average loss', color='0')
ax.plot(train_losses, '#009d8c', label="training losses")
ax.plot(validate_losses, '#d94f45', label="validate losses")
ax.legend(frameon=True)
ax.tick_params(labelcolor='0')
In [6]:
def save(path):
    model.class_to_idx = image_datasets['train'].class_to_idx

    checkpoint = {'model_state_dict': model.state_dict(),
                  'optimizer_state_dict': optimizer.state_dict(),
                  'classifier': model.classifier,
                  'class_to_idx': model.class_to_idx,
                  'alpha': alpha
                 }

    torch.save(checkpoint, path)
In [15]:
# MAKE SURE THIS IS COMMENTED OUT IF YOU AREN'T TRAINING ARIEL OR YOU WILL WIPE HER CHECKPOINT

save(os.path.join(film_dir,'checkpoint.pth'))
/home/we/miniconda3/lib/python3.7/site-packages/torch/serialization.py:292: UserWarning: Couldn't retrieve source code for container of type Sequential. It won't be checked for correctness upon loading.
  "type " + obj.__name__ + ". It won't be checked "
/home/we/miniconda3/lib/python3.7/site-packages/torch/serialization.py:292: UserWarning: Couldn't retrieve source code for container of type Linear. It won't be checked for correctness upon loading.
  "type " + obj.__name__ + ". It won't be checked "
/home/we/miniconda3/lib/python3.7/site-packages/torch/serialization.py:292: UserWarning: Couldn't retrieve source code for container of type ReLU. It won't be checked for correctness upon loading.
  "type " + obj.__name__ + ". It won't be checked "
/home/we/miniconda3/lib/python3.7/site-packages/torch/serialization.py:292: UserWarning: Couldn't retrieve source code for container of type Dropout. It won't be checked for correctness upon loading.
  "type " + obj.__name__ + ". It won't be checked "
/home/we/miniconda3/lib/python3.7/site-packages/torch/serialization.py:292: UserWarning: Couldn't retrieve source code for container of type LogSoftmax. It won't be checked for correctness upon loading.
  "type " + obj.__name__ + ". It won't be checked "
In [7]:
# Image Processing

def process_image(image):

    image = Image.open(image)
    
    # Process a PIL image for use in a PyTorch model
    xform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    
    return xform(image)
In [8]:
def imshow(image, ax=None, title=None):

    if ax is None:
        fig, ax = plt.subplots()
    
    # PyTorch tensors assume the color channel is the first dimension
    # but matplotlib assumes it is the third dimension
    image = image.numpy().transpose((1, 2, 0))
    
    # Undo preprocessing
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = std * image + mean
    
    # Image needs to be clipped between 0 and 1 or it looks like noise when displayed
    image = np.clip(image, 0, 1)
    ax.imshow(image)
    ax.axis('off')
    return ax

imshow(process_image("data/the_little_mermaid/valid/1/scene47521.png"))
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f0bd358cdd0>
In [9]:
def load_from_checkpoint(checkpoint_path):

    if use_gpu:
        device = torch.device("cuda")
        checkpoint = torch.load(checkpoint_path, map_location="cuda:0")
    else:
        device = torch.device("cpu")
        checkpoint = torch.load(checkpoint_path, map_location=device)
    
    # reconstruct model and custom classifier from checkpoint
    model = models.vgg19(pretrained=True)

    # freeze vgg features
    # https://stackoverflow.com/questions/51748138/pytorch-how-to-set-requires-grad-false
    for param in model.parameters():
        param.requires_grad = False

    model.classifier = checkpoint['classifier']

    model.load_state_dict(checkpoint['model_state_dict'])

    model.to(device)

    # reconstruct optimizer from checkpoint
    optimizer = optim.Adam(model.classifier.parameters(), lr=checkpoint['alpha'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    
    # load {class : index} dictionary
    model.class_to_idx = checkpoint['class_to_idx']

    return model, optimizer

def predict(image_path, model, topk=1):
    # invert {class : index} to {index : class}
    # https://therenegadecoder.com/code/how-to-invert-a-dictionary-in-python/
    idx_to_class = dict(zip(model.class_to_idx.values(), model.class_to_idx.keys()))

    image = process_image(image_path)
    image = image.unsqueeze_(0)
    
    # put model into eval mode for predictions
    model.eval()
    
    # Implement the code to predict the class from an image file
    with torch.no_grad():
        if use_gpu:
            log_probabilities = model.forward(image.cuda())
        else:
            log_probabilities = model.forward(image)
        
    probabilities = torch.exp(log_probabilities)
    topks = probabilities.topk(topk)
    idxs = topks.indices.tolist()[0]
    predicted_classes = [idx_to_class[i] for i in idxs]
        
    return topks.values, predicted_classes
In [19]:
cp = os.path.join(film_dir, "checkpoint.pth")
model, optimizer = load_from_checkpoint(cp)
P, classes = predict("data/the_little_mermaid/valid/1/scene47521.png", model, topk=1)
In [20]:
P
Out[20]:
tensor([[0.9670]], device='cuda:0')
In [21]:
classes
Out[21]:
['1']
In [24]:
def test_sort(film_dir, cutoff=0.5):
    
    checkpoint_path = os.path.join(film_dir, "checkpoint.pth")
    
    test_dir = film_dir + '/test'
    if not os.path.isdir(test_dir):
        os.mkdir(test_dir)
    files = np.asarray(os.listdir(test_dir))

    if not os.path.isdir(test_dir + '/1'):
        os.mkdir(test_dir + '/1')

    if not os.path.isdir(test_dir + '/0'):
        os.mkdir(test_dir + '/0')

    model, optimizer = load_from_checkpoint(checkpoint_path)

    for file in files:
        if file.endswith(".png"):
            P, classes = predict(os.path.join(test_dir, file), model, topk=1)

            if classes[0] == '1' and P[0] > cutoff:
                os.rename(os.path.join(test_dir, file), os.path.join(test_dir + '/1', file))
            else:
                os.rename(os.path.join(test_dir, file), os.path.join(test_dir + '/0', file))

Data Collection

Now the data processing steps are repeated for each film to create the training, validation, and test sets. The training and validation sets are hand-sorted into positive and negative classes, and then a classifier is trained for each film to sort that film's test set.

In [11]:
films = ["snow_white", "cinderella", "sleeping_beauty", "the_little_mermaid", "beauty_and_the_beast", "aladdin", "pocahontas", "mulan", "the_princess_and_the_frog", "tangled", "brave", "frozen"]

def split_all_film_data():
    for film in films:
        path = os.path.join('data', film)
        train_test_split(path)
In [12]:
#split_all_film_data()

Data

In [13]:
d = {'train_pos': [31, 40, 28, 68, 49, 36, 61, 66, 70, 96, 76, 46], 
     'train_neg': [164, 176, 147, 155, 156, 196, 143, 169, 183, 201, 171, 219],
     'valid_pos': [43, 40, 37, 61, 49, 36, 61, 66, 70, 97, 87, 60],
     'valid_neg': [156, 186, 187, 135, 184, 230, 151, 145, 210, 168, 177, 261],
     'color1': ["#0058a2", "#f8aadc", "#f3ae47", "#d94f45", "#78492a", "#f9d938", "#00bce1", "#28346e", "#8fd18e", "#be84c6", "#f75620", "#fff6db"],
     'color2': ["#fff28c", "#7eafd0", "#bf4574", "#009d8c", "#ffda4d", "#45b5a9", "#d88745", "#89b74b", "#dceaa7", "#ecec96", "#025467", "#a9e4f9"]
    }

df = pd.DataFrame(data=d, index=films)

df
Out[13]:
                           train_pos  train_neg  valid_pos  valid_neg   color1   color2
snow_white                        31        164         43        156  #0058a2  #fff28c
cinderella                        40        176         40        186  #f8aadc  #7eafd0
sleeping_beauty                   28        147         37        187  #f3ae47  #bf4574
the_little_mermaid                68        155         61        135  #d94f45  #009d8c
beauty_and_the_beast              49        156         49        184  #78492a  #ffda4d
aladdin                           36        196         36        230  #f9d938  #45b5a9
pocahontas                        61        143         61        151  #00bce1  #d88745
mulan                             66        169         66        145  #28346e  #89b74b
the_princess_and_the_frog         70        183         70        210  #8fd18e  #dceaa7
tangled                           96        201         97        168  #be84c6  #ecec96
brave                             76        171         87        177  #f75620  #025467
frozen                            46        219         60        261  #fff6db  #a9e4f9
In [73]:
# Sanity Check

def sanity_check(path, top_k):

    P, classes = predict(path, model, topk=top_k)
    image = process_image(path)

    if use_gpu:
        probs = P.cpu().numpy()[0]
    else:
        probs = P.numpy()[0]

    names = [class_dict[i] for i in classes]
    
    colors = [df['color2'][int(i)] for i in classes]

    fig, axs = plt.subplots(2, figsize=(4, 8), facecolor=('1'))
    axs[0].set_title(names[0], color='0')
    axs[1].set_facecolor('1')
    imshow(process_image(path),axs[0])
    y_pos = np.arange(len(names))
    plt.yticks(y_pos, names)
    bar_graph = axs[1].barh(y_pos, probs, color=colors)
    plt.gca().invert_yaxis()
    axs[1].tick_params(labelcolor='0')
In [74]:
sanity_check("data/the_little_mermaid/valid/1/scene47521.png", 2)
In [30]:
def sample_dist_plot(pos, neg, title):
    ind = np.arange(len(pos))    # the x locations for the groups
    width = 0.6    # the width of the bars: can also be len(x) sequence

    p1 = plt.barh(ind, pos, width, color=df['color1'])
    p2 = plt.barh(ind, neg, width,
                 left=pos, color=df['color2'])
    plt.xlabel('Positive and negative samples by film')
    plt.title(title)
    plt.yticks(ind, films)
    plt.xticks(np.arange(0, 350, 25))

    plt.show()
In [31]:
sample_dist_plot(df['train_pos'], df['train_neg'], "Training set sample distribution")
In [32]:
sample_dist_plot(df['valid_pos'], df['valid_neg'], "Validation set sample distribution")
In [ ]:
for film in films:

    model.classifier = nn.Sequential(
        nn.Linear(input_features, hidden_units),
        nn.ReLU(inplace=True),
        nn.Dropout(p=0.5, inplace=False),
        nn.Linear(hidden_units, hidden_units),
        nn.ReLU(inplace=True),
        nn.Dropout(p=0.5, inplace=False),
        nn.Linear(hidden_units, out_features),
        nn.LogSoftmax(dim=1)
    )

    optimizer = optim.Adam(model.classifier.parameters(), lr=alpha)

    train_losses = []
    validate_losses = []

    # set path to film
    film_dir = os.path.join('data', film)

    # load the data
    dataloaders, image_datasets = load_data(film_dir)

    # map labels
    class_dict = map_labels(film_dir)

    # train the model
    train(10)
    
    # save the checkpoint
    save(os.path.join(film_dir,'checkpoint.pth'))

    # sort the test set with the trained classifier
    test_sort(film_dir, cutoff=0.6)

Training the Multi-Class Classifier

The training set is comprised of images chosen by the previous binary classification model, with false positives removed by hand. The validation set from the original experiment is used to test the results.
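
The step that assembles the multi_class folder is not shown in this notebook, so here is a rough sketch of the layout load_data expects. The folder names and copy logic are my assumptions: each film's classifier-sorted test/1 images (after hand-pruning false positives) become that princess's training class, and the original valid/1 images become her validation class.

import shutil

for idx, film in enumerate(films):
    for src, split in [(os.path.join('data', film, 'test', '1'), 'train'),
                       (os.path.join('data', film, 'valid', '1'), 'valid')]:
        dst = os.path.join('data', 'multi_class', split, str(idx))
        os.makedirs(dst, exist_ok=True)
        for fname in os.listdir(src):
            if fname.endswith('.png'):
                shutil.copy(os.path.join(src, fname), dst)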

In [61]:
out_features = 12

model.classifier = nn.Sequential(
        nn.Linear(input_features, hidden_units),
        nn.ReLU(inplace=True),
        nn.Dropout(p=0.5, inplace=False),
        nn.Linear(hidden_units, hidden_units),
        nn.ReLU(inplace=True),
        nn.Dropout(p=0.5, inplace=False),
        nn.Linear(hidden_units, out_features),
        nn.LogSoftmax(dim=1)
)

optimizer = optim.Adam(model.classifier.parameters(), lr=alpha)

train_losses = []
validate_losses = []

# set path to film
film_dir = os.path.join('data', 'multi_class')

# load the data
dataloaders, image_datasets = load_data(film_dir)

# map labels
class_dict = map_labels(film_dir)

# train the model
train(20)
    
# save the checkpoint
save(os.path.join(film_dir,'checkpoint.pth'))
Epoch: 1/20.. Training Loss: 1.949..  Validate Loss: 1.824..  Validate Accuracy: 0.446
Epoch: 2/20.. Training Loss: 1.330..  Validate Loss: 1.468..  Validate Accuracy: 0.522
Epoch: 3/20.. Training Loss: 1.047..  Validate Loss: 1.290..  Validate Accuracy: 0.517
Epoch: 4/20.. Training Loss: 0.888..  Validate Loss: 1.176..  Validate Accuracy: 0.610
Epoch: 5/20.. Training Loss: 0.737..  Validate Loss: 1.413..  Validate Accuracy: 0.596
Epoch: 6/20.. Training Loss: 0.677..  Validate Loss: 1.229..  Validate Accuracy: 0.632
Epoch: 7/20.. Training Loss: 0.675..  Validate Loss: 1.078..  Validate Accuracy: 0.679
Epoch: 8/20.. Training Loss: 0.651..  Validate Loss: 1.008..  Validate Accuracy: 0.703
Epoch: 9/20.. Training Loss: 0.554..  Validate Loss: 0.974..  Validate Accuracy: 0.699
Epoch: 10/20.. Training Loss: 0.588..  Validate Loss: 0.831..  Validate Accuracy: 0.739
Epoch: 11/20.. Training Loss: 0.569..  Validate Loss: 0.916..  Validate Accuracy: 0.728
Epoch: 12/20.. Training Loss: 0.554..  Validate Loss: 1.116..  Validate Accuracy: 0.699
Epoch: 13/20.. Training Loss: 0.470..  Validate Loss: 1.014..  Validate Accuracy: 0.716
Epoch: 14/20.. Training Loss: 0.468..  Validate Loss: 0.944..  Validate Accuracy: 0.744
Epoch: 15/20.. Training Loss: 0.421..  Validate Loss: 1.123..  Validate Accuracy: 0.713
Epoch: 16/20.. Training Loss: 0.420..  Validate Loss: 0.981..  Validate Accuracy: 0.710
Epoch: 17/20.. Training Loss: 0.446..  Validate Loss: 0.930..  Validate Accuracy: 0.754
Epoch: 18/20.. Training Loss: 0.407..  Validate Loss: 1.157..  Validate Accuracy: 0.710
Epoch: 19/20.. Training Loss: 0.388..  Validate Loss: 1.021..  Validate Accuracy: 0.734
Epoch: 20/20.. Training Loss: 0.347..  Validate Loss: 0.917..  Validate Accuracy: 0.745
In [75]:
sanity_check("data/multi_class/valid/0/scene14641.png", 5)
In [76]:
sanity_check("data/multi_class/valid/11/scene75121.png", 5)
In [77]:
sanity_check("data/multi_class/valid/2/scene41641.png", 5)
In [78]:
sanity_check("data/multi_class/valid/5/scene78121.png", 5)
In [79]:
sanity_check("data/multi_class/valid/6/scene83401.png", 5)
In [80]:
sanity_check("data/multi_class/valid/8/scene81481.png", 5)
In [81]:
sanity_check("data/multi_class/valid/8/scene38041.png", 5)