Convolutional Princess Classifier
Mon 09 March 2020
Training a Neural Network to Classify Disney Princesses with PyTorch¶
The inspiration for this project came from my daughter, who was curious whether a computer could name a movie character from a picture. Image classification is a typical use case for convolutional neural networks such as VGG19, and by training a classifier on frames from various films, the machine should be able to correctly identify the characters it is presented with.
I recently finished Udacity's excellent Intro to Machine Learning with PyTorch class, and I used this project as a way of practicing some of the skills I learned in that course.
# Package Imports
import json
import os
import random
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torchvision
import torchvision.models as models
from PIL import Image
from torch import nn, optim
from torch.nn import NLLLoss
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Dataset
from torchvision import transforms
from torchvision.datasets import ImageFolder

use_gpu = True
Gathering Data¶
Building a corpus of princess images from a video comes down to a binary classification problem: a frame either contains the princess, or it does not. A classifier can therefore be trained on a small hand-labelled dataset and used to sort a much larger one.
The basic sequence of this experiment is as follows:
Step 1: Extract a frame of video every 5 seconds. The Little Mermaid (1989) is 85 minutes long or 5100 seconds of footage.
Step 2: Train a classifier on images that either contain Ariel or not. Images will be randomly selected from the pool and added to the training and validation sets, where they will be classified as "Ariel" or "not Ariel". Once the classifier has been trained to estimate the probability that an image contains Ariel, it will be used to filter the rest of the images.
Step 3: Repeat this process for all films to generate a corpus of images for all princesses.
Step 4: Train a multi-class classifier on the new corpus of all princesses. The data will be randomly split into training and validation sets.
Step 5: Given an image of a princess, return the top-k probabilities and the name of the princess identified.
How does one extract frames from a video? I used the VLC command line interface:
vlc --rate 5.0 "video.mp4" --video-filter=scene --vout=dummy --scene-ratio=120 --scene-path="frame_dir" vlc://quit
This opens the video at path "video.mp4" and plays it at 5x speed, writing the extracted frames to the folder "frame_dir." The scene ratio is set to 120, which saves one frame every 120 frames, or one every 5 seconds assuming 24 frames per second playback. If you just want to watch the movie, leave out the rate and vout options. Warning: if you set the rate too high, you may get tearing of frames.
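Since the same extraction has to be done for a dozen films, it can be handy to drive VLC from Python rather than typing the command each time. Below is a minimal sketch of that idea; the video file paths are hypothetical placeholders, and the output folders follow the data/ layout used later in this notebook.
# Sketch: batch frame extraction by shelling out to the VLC command above.
# The video paths in this dictionary are placeholders; point them at your own files.
import os
import subprocess

videos = {
    "the_little_mermaid": "videos/the_little_mermaid.mp4",
    "aladdin": "videos/aladdin.mp4",
}

for film, video_path in videos.items():
    frame_dir = os.path.join("data", film)
    os.makedirs(frame_dir, exist_ok=True)
    subprocess.run(
        ["vlc", "--rate", "5.0", video_path,
         "--video-filter=scene", "--vout=dummy",
         "--scene-ratio=120", f"--scene-path={frame_dir}",
         "vlc://quit"],
        check=True,
    )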
Splitting the data into training and validation sets¶
The first task was to write a script that would randomly place the extracted frames into the training, validation, and test folders.
The idea here is to train the classifier to remove images that are not of Ariel so we don't have to hand-pick the whole corpus. Only about a quarter of the original data will be used for training and a quarter for validation; these will be hand-sorted into two folders, which establish our ground truth of 'Ariel' or 'Not Ariel.' Once the classifier can confidently sort images, an acceptable confidence threshold will be chosen, the classifier will be run on the remaining test set to split it into Ariel and Not Ariel, and the results will be analyzed. For this experiment, false positives (classifying a 'Not Ariel' image as 'Ariel') are less desirable than false negatives (classifying an 'Ariel' image as 'Not Ariel'). If a few images of Ariel do not end up in our finished corpus, that is better than training the final model on spurious features.
For the purposes of classifying these images, I included every image that Ariel was in, even if she wasn't full frame or facing the 'camera.' If this failed to produce a successful model, I planned to prune the dataset further to eliminate frames where Ariel was less visible.
The split yielded 66 frames with Ariel and 155 without in the training set, and 61 frames with Ariel and 135 without in the validation set.
%matplotlib inline
l = plt.imread("data/the_little_mermaid/valid/1/scene60481.png")
r = plt.imread("data/the_little_mermaid/train/0/scene81721.png")
fig, ax = plt.subplots(1,2,figsize=(20,10))
ax[0].imshow(l)
ax[0].axis("off")
ax[1].imshow(r)
ax[1].axis("off")
Worth noting is that there are two cases of transformation that could confound the classifier:
Ariel transforms from mermaid to human, which alters her appearance.
Ursula transforms into Ariel's 'evil twin' (albeit with darker hair). This could also throw off the model's ability to classify images correctly.
film = "the_little_mermaid"
film_dir = os.path.join("data", film)
def train_test_split(data_dir):
train_dir = data_dir + '/train'
valid_dir = data_dir + '/valid'
test_dir = data_dir + '/test'
files = np.asarray(os.listdir(data_dir))
if not os.path.isdir(train_dir):
os.mkdir(train_dir)
if not os.path.isdir(valid_dir):
os.mkdir(valid_dir)
if not os.path.isdir(test_dir):
os.mkdir(test_dir)
for file in files:
if file.endswith(".png"):
d_ten = random.randint(1, 4)
if d_ten == 1:
# put it in the training folder
os.rename(os.path.join(data_dir, file), os.path.join(train_dir, file))
if d_ten == 2:
# put it in the validation folder
os.rename(os.path.join(data_dir, file), os.path.join(valid_dir, file))
if d_ten > 2:
os.rename(os.path.join(data_dir, file), os.path.join(test_dir, file))
train_test_split(film_dir)
Loading the data¶
The training data will be randomly augmented (rotation, resized crop, horizontal flip) to improve the model's ability to generalize.
def load_data(film_dir):
train_dir = film_dir + '/train'
valid_dir = film_dir +'/valid'
xforms = {
'train': transforms.Compose([
transforms.RandomRotation(30),
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406],
[0.229, 0.224, 0.225])
]),
'validate': transforms.Compose([
transforms.Resize(255),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406],
[0.229, 0.224, 0.225])
])
}
# Data loading
image_datasets = {
'train': ImageFolder(train_dir, transform=xforms['train']),
'validate': ImageFolder(valid_dir, transform=xforms['validate']),
}
# Data batching
dataloaders = {
'train': DataLoader(image_datasets['train'], batch_size=64, shuffle=True),
'validate': DataLoader(image_datasets['validate'], batch_size=64, shuffle=True),
}
return dataloaders, image_datasets
dataloaders, image_datasets = load_data(film_dir)
Label mapping¶
Data will be sorted into two folders, numbered 0 and 1. Folder 0 will contain 'Not Ariel' and Folder 1 will contain 'Ariel.' This information will be stored in a json file called 'class_dict.json'.
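The creation of class_dict.json is not shown here; for the binary case it only needs to map each class folder name to a display name. A minimal example follows (the label text is my choice; the keys must match the folder names '0' and '1').
# Example of a minimal class_dict.json for the binary case.
# The keys match the class folder names; the label strings are illustrative.
example_class_dict = {"0": "Not Ariel", "1": "Ariel"}
with open(os.path.join(film_dir, 'class_dict.json'), 'w') as f:
    json.dump(example_class_dict, f)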
def map_labels(data_dir):
with open(os.path.join(film_dir, 'class_dict.json'), 'r') as f:
class_dict = json.load(f)
return class_dict
class_dict = map_labels(film_dir)
num_of_classes = len(next(os.walk(film_dir + '/train'))[1])
num_of_classes
Transfer learning¶
In this experiment, I used a pretrained model called VGG19, a convolutional neural network trained on ImageNet, a very large collection of photographs. I kept the model's convolutional features frozen but replaced its classifier and trained it to recognize images of Ariel. Using a pretrained model has the advantage of recognizing useful features from a much smaller dataset, based on the idea that images share many low-level features that can be 'transferred' to a new task. One potential downside of this approach is that VGG19 was trained on photographs, not hand-drawn animation.
Classifier structure¶
The classifier has 25088 in_features, 4096 hidden units, and 1000 out_features, because it was built to distinguish the 1,000 ImageNet classes. For our purposes, we want the classifier to have 2 out_features: one for Ariel and one for not Ariel. Eventually, we will have a number of out_features equal to the number of characters we want to classify.
A few other things to note: the classifier has a rectified linear unit (ReLU) layer and a dropout layer between its linear layers. In our custom classifier, we will also add a LogSoftmax layer to this basic structure, because we will be using the negative log-likelihood loss (NLLLoss) as our criterion. A classifier's output and its criterion must be considered as a package: the output layer must match the criterion's expected input or the loss function will not work correctly. (LogSoftmax followed by NLLLoss is equivalent to passing raw scores to CrossEntropyLoss.)
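Printing the pretrained model's classifier shows the stock structure described above before we replace it (the exact repr may vary slightly between torchvision versions).
# Inspect the stock VGG19 classifier that we are about to replace.
print(models.vgg19(pretrained=True).classifier)
# Sequential(
#   (0): Linear(in_features=25088, out_features=4096, bias=True)
#   (1): ReLU(inplace=True)
#   (2): Dropout(p=0.5, inplace=False)
#   (3): Linear(in_features=4096, out_features=4096, bias=True)
#   (4): ReLU(inplace=True)
#   (5): Dropout(p=0.5, inplace=False)
#   (6): Linear(in_features=4096, out_features=1000, bias=True)
# )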
model = models.vgg19(pretrained=True)
# this freezes the model's features so we are only training the classifier
for param in model.parameters():
param.requires_grad = False
input_features = 25088
hidden_units = 4096
out_features = 2
model.classifier = nn.Sequential(
nn.Linear(input_features, hidden_units),
nn.ReLU(inplace=True),
nn.Dropout(p=0.5, inplace=False),
nn.Linear(hidden_units, hidden_units),
nn.ReLU(inplace=True),
nn.Dropout(p=0.5, inplace=False),
nn.Linear(hidden_units, out_features),
nn.LogSoftmax(dim=1)
)
# learning rate
alpha = 0.0001
optimizer = optim.Adam(model.classifier.parameters(), lr=alpha)
criterion = NLLLoss()
train_losses = []
validate_losses = []
def train(epochs):
model.train()
# activate the gpu
if use_gpu:
model.cuda()
steps = 0
t0 = time.time()
for e in range (epochs):
running_loss = 0
for X, y in dataloaders['train']:
steps += 1
# move features to gpu
if use_gpu:
X, y = X.to('cuda'), y.to('cuda')
optimizer.zero_grad()
output = model.forward(X)
loss = criterion(output, y)
loss.backward()
optimizer.step()
# a timer
running_loss += loss.item()
t2 = time.time()
percent_complete = steps/ (len(dataloaders['train']) * epochs)
time_elapsed = t2-t0
estimated_time_left = time_elapsed/percent_complete - time_elapsed
print("{}% complete. {}m{}s elapsed. {}m{}s left.".format(round(percent_complete * 100),
*divmod(int(time_elapsed), 60),
*divmod(int(estimated_time_left), 60)), end='\r')
        # for/else: this block runs after each epoch's training batches are exhausted
        else:
# Validation Loss and Accuracy
validate_loss = 0
global accuracy
accuracy = 0
with torch.no_grad():
model.eval()
for X, y in dataloaders['validate']:
if use_gpu:
X, y = X.to('cuda'), y.to('cuda')
log_probabilities = model.forward(X)
validate_loss += criterion(log_probabilities, y).item()
# we can convert the log probabilities to probabilities using torch.exp
# torch.exp is e^x
probabilities = torch.exp(log_probabilities)
                    # returns true if the label is the highest-probability class predicted by the model
equality = (y.data == probabilities.max(dim=1)[1])
                    # convert the boolean tensor to floats and average to get the batch accuracy
accuracy += equality.type(torch.FloatTensor).mean()
model.train()
train_losses.append(running_loss/len(dataloaders['train']))
validate_losses.append(validate_loss/len(dataloaders['validate']))
print("Epoch: {}/{}..".format(e+1, epochs),
"Training Loss: {:.3f}.. ".format(running_loss/len(dataloaders['train'])),
"Validate Loss: {:.3f}.. ".format(validate_loss/len(dataloaders['validate'])),
"Validate Accuracy: {:.3f}".format(accuracy/len(dataloaders['validate'])))
train(10)
%matplotlib inline
fig, ax = plt.subplots(facecolor=('1'))
ax.set_facecolor('1')
ax.set_title('Loss vs time', color='0')
ax.set_xlabel('epochs', color='1')
ax.set_ylabel('average loss', color='1')
ax.plot(train_losses, '#009d8c', label="training losses")
ax.plot(validate_losses, '#d94f45', label="validate losses")
ax.legend(frameon=True)
ax.tick_params(labelcolor='0')
def save(path):
model.class_to_idx = image_datasets['train'].class_to_idx
checkpoint = {'model_state_dict': model.state_dict(),
'optimizer_state_dict': optimizer.state_dict(),
'classifier': model.classifier,
'class_to_idx': model.class_to_idx,
'alpha': alpha
}
torch.save(checkpoint, path)
# MAKE SURE THIS IS COMMENTED OUT IF YOU AREN'T TRAINING ARIEL OR YOU WILL WIPE HER CHECKPOINT
save(os.path.join(film_dir,'checkpoint.pth'))
# Image Processing
def process_image(image):
image = Image.open(image)
# Process a PIL image for use in a PyTorch model
xform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
return xform(image)
def imshow(image, ax=None, title=None):
if ax is None:
fig, ax = plt.subplots()
# PyTorch tensors assume the color channel is the first dimension
    # but matplotlib assumes it is the third dimension
image = image.numpy().transpose((1, 2, 0))
# Undo preprocessing
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
image = std * image + mean
# Image needs to be clipped between 0 and 1 or it looks like noise when displayed
image = np.clip(image, 0, 1)
ax.imshow(image)
ax.axis('off')
return ax
imshow(process_image("data/the_little_mermaid/valid/1/scene47521.png"))
def load_from_checkpoint(checkpoint_path):
if use_gpu:
device = torch.device("cuda")
checkpoint = torch.load(checkpoint_path, map_location="cuda:0")
else:
device = torch.device("cpu")
        checkpoint = torch.load(checkpoint_path, map_location=device)
# reconstruct model and custom classifier from checkpoint
model = models.vgg19(pretrained=True)
# freeze vgg features
# https://stackoverflow.com/questions/51748138/pytorch-how-to-set-requires-grad-false
for param in model.parameters():
param.requires_grad = False
model.classifier = checkpoint['classifier']
model.load_state_dict(checkpoint['model_state_dict'])
model.to(device)
# reconstruct optimizer from checkpoint
optimizer = optim.Adam(model.classifier.parameters(), lr=checkpoint['alpha'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
# load {class : index} dictionary
model.class_to_idx = checkpoint['class_to_idx']
# invert dictionary for {index : class}
# https://therenegadecoder.com/code/how-to-invert-a-dictionary-in-python/
return model, optimizer
def predict(image_path, model, topk=1):
idx_to_class = dict(zip(model.class_to_idx.values(), model.class_to_idx.keys()))
image = process_image(image_path)
image = image.unsqueeze_(0)
# put model into eval mode for predictions
model.eval()
# Implement the code to predict the class from an image file
with torch.no_grad():
if use_gpu:
log_probabilities = model.forward(image.cuda())
else:
log_probabilities = model.forward(image)
probabilities = torch.exp(log_probabilities)
topks = probabilities.topk(topk)
idxs = topks.indices.tolist()[0]
predicted_classes = [idx_to_class[i] for i in idxs]
return topks.values, predicted_classes
cp = os.path.join(film_dir, "checkpoint.pth")
model, optimizer = load_from_checkpoint(cp)
P, classes = predict("data/the_little_mermaid/valid/1/scene47521.png", model, topk=1)
P
classes
def test_sort(film_dir, cutoff=0.5):
checkpoint_path = os.path.join(film_dir, "checkpoint.pth")
test_dir = film_dir + '/test'
if not os.path.isdir(test_dir):
os.mkdir(test_dir)
files = np.asarray(os.listdir(test_dir))
if not os.path.isdir(test_dir + '/1'):
os.mkdir(os.path.join(test_dir + '/1'))
if not os.path.isdir(test_dir + '/0'):
os.mkdir(test_dir + '/0')
model, optimizer = load_from_checkpoint(checkpoint_path)
for file in files:
if file.endswith(".png"):
P, classes = predict(os.path.join(test_dir, file), model, topk=1)
if classes[0] == '1' and P[0] > cutoff:
os.rename(os.path.join(test_dir, file), os.path.join(test_dir + '/1', file))
else:
os.rename(os.path.join(test_dir, file), os.path.join(test_dir + '/0', file))
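Before running test_sort on a whole film, it is worth checking how the cutoff trades false positives against false negatives on the hand-labelled validation folders, since false positives are the costlier error here. The sketch below is not part of the original pipeline, just a sanity check; it assumes the folder layout used above and reuses predict.
# Sketch: count false positives and false negatives on the hand-labelled
# validation folders for a given confidence cutoff.
def count_validation_errors(film_dir, model, cutoff=0.5):
    false_pos, false_neg = 0, 0
    for true_label in ('0', '1'):
        label_dir = os.path.join(film_dir, 'valid', true_label)
        for file in os.listdir(label_dir):
            if not file.endswith('.png'):
                continue
            P, classes = predict(os.path.join(label_dir, file), model, topk=1)
            predicted = '1' if (classes[0] == '1' and P[0] > cutoff) else '0'
            if predicted == '1' and true_label == '0':
                false_pos += 1   # a 'Not Ariel' frame kept as Ariel
            elif predicted == '0' and true_label == '1':
                false_neg += 1   # an Ariel frame discarded
    return false_pos, false_neg

# Raising the cutoff should lower false_pos at the cost of more false_neg,
# which matches the preference stated earlier.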
Data Collection¶
Now the data processing steps are repeated for each film to create the training, validation, and test sets. The training and validation sets are hand-sorted into positive and negative folders. Finally, a classifier is trained for each film and used to sort that film's test set.
films = ["snow_white", "cinderella", "sleeping_beauty", "the_little_mermaid", "beauty_and_the_beast", "aladdin", "pocahontas", "mulan", "the_princess_and_the_frog", "tangled", "brave", "frozen"]
def split_all_film_data():
for film in films:
path = os.path.join('data', film)
train_test_split(path)
#split_all_film_data()
Data¶
d = {'train_pos': [31, 40, 28, 68, 49, 36, 61, 66, 70, 96, 76, 46],
'train_neg': [164, 176, 147, 155, 156, 196, 143, 169, 183, 201, 171, 219],
'valid_pos': [43, 40, 37, 61, 49, 36, 61, 66, 70, 97, 87, 60],
'valid_neg': [156, 186, 187, 135, 184, 230, 151, 145, 210, 168, 177, 261],
'color1': ["#0058a2", "#f8aadc", "#f3ae47", "#d94f45", "#78492a", "#f9d938", "#00bce1", "#28346e", "#8fd18e", "#be84c6", "#f75620", "#fff6db"],
'color2': ["#fff28c", "#7eafd0", "#bf4574", "#009d8c", "#ffda4d", "#45b5a9", "#d88745", "#89b74b", "#dceaa7", "#ecec96", "#025467", "#a9e4f9"]
}
df = pd.DataFrame(data=d, index=films)
df
# Sanity Check
def sanity_check(path, top_k):
P, classes = predict(path, model, topk=top_k)
image = process_image(path)
if use_gpu:
probs = P.cpu().numpy()[0]
else:
probs = P.numpy()[0]
names = [class_dict[i] for i in classes]
colors = [df['color2'][int(i)] for i in classes]
fig, axs = plt.subplots(2, figsize=(4, 8), facecolor=('1'))
axs[0].set_title(names[0], color='0')
axs[1].set_facecolor('1')
imshow(process_image(path),axs[0])
y_pos = np.arange(len(names))
plt.yticks(y_pos, names)
bar_graph = axs[1].barh(y_pos, probs, color=colors)
plt.gca().invert_yaxis()
axs[1].tick_params(labelcolor='0')
sanity_check("data/the_little_mermaid/valid/1/scene47521.png", 2)
def sample_dist_plot(pos, neg, title):
ind = np.arange(len(pos)) # the x locations for the groups
width = 0.6 # the width of the bars: can also be len(x) sequence
p1 = plt.barh(ind, pos, width, color=df['color1'])
p2 = plt.barh(ind, neg, width,
left=pos, color=df['color2'])
plt.xlabel('Positive and negative samples by film')
plt.title(title)
plt.yticks(ind, films)
plt.xticks(np.arange(0, 350, 25))
plt.show()
sample_dist_plot(df['train_pos'], df['train_neg'], "Training set sample distribution")
sample_dist_plot(df['valid_pos'], df['valid_neg'], "Validation set sample distribution")
for film in films:
model.classifier = nn.Sequential(
nn.Linear(input_features, hidden_units),
nn.ReLU(inplace=True),
nn.Dropout(p=0.5, inplace=False),
nn.Linear(hidden_units, hidden_units),
nn.ReLU(inplace=True),
nn.Dropout(p=0.5, inplace=False),
nn.Linear(hidden_units, out_features),
nn.LogSoftmax(dim=1)
)
optimizer = optim.Adam(model.classifier.parameters(), lr=alpha)
train_losses = []
validate_losses = []
# set path to film
film_dir = os.path.join('data', film)
# load the data
dataloaders, image_datasets = load_data(film_dir)
# map labels
class_dict = map_labels(film_dir)
# train the model
train(10)
# save the checkpoint
save(os.path.join(film_dir,'checkpoint.pth'))
#set checkpoint_path
test_sort(film_dir, cutoff=0.6)
Training the Multi-Class Classifier¶
The training set comprises the images chosen by the previous binary classification models. False positives are removed, and the validation sets from the original experiments are used to test the results.
out_features = 12
model.classifier = nn.Sequential(
nn.Linear(input_features, hidden_units),
nn.ReLU(inplace=True),
nn.Dropout(p=0.5, inplace=False),
nn.Linear(hidden_units, hidden_units),
nn.ReLU(inplace=True),
nn.Dropout(p=0.5, inplace=False),
nn.Linear(hidden_units, out_features),
nn.LogSoftmax(dim=1)
)
optimizer = optim.Adam(model.classifier.parameters(), lr=alpha)
train_losses = []
validate_losses = []
# set path to film
film_dir = os.path.join('data', 'multi_class')
# load the data
dataloaders, image_datasets = load_data(film_dir)
# map labels
class_dict = map_labels(film_dir)
# train the model
train(20)
# save the checkpoint
save(os.path.join(film_dir,'checkpoint.pth'))
sanity_check("data/multi_class/valid/0/scene14641.png", 5)
sanity_check("data/multi_class/valid/11/scene75121.png", 5)
sanity_check("data/multi_class/valid/2/scene41641.png", 5)
sanity_check("data/multi_class/valid/5/scene78121.png", 5)
sanity_check("data/multi_class/valid/6/scene83401.png", 5)
sanity_check("data/multi_class/valid/8/scene81481.png", 5)
sanity_check("data/multi_class/valid/8/scene38041.png", 5)