You may have heard that deep learning, particularly for image recognition, requires huge amounts of data. Even a ‘small’ ResNet-18 model has millions of trainable parameters, and as everyone knows from statistics class, we need more training data than model parameters, right?* While it remains true that, all else equal, the more data a model sees in training the better it will perform, there are a few tricks we can use to get more mileage out of the data we do have. In deep learning, as in life, greater exposure to different varieties and experiences makes for a better connoisseur. However, acquiring a large quantity (experiences) of labeled data with sufficient diversity (varieties) is difficult, time-consuming, and often expensive. If only we could generate more data from, well, the data we already have, our lives would be much easier.
Data augmentation is the subtle altering of images (or, more generally, data) before they are fed to our model. The objective of this relatively simple procedure is to vary the data the model ingests so that the model generalizes better, ultimately requiring less data for a similar level of performance (another way of saying better performance with the same amount of data).
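To make the idea concrete, here is a rough, library-free sketch of what "subtly altering" an image can mean. It is only an illustration under simplifying assumptions, not how any particular library implements augmentation: the image is a toy grid of numbers, and the helper names (`hflip`, `random_crop`, `augment`) are made up for this example.

```python
import random

def hflip(img):
    """Mirror an image left-to-right (img is a list of pixel rows)."""
    return [row[::-1] for row in img]

def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    top = rng.randrange(len(img) - size + 1)
    left = rng.randrange(len(img[0]) - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

def augment(img, size, rng):
    """Return one randomly altered copy; the label stays the same."""
    out = random_crop(img, size, rng)
    if rng.random() < 0.5:  # flip roughly half of the time
        out = hflip(out)
    return out

# A toy 4x4 grayscale "image" standing in for a food photo
image = [[r * 4 + c for c in range(4)] for r in range(4)]

rng = random.Random(0)  # seeded so the run is repeatable
variants = [augment(image, 3, rng) for _ in range(5)]
```

Each entry in `variants` is a slightly different view of the same picture, so one labeled photo can stand in for several training examples. Real augmentation pipelines work on image tensors and offer many more transforms (rotation, lighting changes, warping), but the principle is the same.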
Let’s see some data augmentation in action in a Tex-Mex food classification example using the super fun fastai library.
from fastai.vision.all import *
Let’s imagine ourselves as the chief developer for Silicon Valley’s hottest new food startup, Next-Mex, which we pitch to investors as ‘kind of like Yelp, except only for Mexican food’. Our goal here is to classify photos uploaded to the app/site as a particular type of Mexican food. Eventually we hope to identify trends in Americans’ collective Tex-Mex palate and sell that data to Taco Bell for millions.
Well, the best thing to do here would be to get real images uploaded by the users of our site and classify them ‘by hand’ to create a training set for our Deep Learning model. Except we haven’t launched yet (sorry pg)…
Instead, we grab examples of several classic Tex-Mex foods from the Bing Image Search API. We have 150 examples each of chimichangas, burritos, tacos, fajitas, and quesadillas.
We can see 10 random examples of the data below.
path = Path('mexican_foods')
foods = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))
dls = foods.dataloaders(path)
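Because `get_y=parent_label` labels each image by the name of the folder it sits in, this setup assumes a layout along the lines of `mexican_foods/tacos/some_photo.jpg`. A quick sanity check of the per-class counts can catch a misplaced folder before training. The following is a minimal standard-library sketch; the helper name `count_images_per_class` and the list of file extensions are assumptions made for illustration and are not part of fastai.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to match the downloaded files
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def count_images_per_class(root):
    """Count image files under root, keyed by parent folder name."""
    counts = Counter()
    for f in Path(root).rglob("*"):
        if f.is_file() and f.suffix.lower() in IMAGE_EXTS:
            counts[f.parent.name] += 1
    return counts
```

Calling `count_images_per_class('mexican_foods')` should, if the download went as described, report roughly 150 images for each of the five foods.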