PyTorch - Purpose of image preprocessing in the transfer learning tutorial

Question

In the PyTorch transfer learning tutorial, the images in both the training and the test sets are pre-processed using the following code:

from torchvision import transforms

data_transforms = {
    # Transforms applied to the training set
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    # Transforms applied to the validation set
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

My question is: what is the intuition behind this choice of transforms? In particular, why use RandomResizedCrop(224) and RandomHorizontalFlip()? Wouldn't it be better to let the neural network train on the entire image (or at least to augment the dataset using these transformations)? I understand why it would be reasonable to feed the network only the portion of the image that contains the ant or bee, but I can't understand why it is reasonable to feed it a random crop.

I hope I managed to make my questions clear.

Thanks!

Answer 1

Regarding `RandomResizedCrop`

Why ...ResizedCrop? Resizing the crops to the same dimensions allows you to batch your input data. Since the training images in your toy dataset have different dimensions, this is a simple way to make them batchable and your training more efficient.
Why Random...? Generating different random crops per image every iteration (i.e. a random position and a random crop size and aspect ratio before resizing) is a cheap way to artificially augment your dataset: the network is fed different-looking inputs, extracted from the same original images, every iteration. This helps reduce over-fitting on small datasets and makes your network more robust overall.
You are right, however, that since some of your training images are up to 500px wide and the semantic targets (ant / bee) sometimes cover only a small portion of the image, some of these random crops may not contain an insect. But as long as the chance of this happening stays relatively low, it won't really impact your training. The advantage of feeding different training crops every iteration (instead of always the same non-augmented images) far outweighs the side effect of occasionally feeding "empty" crops. You could check this by replacing RandomResizedCrop(224) with a fixed resize such as Resize((224, 224)) in your code and comparing the final accuracies on the test set. (Resize(224) with a single integer only rescales the shorter side to 224, so images with different aspect ratios would still end up with different sizes.)
Furthermore, neural networks are smart cookies and sometimes learn to recognize images through features you wouldn't expect (i.e. they tend to learn recognition shortcuts if your dataset or losses are biased; cf. over-fitting). I wouldn't be surprised if this toy network performs so well despite sometimes being trained on "empty" crops simply because it learns, e.g., to distinguish typical "ant backgrounds" (ground, leaves, etc.) from "bee backgrounds" (flowers).
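To make the random-crop argument concrete, here is a minimal pure-Python sketch of how a random crop box could be drawn and how often it would miss a small object. It is a simplified illustration, not torchvision's actual implementation: the function names (sample_crop_box, overlaps, miss_fraction), the insect's bounding box and the image sizes are all made up for the example, and the scale and ratio ranges are assumed to mirror the documented defaults of RandomResizedCrop.

```python
import math
import random


def sample_crop_box(width, height, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Pick a random crop box (left, top, w, h) inside a width x height image.

    Simplified illustration: the area fraction is drawn uniformly from `scale`,
    the aspect ratio log-uniformly from `ratio`, retrying until the box fits.
    """
    area = width * height
    for _ in range(10):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = round(math.sqrt(target_area * aspect))
        h = round(math.sqrt(target_area / aspect))
        if 0 < w <= width and 0 < h <= height:
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            return left, top, w, h
    # Fallback (assumption of this sketch): use the whole image.
    return 0, 0, width, height


def overlaps(box, target):
    """True if two (left, top, w, h) boxes share at least one pixel."""
    l1, t1, w1, h1 = box
    l2, t2, w2, h2 = target
    return l1 < l2 + w2 and l2 < l1 + w1 and t1 < t2 + h2 and t2 < t1 + h1


def miss_fraction(width, height, target, trials=10_000, seed=0):
    """Estimate how often a random crop contains no part of `target`."""
    rng = random.Random(seed)
    misses = sum(
        not overlaps(sample_crop_box(width, height, rng=rng), target)
        for _ in range(trials)
    )
    return misses / trials


OUTPUT_SIZE = 224  # every crop is resized to this, whatever the source size

# Two hypothetical source images with different dimensions.
for width, height in [(500, 375), (320, 480)]:
    box = sample_crop_box(width, height, rng=random.Random(1))
    print(f"{width}x{height} image -> crop box {box} -> resized to {OUTPUT_SIZE}x{OUTPUT_SIZE}")

# Hypothetical insect occupying a 100x80 region of a 500x375 image.
print("fraction of crops missing the insect:", miss_fraction(500, 375, (200, 150, 100, 80)))
```

Whatever the source dimensions, every crop ends up at the same output size, which is what makes batching possible. The miss fraction gives a rough feel for how rare "empty" crops are for a given object size, under the assumptions above.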

Regarding `RandomHorizontalFlip`

Its purpose is also to artificially augment your dataset. For the network, an image and its flipped version are two different inputs, so you are effectively doubling the size of your training dataset for "free".
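As a rough illustration of the "doubling" idea, the sketch below mirrors a tiny image with probability 0.5 (assumed here to match the default of RandomHorizontalFlip). The image is a plain list of pixel rows and the helper names (hflip, random_hflip) are hypothetical; torchvision operates on PIL images or tensors instead.

```python
import random


def hflip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def random_hflip(image, p=0.5, rng=random):
    """Return the mirrored image with probability p, else the image unchanged."""
    return hflip(image) if rng.random() < p else image


# Hypothetical 2x3 "image"; each number stands for a pixel value.
image = [[1, 2, 3],
         [4, 5, 6]]

# Simulate 20 epochs: each epoch the network sees either the original or its mirror.
rng = random.Random(0)
seen = {tuple(map(tuple, random_hflip(image, rng=rng))) for _ in range(20)}
print(len(seen), "distinct versions seen over 20 epochs")
```

The dataset on disk is unchanged. The flip is drawn again every time an image is loaded, so over many epochs the network sees both orientations of each picture.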

There are plenty more operations one can use to augment training datasets (e.g. RandomAffine, ColorJitter, etc.). One has to be careful, however, to choose transformations that are meaningful for the target use case and that do not alter the target semantic information. For ant/bee classification, for example, RandomHorizontalFlip is fine, as you will probably get as many images of insects facing right as facing left; RandomVerticalFlip, however, doesn't make much sense, as you will almost certainly not get pictures of upside-down insects.

Answer 1 details: 2, accepted, 2018-06-21 10:07:16
