
Data preprocessing for custom dataset in pytorch (transform.Normalize)

I am new to PyTorch and CNNs, and I am confused about data preprocessing. I am not sure how to fill in transforms.Normalize for my dataset: in essence, how do you calculate the mean and std dev for a custom dataset?

I am loading my data using ImageFolder. The images are of different sizes.

from torchvision import datasets, transforms

train_transforms = transforms.Compose([
    transforms.Resize(size=224),
    transforms.ToTensor(),
    transforms.Normalize((?), (?)),  # which mean and std go here?
])
train_dataset = datasets.ImageFolder(root='roota/', transform=train_transforms)

If you're planning to train your network from scratch, calculate the statistics of your own dataset beforehand: the per-channel mean and std dev over the training images. You can use the ImageFolder dataset, wrapped in a DataLoader, to loop through the images and accumulate them. For example, in pseudocode:

for inputs, labels in dataloader:
    # accumulate the per-channel sum and sum of squares of inputs
    # after the loop: derive mean and std dev, save them for transforms.Normalize
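
To make the loop above concrete, here is a minimal sketch of one way to do it. The function name channel_mean_std and the synthetic fake_loader are made up for illustration; the sketch assumes each batch is a float tensor shaped (batch, channels, height, width), which is what a DataLoader yields once ToTensor() has been applied and all images have the same size.

```python
import torch


def channel_mean_std(loader):
    """Return per-channel (mean, std) over every pixel yielded by `loader`.

    `loader` is any iterable of (inputs, labels) batches where `inputs`
    is a float tensor shaped (batch, channels, height, width).
    """
    n_pixels = 0
    channel_sum = None
    channel_sq_sum = None
    for inputs, _ in loader:
        inputs = inputs.double()  # accumulate in float64 to limit rounding error
        if channel_sum is None:
            channels = inputs.size(1)
            channel_sum = torch.zeros(channels, dtype=torch.float64)
            channel_sq_sum = torch.zeros(channels, dtype=torch.float64)
        # pixels per channel in this batch: batch * height * width
        n_pixels += inputs.size(0) * inputs.size(2) * inputs.size(3)
        channel_sum += inputs.sum(dim=(0, 2, 3))
        channel_sq_sum += (inputs ** 2).sum(dim=(0, 2, 3))
    mean = channel_sum / n_pixels
    # population variance: E[x^2] - (E[x])^2, clamped against tiny negatives
    var = (channel_sq_sum / n_pixels - mean ** 2).clamp(min=0)
    return mean.float(), var.sqrt().float()


# Synthetic stand-in for a DataLoader over the training set (hypothetical data)
torch.manual_seed(0)
fake_loader = [(torch.rand(4, 3, 8, 8), torch.zeros(4)) for _ in range(5)]
mean, std = channel_mean_std(fake_loader)
```

With real data you would pass a DataLoader built on the ImageFolder dataset, using a transform without Normalize, and then feed mean.tolist() and std.tolist() into transforms.Normalize. Note that Resize(size=224) with a single int only resizes the shorter side, so images of different aspect ratios still come out with different shapes and cannot be stacked into a batch; something like Resize((224, 224)) or Resize followed by CenterCrop(224) is probably needed first.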

Typically, CNNs are pretrained on larger datasets such as ImageNet, primarily to reduce training time. If you are using a pretrained network, normalize your images with the mean and std dev of the dataset it was pretrained on, since those are the input statistics its weights were trained with.


 