
How to import data into torchvision.datasets

I would like to import data stored in .csv format with torchvision.datasets so that I can use torch.utils.data.DataLoader to process it. The data does not come from torchvision; it is stored on my PC. I could not find a solution on Google. I would be very grateful for any advice.

If you already have the CSV file, you can read it very easily with pandas.

import pandas as pd
my_dataframe = pd.read_csv("path/to/file.csv")
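`pd.read_csv` accepts a file path or any file-like object, so it is easy to check what pandas actually parsed before building a Dataset. A quick sketch, using a small in-memory CSV as a stand-in for your file (the column names `feature_1`, `feature_2` and `label` are hypothetical):

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "path/to/file.csv"
csv_text = """feature_1,feature_2,label
0.5,1.2,0
0.1,3.4,1
2.2,0.7,0
"""

my_dataframe = pd.read_csv(io.StringIO(csv_text))

print(my_dataframe.shape)          # (rows, columns)
print(list(my_dataframe.columns))  # the header row becomes the column names
print(my_dataframe.dtypes)         # numeric columns are parsed as int64/float64
```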

With this you can now access the data inside your CSV file. If you want to use PyTorch's torch.utils.data.DataLoader, you will also need a torch.utils.data.Dataset, i.e. a class that implements __len__ and __getitem__.
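For a purely numeric CSV, a minimal Dataset can convert the whole DataFrame to tensors once and index into them. This is a sketch assuming a hypothetical `label` column, with every other column treated as a feature; `CSVDataset` and `label_column` are illustrative names, not part of PyTorch:

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class CSVDataset(Dataset):
    """Map-style dataset over a numeric DataFrame (hypothetical layout)."""

    def __init__(self, dataframe: pd.DataFrame, label_column: str = "label"):
        # Split the DataFrame once up front so __getitem__ stays cheap
        features = dataframe.drop(columns=[label_column])
        self.features = torch.tensor(features.to_numpy(), dtype=torch.float32)
        self.labels = torch.tensor(dataframe[label_column].to_numpy(), dtype=torch.long)

    def __len__(self):
        # Number of rows in the CSV
        return len(self.labels)

    def __getitem__(self, index):
        # One row: a 1-D feature tensor and a scalar label tensor
        return self.features[index], self.labels[index]


df = pd.DataFrame({"x1": [0.5, 0.1, 2.2], "x2": [1.2, 3.4, 0.7], "label": [0, 1, 0]})
dataset = CSVDataset(df)
x, y = dataset[1]
print(len(dataset), x, y)
```

Converting everything in `__init__` suits small CSVs that fit in memory; for image paths, as below, the loading has to happen lazily in `__getitem__` instead.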

Depending on the type of data you are using, the Dataset can look very different. If your CSV contains image paths and labels, have a look at this Dataset I once used with torchvision.models.resnet50():

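import cv2
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class createDataset(Dataset):
    def __init__(self, dataframe):
        # One row per sample: an image path plus (optionally) a label
        self.dataframe = dataframe
        # ToTensor turns an HxWxC PIL image into a CxHxW float tensor in [0, 1]
        self.transform = transforms.Compose([transforms.ToTensor()])

    def __len__(self):
        # Number of samples = number of rows in the DataFrame
        return self.dataframe.shape[0]

    def __getitem__(self, index):
        image_path = self.dataframe.iloc[index]["Name_of_imagepath_column"]
        image = cv2.imread(image_path)
        if image is None:
            # cv2.imread returns None instead of raising on a bad path
            raise FileNotFoundError(f"Could not read image: {image_path}")
        # OpenCV loads images as BGR; convert to RGB before building the PIL image
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = Image.fromarray(image)
        image = self.transform(image)
        label = self.dataframe.iloc[index]["Name_of_label_column"]
        return {"image": image, "targets": torch.tensor(label, dtype=torch.long)}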

The label ("targets") is optional; I only needed it for my project.

Now you can pass your pandas dataframe to the Dataset class like so:

my_dataset = createDataset(dataframe = my_dataframe)

You can now pass this Dataset to a torch.utils.data.DataLoader to create your DataLoader:

from torch.utils.data import DataLoader

my_dataloader = DataLoader(dataset=my_dataset)
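With the default settings (`batch_size=1`, no shuffling), every batch still gains a leading batch dimension, and because createDataset returns a dictionary, the default collate function yields a dictionary of batched tensors with the same keys. A self-contained sketch using random tensors in place of real images (`ToyDictDataset` is a hypothetical stand-in for createDataset):

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDictDataset(Dataset):
    """Stand-in for createDataset: returns the same {"image", "targets"} dict."""

    def __init__(self, n_samples=4):
        self.images = torch.rand(n_samples, 3, 8, 8)  # fake 8x8 RGB images
        self.labels = torch.arange(n_samples) % 2      # fake 0/1 labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        return {"image": self.images[index], "targets": self.labels[index]}


my_dataloader = DataLoader(dataset=ToyDictDataset())

for batch in my_dataloader:
    # Keys are preserved; each value gains a leading batch dimension of size 1
    print(batch["image"].shape, batch["targets"].shape)

first_batch = next(iter(my_dataloader))
```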

For more DataLoader options, such as batch_size and shuffle, see the PyTorch DataLoader documentation.
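As an illustration, `batch_size`, `shuffle` and `drop_last` are all standard `DataLoader` arguments. The sketch below uses a `TensorDataset` of 10 dummy samples (not the image Dataset) to show how they change the number and size of batches:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 dummy samples with 3 features each, plus integer labels 0..9
features = torch.randn(10, 3)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# shuffle=True reorders the samples every epoch; batch_size groups them
loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch_sizes = [x.shape[0] for x, y in loader]
print(batch_sizes)  # [4, 4, 2]: the last batch holds the remainder

# drop_last=True discards the incomplete final batch
loader_drop = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)
print(len(loader), len(loader_drop))  # 3 2
```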


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM