How to make a custom dataset for CNN using pytorch?

Question

I've started studying deep learning, but every tutorial I've seen uses pre-made datasets like CIFAR-10. I'm currently trying to write a program that can tell whether something is in an image or not.

For that, I've collected two types of images: with the object in it, and without.

Following the tutorials and courses I've taken, I managed to write this:

```python
# imports
import os

import pandas as pd
# ...


def getDataFrame():
    # File names look like "yes_16.jpg" or "no_3.jpg"; the prefix is the label.
    filenames = os.listdir("charlies")
    categories = []
    for filename in filenames:
        category = filename.split('_')[0]
        if category == 'yes':
            categories.append(1)
        else:
            categories.append(0)

    df = pd.DataFrame({
        'filename': filenames,
        'category': categories
    })
    return df
```
It returns what I want:
                          filename  category
0                  no_Resized1.jpg         0
1                         no_2.jpg         0
2                         no_3.jpg         0
3                       yes_16.jpg         1
4                       yes_31.jpg         1
5                       yes_33.jpg         1
6                       yes_34.jpg         1
7                       yes_35.jpg         1

In my CNN file, I try:
```python
if __name__ == "__main__":

    df = dc.getDataFrame()
    ds = customCharlieDataset(df)

    train_df, validate_df = train_test_split(ds, test_size=0.20, random_state=3)
    train_df = train_df.reset_index(drop=True)
    validate_df = validate_df.reset_index(drop=True)
```

But it raises an error, " ds = customCharlieDataset(df) TypeError: 'module' object is not callable", which I don't really understand. I think it has something to do with applying transforms.ToTensor(), which I call in my customDataset class. Here's the code in case it helps:

```python
class customDataset(Dataset):
    def __init__(self, dataframe):
        self.dataframe = dataframe

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, index):
        row = self.dataframe.iloc[index]
        transform = transforms.ToTensor()
        return (
            transform(Image.open(row["filename"])),
            row["category"]
        )
```

Can someone explain why it is not working, or how I should have done it?

Thank you for your help :)!

Answer 1

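First (not the error itself, but maybe you should edit that): customCharlieDataset is not defined in the code you posted. Do you mean customDataset? The message "'module' object is not callable" suggests that customCharlieDataset is the name of an imported module (a file) rather than the class inside it. Second, the Dataset in class customDataset(Dataset) is torch.utils.data.Dataset, right? Third, __getitem__ should preferably return two tensors, one for the input sample and one for the target. row["category"] doesn't look like a tensor to me, so you should fix that.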
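
To illustrate the first point: "'module' object is not callable" is what Python raises when a module is called as if it were a class or a function. Here is a minimal sketch that uses the standard json module as a stand-in. The assumption (not confirmed by the question) is that customCharlieDataset is a file imported with something like import customCharlieDataset, and that the class lives inside it.

```python
import json  # stands in for a file such as customCharlieDataset.py

try:
    # Calling the module itself, analogous to customCharlieDataset(df).
    json({"a": 1})
except TypeError as err:
    message = str(err)
    print(message)

# Calling a class that lives inside the module works,
# analogous to customCharlieDataset.customDataset(df).
encoder = json.JSONEncoder()
encoded = encoder.encode({"a": 1})
print(encoded)
```

If that assumption holds, either call the class through the module, or import the class directly with from customCharlieDataset import customDataset.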

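Also, I'm not sure that creating transform = transforms.ToTensor() inside __getitem__ is what you want. It is not the cause of this TypeError, but you could try just torch.tensor(np.array(Image.open(row["filename"]))) instead. Note that for an RGB image this gives a uint8 tensor in height x width x channel order, whereas ToTensor returns a float tensor in channel x height x width order scaled to [0, 1].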
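
Putting these points together, here is a rough sketch of how the dataset could look; treat it as one possible layout under stated assumptions, not the definitive fix. The names CharlieDataset, split_dataframe and image_dir are hypothetical. Two things differ from the question's code. First, os.listdir returns bare file names, so the folder has to be joined back on before opening the image. Second, the split is done on the DataFrame before it is wrapped in a Dataset, because reset_index only exists on DataFrames (train_test_split(df, ...) from scikit-learn would do the same job as the pandas-only helper here). The demo at the bottom generates throwaway images so that it runs without the real "charlies" folder.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class CharlieDataset(Dataset):
    """Hypothetical name; yields (image, label) tensor pairs from a DataFrame."""

    def __init__(self, dataframe, image_dir):
        self.dataframe = dataframe.reset_index(drop=True)
        self.image_dir = image_dir  # os.listdir() only gives bare file names

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, index):
        row = self.dataframe.iloc[index]
        path = os.path.join(self.image_dir, row["filename"])
        with Image.open(path) as img:
            pixels = np.array(img.convert("RGB"))  # H x W x C, uint8
        # Reorder to C x H x W and scale to [0, 1], the layout conv layers expect.
        image = torch.from_numpy(pixels).permute(2, 0, 1).float() / 255.0
        label = torch.tensor(int(row["category"]), dtype=torch.long)
        return image, label


def split_dataframe(df, test_fraction=0.2, seed=3):
    """Shuffle the rows, then split them into train and validation frames."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_val = int(len(shuffled) * test_fraction)
    return shuffled.iloc[n_val:], shuffled.iloc[:n_val]


# Demo on generated images (assumption: same "yes_"/"no_" naming as the question).
tmp_dir = tempfile.mkdtemp()
names = [f"yes_{i}.jpg" for i in range(5)] + [f"no_{i}.jpg" for i in range(5)]
for name in names:
    Image.new("RGB", (32, 32)).save(os.path.join(tmp_dir, name))

df = pd.DataFrame({
    "filename": names,
    "category": [1 if n.startswith("yes_") else 0 for n in names],
})
train_df, validate_df = split_dataframe(df)
train_loader = DataLoader(CharlieDataset(train_df, tmp_dir), batch_size=4, shuffle=True)
images, labels = next(iter(train_loader))
print(images.shape, labels.shape)
```

Note that real photos usually differ in size, so you would need to resize them to a common shape before the DataLoader can stack them into a batch.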
