
PyTorch DataLoader not shuffling data or returning random data

I am trying to run a toy example with my data. My end goal is for each batch from the dataloader to contain different numbers for each sample, but I am getting the same values, despite calling np.random.randint and shuffling my dataloader. My PyTorch dataset is implemented below:

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class RandomDataset(Dataset):
    def __init__(self):
        self.array1 = np.random.randint(0, 100, 20)
        self.array2 = np.random.randint(0, 100, 20)
        self.array3 = np.random.randint(0, 100, 20)
        self.array4 = np.random.randint(0, 100, 20)

    def __len__(self):
        # all arrays are the same length
        return len(self.array1)

    def __getitem__(self, idx):
        first = self.array1[idx]
        sample1 = torch.Tensor(self.array1)
        sample2 = torch.Tensor(self.array2)
        sample3 = torch.Tensor(self.array3)
        sample4 = torch.Tensor(self.array4)
        return sample1, sample2, sample3, sample4

And I call the dataloader as

x = RandomDataset()
DL = DataLoader(x, batch_size=3, shuffle=True)

The values are all the same when I run:

iterator = iter(DL)
output = next(iterator)
output
>>>[tensor([[21., 80., 46., 58.,  2., 21., 10., 44., 65., 79., 87., 10., 45.,  3.,
           0., 11., 29., 76., 55., 25.],
         [21., 80., 46., 58.,  2., 21., 10., 44., 65., 79., 87., 10., 45.,  3.,
           0., 11., 29., 76., 55., 25.],
         [21., 80., 46., 58.,  2., 21., 10., 44., 65., 79., 87., 10., 45.,  3.,
           0., 11., 29., 76., 55., 25.]]),

I thought that each time I get a batch of data it would run my dataset and I'd get a new array of 20 numbers. What am I missing?

You are always returning self.array1 through self.array4 in full for every item of the dataset, so what else would you expect? Were you assuming those tensors would get resampled at every call? They are not, because they were initialized once in __init__. Given that you have first = self.array1[idx], I think you meant to index all four arrays in the __getitem__ function.
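
To see this concretely (a minimal sketch, using the original class from the question), any two indices return identical tensors:

ds = RandomDataset()            # the original class from the question
a, _, _, _ = ds[0]
a_again, _, _, _ = ds[7]
print(torch.equal(a, a_again))  # True -- idx is never used, so every batch row repeats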

Here is an example of what I believe you were trying to do:

import torch
from torch.utils import data

class RandomDataset(data.Dataset):
    def __init__(self):
        # draw the four arrays once, when the dataset is constructed
        self.array1 = torch.randint(0, 100, (20,))
        self.array2 = torch.randint(0, 100, (20,))
        self.array3 = torch.randint(0, 100, (20,))
        self.array4 = torch.randint(0, 100, (20,))

    def __len__(self):
        return len(self.array1)

    def __getitem__(self, idx):
        # index each array so every item is a different element
        return self.array1[idx], self.array2[idx], self.array3[idx], self.array4[idx]
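
As a quick sanity check (the variable names below are just for illustration), a shuffled batch from the fixed dataset now contains three different scalars per array instead of three copies of the full array:

ds = RandomDataset()
dl = data.DataLoader(ds, batch_size=3, shuffle=True)

batch = next(iter(dl))
print(batch[0].shape)   # torch.Size([3]) -- one element of array1 per sample
print(batch[0])         # e.g. tensor([41,  7, 93]), values differ between samples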

Since this use case is fairly simple, you can use TensorDataset instead:

class RandomDataset(data.TensorDataset):
    def __init__(self):
        # TensorDataset handles __len__ and __getitem__ for the tensors it wraps
        r = lambda: torch.randint(0, 100, (20,))
        super().__init__(r(), r(), r(), r())
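
A brief check of the TensorDataset variant (again, just an assumed usage sketch): it behaves the same as the hand-written class above:

ds = RandomDataset()
print(len(ds))             # 20
print(ds[0])               # a tuple of four 0-d tensors, one element from each array
dl = data.DataLoader(ds, batch_size=3, shuffle=True)
print(next(iter(dl))[0])   # three different values drawn from the first array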
