
PyTorch's dataloader “too many open files” error when no files should be open

Here is minimal code that illustrates the issue:

This is the Dataset:

import numpy as np
from torch.utils.data import Dataset


class IceShipDataset(Dataset):
    BAND1 = 'band_1'
    BAND2 = 'band_2'
    IMAGE = 'image'

    @staticmethod
    def get_band_img(sample, band):
        pic_size = 75
        # Each band is stored as a flat list of floats; reshape it in place to 75x75.
        img = np.array(sample[band])
        img.resize(pic_size, pic_size)
        return img

    def __init__(self, data, transform=None):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        band1_img = IceShipDataset.get_band_img(sample, self.BAND1)
        band2_img = IceShipDataset.get_band_img(sample, self.BAND2)
        # Stack the two bands into a single 75x75x2 image.
        img = np.stack([band1_img, band2_img], 2)
        sample[self.IMAGE] = img
        if self.transform is not None:
            sample = self.transform(sample)
        return sample

And this is the code which fails:

import json
import torch.utils.data

PLAY_BATCH_SIZE = 4

# Load the data. There are 1604 examples.
with open('train.json', 'r') as f:
    data = f.read()
data = json.loads(data)

ds = IceShipDataset(data)
playloader = torch.utils.data.DataLoader(ds, batch_size=PLAY_BATCH_SIZE,
                                          shuffle=False, num_workers=4)
for i, data in enumerate(playloader):
    print(i)

It gives that weird “too many open files” error inside the for loop. My torch version is 0.3.0.post4.

If you want the json file, it is available at Kaggle ( https://www.kaggle.com/c/statoil-iceberg-classifier-challenge )

I should mention that the error has nothing to do with the state of my laptop, which is nowhere near its open-file limit:

yoni@yoni-Lenovo-Z710:~$ lsof | wc -l
89114
yoni@yoni-Lenovo-Z710:~$ cat /proc/sys/fs/file-max
791958

What am I doing wrong here?

I know how to fix the error, but I don't have a complete explanation for why it happens.

First, the solution: you need to make sure that the image data is stored as numpy.arrays. When you call json.loads, it loads the bands as Python lists of floats, which causes torch.utils.data.DataLoader to individually transform each float in the list into a torch.DoubleTensor.

Have a look at default_collate, the default collate_fn used by torch.utils.data.DataLoader: your __getitem__ returns a dict, which is a mapping, so default_collate gets called again on each element of the dict. The first couple of entries are ints, but then you get to the image data, which is a list, i.e. a collections.Sequence. This is where things get funky, as default_collate is then called on each element of the list, which is clearly not what you intended. I don't know what assumption torch makes about the contents of a list versus a numpy.array, but given the error it would appear that that assumption is being violated.
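To make the recursion concrete, here is a minimal sketch of the behaviour described above. The sample dicts and sizes are made up for illustration, and the import location of default_collate is the one used in torch 0.3 (newer versions also expose it as torch.utils.data.default_collate):

import numpy as np
from torch.utils.data.dataloader import default_collate

# Two fake samples whose band is a flat list of floats, as json.loads produces.
batch_of_lists = [{'band_1': [0.1] * 5625}, {'band_1': [0.2] * 5625}]
collated = default_collate(batch_of_lists)
# The band is a Sequence, so default_collate recurses element by element:
# the result is a list of 5625 tiny tensors, each of length 2 (the batch size).
print(type(collated['band_1']), len(collated['band_1']))

# The same samples with the band stored as a numpy array collate into one tensor.
batch_of_arrays = [{'band_1': np.asarray(s['band_1'])} for s in batch_of_lists]
collated = default_collate(batch_of_arrays)
print(collated['band_1'].shape)  # a single 2 x 5625 tensor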

The fix is pretty trivial: just make sure the two image bands are numpy.arrays, for instance in __init__

def __init__(self, data, transform=None):
    self.data = []
    for d in data:
        # Convert the bands from Python lists of floats to numpy arrays,
        # so default_collate stacks each band into a single tensor per batch.
        d[self.BAND1] = np.asarray(d[self.BAND1])
        d[self.BAND2] = np.asarray(d[self.BAND2])
        self.data.append(d)
    self.transform = transform

or right after you load the json, whatever; it doesn't really matter where you do it, as long as you do it.
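For instance, a sketch of doing the conversion right after loading the json, before constructing the dataset (equivalent in effect to the __init__ version above):

import json
import numpy as np

with open('train.json', 'r') as f:
    data = json.loads(f.read())

# Convert the two bands of every example from lists of floats to numpy arrays.
for d in data:
    d['band_1'] = np.asarray(d['band_1'])
    d['band_2'] = np.asarray(d['band_2'])

ds = IceShipDataset(data)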


Why does the above result in too many open files?

I don't know for sure, but as the comments pointed out, it is likely related to interprocess communication and the lock files on the two queues that data is taken from and added to.
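As an aside that is not needed once the data is converted to numpy arrays: a commonly suggested workaround when DataLoader workers run out of file descriptors is to switch torch's tensor sharing strategy from file descriptors to the file system, along these lines:

import torch.multiprocessing

# Workers then share tensors via files on disk instead of keeping one file
# descriptor open per shared-memory allocation.
torch.multiprocessing.set_sharing_strategy('file_system')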

Footnote: train.json was not available for download from Kaggle because the competition was still open (??), so I made a dummy json file with the same structure and tested the fix on that.
