PyTorch: Loading a sample of images using DataLoader

I use the standard DataLoader from torch.utils.data. I create a dataset class and then build the DataLoader this way:

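import os
from torch.utils import data  # standard PyTorch data-loading utilities

# LandmarksDataset, train_transforms and args are defined elsewhere in the project.
train_dataset = LandmarksDataset(os.path.join(args.data, 'train'), train_transforms, split="train")
train_dataloader = data.DataLoader(train_dataset, batch_size=args.batch_size, num_workers=2,
                                   pin_memory=True, shuffle=True, drop_last=True)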

It works perfectly, but the dataset is quite big: 300k images. Reading the images through the DataLoader therefore takes a lot of time, and building such a big DataLoader during the debugging stage is really painful. I just want to test some of my hypotheses, and I want to do it fast! I don't need to load the whole dataset for this.

I'm trying to find a way to load just a small, fixed part of the dataset without building a DataLoader on the whole dataset. At the moment, my only idea is to create another folder, copy part of the images there, and run the pipeline on it. But I suppose PyTorch is clever enough to have some built-in method for loading just a part of the images from a big dataset. Can you give me advice on how to do this?

As far as I am aware, there's no mechanism that does this for you. Your problem is in the LandmarksDataset class, at the point where you read the paths of your training data folder. I assume you use os.listdir(train_data_folder), which builds the complete list of all 300k file names at once.

Instead, you could use os.scandir(train_data_folder), which is more efficient here: it returns an iterator of os.DirEntry objects, and each call to next() on it gives you the next entry in the training data folder (the entry's .path attribute is the path to the image). This way you can call next() as many times as you need and build a subset, without changing the structure of your training data folder.
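A minimal sketch of the os.scandir idea, assuming the dataset class collects image paths from a single flat folder and that images can be recognised by their file extension. The helper name first_n_image_paths and the extension list are hypothetical, not part of LandmarksDataset or any library. Instead of calling next() by hand, it uses itertools.islice to take the first n matching entries:

```python
import itertools
import os


def first_n_image_paths(folder, n, extensions=(".jpg", ".jpeg", ".png")):
    """Hypothetical helper: return up to n image paths from `folder`.

    os.scandir yields os.DirEntry objects lazily, and itertools.islice
    stops pulling entries once n matches have been collected, so the
    full list of file names is never built in memory.
    """
    with os.scandir(folder) as entries:
        # Keep only regular files whose extension looks like an image
        # (the extension list is an assumption; adjust to your data).
        images = (
            entry.path
            for entry in entries
            if entry.is_file() and entry.name.lower().endswith(extensions)
        )
        # Materialise the first n paths before the scandir iterator closes.
        return list(itertools.islice(images, n))
```

Inside a dataset class you could call something like first_n_image_paths(os.path.join(args.data, 'train'), 1000) in debug mode instead of os.listdir. Note that os.scandir returns entries in arbitrary, filesystem-dependent order, so the sample is not sorted. If the dataset already holds all paths, the related built-in torch.utils.data.Subset(train_dataset, indices) wraps it so that the DataLoader only reads the given indices.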
