How to load images with multiple JSON annotations in PyTorch
I would like to know how I can use the data loader in PyTorch for my custom file structure. I have gone through the PyTorch documentation, but all the examples there assume a separate folder per class.
My folder structure consists of 2 folders (called training and validation), each with 2 subfolders (called images and json_annotations). Each image in the "images" folder contains multiple objects (such as cars, cycles, people, etc.), and each image is annotated in a separate JSON file. The standard COCO annotation format is followed. My intention is to build a neural network that can do real-time classification from videos.
Edit 1: I have done the coding as suggested by Fábio Perez.
import os
import json

import cv2
from torch.utils import data

class lDataSet(data.Dataset):
    def __init__(self, path_to_imgs, path_to_json):
        self.path_to_imgs = path_to_imgs
        self.path_to_json = path_to_json
        self.img_ids = os.listdir(path_to_imgs)

    def __getitem__(self, idx):
        img_id = self.img_ids[idx]
        img_id = os.path.splitext(img_id)[0]
        img = cv2.imread(os.path.join(self.path_to_imgs, img_id + ".jpg"))
        load_json = json.load(open(os.path.join(self.path_to_json, img_id + ".json")))
        #n = len(load_json)
        #bboxes = load_json['annotation'][n]['segmentation']
        return img, load_json

    def __len__(self):
        # note: this must match the attribute set in __init__
        return len(self.img_ids)
When I try this
l_data = lDataSet(path_to_imgs = '/home/training/images', path_to_json = '/home/training/json_annotations')
I get l_data, where l_data[idx][0] is the image and l_data[idx][1] is the JSON. Now I'm confused: how will I use it with the finetuning example available in PyTorch? In that example, the dataset and dataloader are created as shown below.
https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4) for x in ['train', 'val']}
You should be able to implement your own dataset with data.Dataset. You just need to implement the __len__ and __getitem__ methods.
In your case, you can iterate through all images in the image folder (then you can store the image ids in a list in your Dataset). Then you use the index passed to __getitem__ to get the corresponding image id. With this image id, you can read the corresponding JSON file and return the target data that you need.
Something like this:
class YourDataLoader(data.Dataset):
    def __init__(self, path_to_imgs, path_to_json):
        self.path_to_imgs = path_to_imgs
        self.path_to_json = path_to_json
        self.image_ids = iterate_through_images(path_to_imgs)

    def __getitem__(self, idx):
        img_id = self.image_ids[idx]
        img = load_image(os.path.join(self.path_to_imgs, img_id))
        bboxes = load_bboxes(os.path.join(self.path_to_json, img_id))
        return img, bboxes

    def __len__(self):
        return len(self.image_ids)
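To connect this to the finetuning tutorial from the question: a dataset like the one above can be passed to torch.utils.data.DataLoader just like ImageFolder, but because each image can have a different number of annotated objects, the default batching (which stacks targets into one tensor) will fail. A custom collate_fn that keeps the samples as lists is a common workaround; here is a minimal sketch (the function name and list-based batching are my own choice, not part of the tutorial):

```python
def collate_fn(batch):
    """Group a list of (image, target) pairs into two parallel lists.

    Each image may have a different number of bounding boxes, so the
    targets cannot be stacked into a single tensor; returning plain
    lists sidesteps that.
    """
    images = [sample[0] for sample in batch]
    targets = [sample[1] for sample in batch]
    return images, targets

# Hypothetical usage with the dataset sketched above:
# from torch.utils.data import DataLoader
# loader = DataLoader(dataset, batch_size=4, shuffle=True,
#                     collate_fn=collate_fn)
```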
In iterate_through_images you get all the ids (e.g. filenames) of the images in a directory. In load_bboxes you read the JSON and get the information you need.
I have a JSON loader implementation here if you want a reference.