
PyTorch for object detection - image augmentation

I am using PyTorch for object detection, fine-tuning an existing model (transfer learning) as described in the following link - https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

While different transformations are used for image augmentation (horizontal flip in this tutorial), the tutorial doesn't mention anything about transforming the bounding boxes/annotations to keep them consistent with the transformed image. Am I missing something basic here?

In the training phase, the transforms are indeed applied to both the images and the targets while the data is loaded. In the PennFudanDataset class, we have these two lines:

if self.transforms is not None:  
    img, target = self.transforms(img, target)

where target is a dictionary containing:

target = {}
target["boxes"] = boxes
target["labels"] = labels
target["masks"] = masks
target["image_id"] = image_id
target["area"] = area
target["iscrowd"] = iscrowd

self.transforms in the PennFudanDataset class is set to the return value of get_transform(): a T.Compose wrapping [T.ToTensor(), T.RandomHorizontalFlip(0.5)] when train=True. It is passed in while instantiating the dataset with:

dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
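
For reference, the get_transform() helper from the tutorial looks roughly like this (the tutorial has you copy the detection-specific transforms.py from torchvision's references/detection folder and import it as T):

import transforms as T  # the detection transforms module copied per the tutorial, not torchvision.transforms

def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())  # converts the PIL image to a tensor; the target passes through
    if train:
        # flips the image AND its boxes/masks/keypoints together, half the time
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)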

These transforms come from T, a custom transforms module written for the object detection task (references/detection/transforms.py in the torchvision repository). Specifically, in the __call__ of RandomHorizontalFlip(), we process both the image and the target (e.g., boxes, masks, keypoints):

For the sake of completeness, I borrow the code from the GitHub repo:

import random

class RandomHorizontalFlip(object):
    def __init__(self, prob):
        self.prob = prob

    def __call__(self, image, target):
        if random.random() < self.prob:
            height, width = image.shape[-2:]
            image = image.flip(-1)  # flip the image tensor along its last (width) axis
            bbox = target["boxes"]
            # mirror x-coordinates: new x_min = width - old x_max, new x_max = width - old x_min
            bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
            target["boxes"] = bbox
            if "masks" in target:
                target["masks"] = target["masks"].flip(-1)  # flip each instance mask the same way
            if "keypoints" in target:
                keypoints = target["keypoints"]
                # helper defined in the same file: mirrors x and swaps left/right keypoint labels
                keypoints = _flip_coco_person_keypoints(keypoints, width)
                target["keypoints"] = keypoints
        return image, target
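
A quick sanity check of the box arithmetic, with made-up numbers: on a 100-pixel-wide image, a box spanning x in [10, 30] should span x in [70, 90] after the flip, with y untouched:

import torch

width = 100
bbox = torch.tensor([[10., 20., 30., 40.]])  # one box: x_min, y_min, x_max, y_max
bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
print(bbox)  # tensor([[70., 20., 90., 40.]]) -- x range mirrored about the image centre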

From this snippet, we can see how the flipping is performed on the boxes, masks, and keypoints in accordance with the image.
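
Note that T.Compose is likewise not torchvision.transforms.Compose: it threads the target through every transform alongside the image. A sketch of what it looks like in the same transforms.py file (assuming the version current at the time of the tutorial):

class Compose(object):
    """Chain transforms that take and return (image, target) pairs."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)  # every transform sees both image and target
        return image, target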
