
Compute Optical Flow corresponding to data in the torch.utils.data.DataLoader

I have built a CNN model for action recognition in videos in PyTorch. I'm loading the data for training using the torch dataloader module.

train_loader = torch.utils.data.DataLoader(
            training_data,
            batch_size=8,
            shuffle=True,
            num_workers=4,
            pin_memory=True)

I then pass the train_loader to the training function:

train_epoch(i, train_loader, action_detect_model, criterion, optimizer, opt,
                        train_logger, train_batch_logger)

Now I want to add an additional path that takes the corresponding optical flow of the video frames. To calculate the optical flow I'm using cv2.calcOpticalFlowFarneback. The problem is that I'm not sure how to get the images corresponding to the data in the train loader's tensors, since they will be shuffled. I don't want to pre-compute the optical flow because the storage requirement would be huge (each frame takes 600 kB).

You have to write your own Dataset class to compute the optical flow on the fly. The idea is that this class gets a list of filename tuples (curr image, next image) containing the current and next frame filenames of the video sequence, instead of a simple filename list. This lets you retrieve the correct image pairs even after the filename list has been shuffled. The following code gives you a very simple example implementation:

import random

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset


class FlowDataLoader(Dataset):
    def __init__(self, filename_tuples):
        # each entry is a (current frame, next frame) filename pair
        random.shuffle(filename_tuples)
        self.lines = filename_tuples

    def __len__(self):
        # required so that the DataLoader knows how many samples exist
        return len(self.lines)

    def __getitem__(self, index):
        img_filenames = self.lines[index]
        curr_img = cv2.cvtColor(cv2.imread(img_filenames[0]), cv2.COLOR_BGR2GRAY)
        next_img = cv2.cvtColor(cv2.imread(img_filenames[1]), cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(curr_img, next_img, ... [parameter])

        # code for loading the class label
        # label = ...
        #
        # this is a very simple data normalization
        curr_img = curr_img.astype(np.float32) / 255
        next_img = next_img.astype(np.float32) / 255
        # you can return the image and flow separately
        return curr_img, flow, label
        # or stacked as follows
        # return np.dstack((curr_img, flow)), label

# at this point you need a function that creates a list of training sample
# filename pairs that looks like this
training_filelist = [("img000.png", "img001.png"),
                     ("img001.png", "img002.png"),
                     ("img002.png", "img003.png")]

training_data = FlowDataLoader(training_filelist)
train_loader = torch.utils.data.DataLoader(
        training_data,
        batch_size=8,
        shuffle=True,
        num_workers=4,
        pin_memory=True)
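
The list of (current frame, next frame) filename pairs could be built with a small helper like the one below. This is only a minimal sketch: the names make_frame_pairs and pairs_for_video_dir are hypothetical, and it assumes that each directory holds the frames of a single video and that the frame filenames are zero-padded, so that sorting them alphabetically gives the temporal order.

```python
from pathlib import Path


def make_frame_pairs(frame_paths):
    """Pair every frame with its successor: [(f0, f1), (f1, f2), ...]."""
    # Assumption: lexicographic order equals temporal order (zero-padded names).
    ordered = sorted(str(p) for p in frame_paths)
    return list(zip(ordered[:-1], ordered[1:]))


def pairs_for_video_dir(video_dir, pattern="*.png"):
    """Hypothetical layout: one directory per video, one image file per frame."""
    return make_frame_pairs(Path(video_dir).glob(pattern))


example = make_frame_pairs(["img002.png", "img000.png", "img001.png"])
print(example)
# [('img000.png', 'img001.png'), ('img001.png', 'img002.png')]
```

Pairs should be built per video and then concatenated, so that the last frame of one video is never paired with the first frame of the next.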

This is only a simple example of the FlowDataLoader. Ideally it should be extended so that the curr_img output contains normalized RGB values and the optical flow is normalized and clipped as well.
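
One possible way to do that normalization and clipping is sketched below. The function names normalize_flow and normalize_rgb and the clipping bound max_disp=20.0 pixels are assumptions made for illustration, not values from the answer; the bound in particular should be tuned to the motion magnitudes in your videos. Both functions return channels-first arrays, which is the layout PyTorch convolution layers expect.

```python
import numpy as np


def normalize_flow(flow, max_disp=20.0):
    """Clip flow to [-max_disp, max_disp] pixels and rescale to [-1, 1].

    flow: float array of shape (H, W, 2), e.g. the Farneback output.
    max_disp: assumed clipping bound in pixels (tune for your data).
    Returns a float32 array of shape (2, H, W).
    """
    clipped = np.clip(flow, -max_disp, max_disp) / max_disp
    return np.ascontiguousarray(clipped.transpose(2, 0, 1), dtype=np.float32)


def normalize_rgb(img_bgr):
    """Convert an OpenCV BGR uint8 image (H, W, 3) to RGB float32 (3, H, W) in [0, 1]."""
    rgb = img_bgr[..., ::-1].astype(np.float32) / 255.0
    return np.ascontiguousarray(rgb.transpose(2, 0, 1))
```

Inside __getitem__ these would be applied to the color frame returned by cv2.imread and to the flow array before both are returned.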
