I have built a CNN model for action recognition in videos in PyTorch. I'm loading the data for training using the torch dataloader module.
train_loader = torch.utils.data.DataLoader(
    training_data,
    batch_size=8,
    shuffle=True,
    num_workers=4,
    pin_memory=True)
And then passing the train_loader for training the model:
train_epoch(i, train_loader, action_detect_model, criterion, optimizer, opt,
            train_logger, train_batch_logger)
Now I want to add an additional path which will take the corresponding optical flow of the video frames. To calculate the optical flow I'm using cv2.calcOpticalFlowFarneback. The problem is that I'm not sure how to get the images corresponding to the data in the batches produced by train_loader, as they will be shuffled. I don't want to pre-compute the optical flow because the storage requirement would be huge (each frame takes 600 kB).
You have to write your own dataset class that computes the optical flow on the fly. The idea is that this class receives a list of filename tuples (current image, next image) holding the filenames of the current and the next frame of the video sequence, instead of a plain list of filenames. Because each sample carries both filenames, you still get the correct image pair after the list is shuffled. The following code is a very simple example implementation:
import random

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset


class FlowDataLoader(Dataset):
    def __init__(self, filename_tuples):
        # optional: DataLoader(shuffle=True) already reshuffles the samples every epoch
        random.shuffle(filename_tuples)
        self.lines = filename_tuples

    def __len__(self):
        # required so that the DataLoader knows how many samples exist
        return len(self.lines)

    def __getitem__(self, index):
        img_filenames = self.lines[index]
        curr_img = cv2.cvtColor(cv2.imread(img_filenames[0]), cv2.COLOR_BGR2GRAY)
        next_img = cv2.cvtColor(cv2.imread(img_filenames[1]), cv2.COLOR_BGR2GRAY)
        # Farneback expects 8-bit single-channel images, so compute the flow before normalizing
        flow = cv2.calcOpticalFlowFarneback(curr_img, next_img, ... [parameter])
        # code for loading the class label
        # label = ...
        #
        # this is a very simple data normalization
        curr_img = curr_img.astype(np.float32) / 255
        next_img = next_img.astype(np.float32) / 255
        # you can return the image and flow separately
        return curr_img, flow, label
        # or stacked along the channel axis as follows
        # return np.dstack((curr_img, flow)), label
# at this point you need a function that creates a list of training sample filenames
# that looks like this
training_filelist = [("img000.png", "img001.png"),
                     ("img001.png", "img002.png"),
                     ("img002.png", "img003.png")]
training_data = FlowDataLoader(training_filelist)
train_loader = torch.utils.data.DataLoader(
    training_data,
    batch_size=8,
    shuffle=True,
    num_workers=4,
    pin_memory=True)
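The function that creates the list of filename tuples is left open above. A minimal sketch of one way to build it follows. The names pair_consecutive_frames and build_training_filelist are hypothetical, and the sketch assumes that the frame filenames are zero-padded so that sorting them alphabetically gives the temporal order, and that the frames of each video are kept in a separate list so that no pair crosses a video boundary.

```python
def pair_consecutive_frames(frame_filenames):
    """Return (current, next) filename tuples for one video.

    Assumption: sorting the names alphabetically yields temporal order,
    which holds for zero-padded names such as img000.png, img001.png.
    """
    frames = sorted(frame_filenames)
    # Pair every frame with its successor; the last frame has no successor.
    return list(zip(frames[:-1], frames[1:]))


def build_training_filelist(frames_by_video):
    """Concatenate the frame pairs of several videos.

    frames_by_video is assumed to be an iterable of filename lists,
    one list per video, so pairs are only formed within a video.
    """
    filelist = []
    for frame_filenames in frames_by_video:
        filelist.extend(pair_consecutive_frames(frame_filenames))
    return filelist
```

The result has the same shape as training_filelist and can be passed to FlowDataLoader directly. A video with fewer than two frames contributes no pairs.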
This is only a simple example of the FlowDataLoader. Ideally it should be extended so that the curr_img output contains normalized RGB values and the optical flow is normalized and clipped as well.
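As a rough sketch of that extension, the two helpers below scale an image loaded by cv2.imread to RGB values in [0, 1] and clip and scale a flow field to [-1, 1]. The function names and the bound max_magnitude=20.0 (in pixels) are assumptions chosen for illustration, not values from the answer; a suitable bound depends on the motion in your videos.

```python
import numpy as np


def bgr_to_normalized_rgb(img_bgr):
    """Convert an 8-bit BGR image (as returned by cv2.imread) to float32 RGB in [0, 1]."""
    # Reverse the channel axis (BGR -> RGB); astype returns a new contiguous array.
    return img_bgr[..., ::-1].astype(np.float32) / 255.0


def normalize_flow(flow, max_magnitude=20.0):
    """Clip each flow component to [-max_magnitude, max_magnitude] and scale to [-1, 1].

    max_magnitude is an assumed bound in pixels, not a value from the original answer.
    """
    clipped = np.clip(flow, -max_magnitude, max_magnitude)
    return (clipped / max_magnitude).astype(np.float32)
```

Both helpers would be called inside __getitem__ after the flow has been computed, since the flow computation itself works on the unnormalized 8-bit grayscale images.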