Keras custom video data generator: how do I pass the correct output to my model?

Question

I am creating an RNN model to process videos of a fixed length (10 frames). Each video is stored as a set of images in its own folder, and the number of images varies from video to video. Before passing a batch of frames to the RNN model, however, I pre-process the images of each frame with a ResNet feature extractor. I am using a custom data generator that takes the paths of the image folders, pre-processes the images, and passes the result to the model.

Until now I have been doing this, rather clunkily, without a data generator, but that is not practical: my training set has more than 10,000 videos, and I also want to add data augmentation later.

This is the code of my custom data generator:

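# Imports assumed by the code in this question
import os
import time
from datetime import datetime
from pathlib import Path

import cv2
import numpy as np
from PIL import Image
from tensorflow import keras

class DataGenerator(keras.utils.Sequence):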
    'Generates data for Keras'
    def __init__(self, list_IDs, labels, video_paths,
                 batch_size=32, video_length=10, dim=(224,224),
                 n_channels=3, n_classes=4, IMG_SIZE = 224, MAX_SEQ_LENGTH = 10,
                 NUM_FEATURES = 2048, shuffle=True):
        'Initialization'
        
        self.list_IDs = list_IDs
        self.labels = labels
        self.video_paths = video_paths        
        self.batch_size = batch_size
        self.dim = dim
        self.video_length = video_length
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.IMG_SIZE = IMG_SIZE
        self.MAX_SEQ_LENGTH = MAX_SEQ_LENGTH
        self.NUM_FEATURES = NUM_FEATURES
        self.shuffle = shuffle
        self.on_epoch_end()
    
    @staticmethod  # called as self.crop_center_square(...), so it must not take self
    def crop_center_square(frame):
        y, x = frame.shape[0:2]
        min_dim = min(y, x)
        start_x = (x // 2) - (min_dim // 2)
        start_y = (y // 2) - (min_dim // 2)
        return frame[start_y : start_y + min_dim, start_x : start_x + min_dim]
    
    def load_series(self, videopath):
        frames = []
        # sorted() keeps frames in a stable order; os.listdir returns entries in arbitrary order
        image_paths = [os.path.join(videopath, o) for o in sorted(os.listdir(videopath))]
        frame_num = np.linspace(0,len(image_paths)-1, num=10)   
        frame_num = frame_num.astype(int)
        resize=(self.IMG_SIZE, self.IMG_SIZE)
        # resize=(IMG_SIZE, IMG_SIZE)
        
        for ix in frame_num:
            image = Image.open(image_paths[ix])
            im_array = np.asarray(image)
            im_array = self.crop_center_square(im_array)
            # im_array = crop_center_square(im_array)
            im_array = cv2.resize(im_array, resize)
            stacked_im_array = np.stack((im_array,)*3, axis=-1)
            frames.append(stacked_im_array)
            # plt.imshow(stacked_im_array)
            # plt.show()
            
        return np.array(frames)
    
    def build_feature_extractor(self):
        feature_extractor = keras.applications.resnet_v2.ResNet152V2(
            weights="imagenet",
            include_top=False,
            pooling="avg",
            input_shape=(self.IMG_SIZE, self.IMG_SIZE, 3),
        )
        preprocess_input = keras.applications.resnet_v2.preprocess_input

        inputs = keras.Input((self.IMG_SIZE, self.IMG_SIZE, 3))
        preprocessed = preprocess_input(inputs)

        outputs = feature_extractor(preprocessed)
        return keras.Model(inputs, outputs, name="feature_extractor")


    def __getitem__(self, index):
        'Generate one batch of data'
        # Generate indexes of the batch
        indexes = self.indexes[index*self.batch_size: (index+1)*self.batch_size]
        
        # Find list of IDs
        list_IDs_temp = [self.list_IDs[k] for k in indexes]
        
        # Generate data
        [frame_features, frame_masks], frame_labels = self._generate_X(list_IDs_temp)
      
        return [frame_features, frame_masks], frame_labels
    
    def _generate_X(self, list_IDs_temp):
        'Generates data containing batch_size videos'
        # Initialization
        frame_masks = np.zeros(shape=(self.batch_size, self.MAX_SEQ_LENGTH), dtype="bool")
        frame_features = np.zeros(shape=(self.batch_size, self.MAX_SEQ_LENGTH, self.NUM_FEATURES), dtype="float32")
        frame_labels = np.zeros(shape=(self.batch_size), dtype="int")
        feature_extractor = self.build_feature_extractor()
        tt = time.time()
        # frame_masks = np.zeros(shape=(batch_size, MAX_SEQ_LENGTH), dtype="bool")
        # frame_features = np.zeros(shape=(batch_size, MAX_SEQ_LENGTH, NUM_FEATURES), dtype="float32")
        # frame_labels = np.zeros(shape=(batch_size), dtype="int")
        
        for idx, ID in enumerate(list_IDs_temp):
            videopath = self.video_paths[ID]
            # videopath = video_paths[ID]
            video_frame_label = self.labels[ID]
            # Gather all its frames and add a batch dimension.       
            frames = self.load_series(Path(videopath))
            # frames = load_series(Path(videopath))
            
            # At this point frames.shape = (10, 224, 224, 3)
            frames = frames[None, ...]
            # After this, frames.shape = (1, 10, 224, 224, 3)

            # Initialize placeholders to store the masks and features of the current video.
            temp_frame_mask = np.zeros(shape=(1, self.MAX_SEQ_LENGTH,), dtype="bool")
            # temp_frame_mask = np.zeros(shape=(1, MAX_SEQ_LENGTH,), dtype="bool")
            # temp_frame_mask.shape = (1, MAX_SEQ_LENGTH)
            
            temp_frame_features = np.zeros(shape=(1, self.MAX_SEQ_LENGTH, self.NUM_FEATURES), dtype="float32")
            # temp_frame_features = np.zeros(shape=(1, MAX_SEQ_LENGTH, NUM_FEATURES), dtype="float32")
            # temp_frame_features.shape = (1, MAX_SEQ_LENGTH, NUM_FEATURES)
            
            # Extract features from the frames of the current video.
            for i, batch in enumerate(frames):
                video_length = batch.shape[0]
                length = min(self.MAX_SEQ_LENGTH, video_length)
                # length = min(MAX_SEQ_LENGTH, video_length)
                for j in range(length):
                    temp_frame_features[i, j, :] = feature_extractor.predict(batch[None, j, :])
                    # temp_frame_features[i, j, :] = feature_extractor.predict(batch[None, j, :])
                temp_frame_mask[i, :length] = 1  # 1 = not masked, 0 = masked
                
            frame_features[idx,] = temp_frame_features.squeeze()
            frame_masks[idx,] = temp_frame_mask.squeeze()
            frame_labels[idx] = video_frame_label
        tf = time.time() - tt
        print(f'Pre-process length: {tf}')
        
        return [frame_features, frame_masks], frame_labels

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle == True:
            np.random.shuffle(self.indexes)
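
The frame selection in `load_series` picks 10 evenly spaced indices from however many images a folder holds. A minimal pure-Python sketch of that index arithmetic follows. `sample_frame_indices` is a hypothetical helper, not part of the original code, and it uses exact integer division as a stand-in for `np.linspace(...).astype(int)`, so the two could differ by one index wherever floating-point rounding lands just below a whole number.

```python
def sample_frame_indices(num_images, num_frames=10):
    """Return num_frames evenly spaced indices into a list of num_images items."""
    if num_images < 1:
        raise ValueError("the video folder contains no images")
    if num_frames == 1:
        return [0]
    # Integer stand-in for np.linspace(0, num_images - 1, num_frames).astype(int)
    return [i * (num_images - 1) // (num_frames - 1) for i in range(num_frames)]


print(sample_frame_indices(25))  # [0, 2, 5, 8, 10, 13, 16, 18, 21, 24]
print(sample_frame_indices(4))   # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
```

A folder with fewer than 10 images yields repeated indices rather than a shorter sequence, so with this sampling every video reaches the feature extractor with exactly 10 frames and the mask ends up all ones.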

This is the code for the RNN model:

label_processor = keras.layers.StringLookup(num_oov_indices=0, vocabulary=np.unique(train_df["view"]))

print(label_processor.get_vocabulary())

train_list_IDs = train_df.index
train_labels = train_df["view"].values
train_labels = label_processor(train_labels[..., None]).numpy()
train_video_paths = train_df['series']

training_generator = DataGenerator(train_list_IDs, train_labels, train_video_paths)

test_list_IDs = test_df.index
test_labels = test_df["view"].values
test_labels = label_processor(test_labels[..., None]).numpy()
test_video_paths = test_df['series']

testing_generator = DataGenerator(test_list_IDs, test_labels, test_video_paths)

# Utility for our sequence model.
def get_sequence_model():
    class_vocab = label_processor.get_vocabulary()

    frame_features_input = keras.Input((MAX_SEQ_LENGTH, NUM_FEATURES))
    mask_input = keras.Input((MAX_SEQ_LENGTH,), dtype="bool")

    # Refer to the following tutorial to understand the significance of using `mask`:
    # https://keras.io/api/layers/recurrent_layers/gru/
    x = keras.layers.GRU(16, return_sequences=True)(
        frame_features_input, mask=mask_input
    )
    x = keras.layers.GRU(8)(x)
    x = keras.layers.Dropout(0.4)(x)
    x = keras.layers.Dense(8, activation="relu")(x)
    output = keras.layers.Dense(len(class_vocab), activation="softmax")(x)
    
    rnn_model = keras.Model([frame_features_input, mask_input], output)

    rnn_model.compile(
        loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
    )
    return rnn_model


# Utility for running experiments.
def run_experiment():
    now = datetime.now()
    current_time = now.strftime("%d_%m_%Y_%H_%M_%S")
    filepath = os.path.join(Path('F:/RNN'), f'RNN_ResNet_Model_{current_time}')
    checkpoint = keras.callbacks.ModelCheckpoint(
        filepath, save_weights_only=True, save_best_only=True, verbose=1
    )

    seq_model = get_sequence_model()
    history = seq_model.fit(training_generator,
        epochs=EPOCHS,
        callbacks=[checkpoint],
    )
    seq_model.load_weights(filepath)
    _, accuracy = seq_model.evaluate(testing_generator)
    print(f"Test accuracy: {round(accuracy * 100, 2)}%")

    return history, accuracy, seq_model


_, accuracy, sequence_model = run_experiment()

I am struggling to work out how to pass the output of my custom data generator to my RNN model. How can I best rewrite my code so that it works with model.fit() or model.fit_generator()?

Thank you in advance!

Answer 1

Please specify in your question what exactly you are struggling with. Do you expect different results, is your code slow, or do you get errors? Based on your code I see some issues and would suggest the following adjustments:

The __getitem__() function of a DataGenerator is called every time a batch of data is retrieved from the generator. Within that function you call _generate_X(), which in turn builds the pretrained ResNet feature extractor again, for every single batch, through feature_extractor = self.build_feature_extractor(). This is highly inefficient.

As an alternative, I would propose removing the model creation from your generator class, creating the feature extractor in your main notebook instead, and passing it as a parameter to your DataGenerator instance:

In your main file:

def build_feature_extractor(): [...]

feature_extractor = build_feature_extractor()

testing_generator = DataGenerator(test_list_IDs, test_labels, test_video_paths, feature_extractor)

For the generator class:

class DataGenerator(keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, list_IDs, labels, video_paths, feature_extractor,
                 batch_size=32, video_length=10, dim=(224,224),
                 n_channels=3, n_classes=4, IMG_SIZE = 224, MAX_SEQ_LENGTH = 10,
                 NUM_FEATURES = 2048, shuffle=True):
        'Initialization'

        self.list_IDs = list_IDs
        [...]
        self.feature_extractor = feature_extractor
        [...]

and then adjust the feature-extraction line to this:

temp_frame_features[i, j, :] = self.feature_extractor.predict(batch[None, j, :])
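
To make the cost of rebuilding concrete, here is a small sketch that only counts constructions. `FakeExtractor`, `RebuildingGenerator`, `InjectedGenerator` and `count_builds` are hypothetical names for illustration; `FakeExtractor` stands in for the ResNet model and does no real work, and neither generator subclasses the Keras class.

```python
class FakeExtractor:
    """Hypothetical stand-in for the ResNet extractor; counts how often it is built."""
    builds = 0

    def __init__(self):
        FakeExtractor.builds += 1

    def predict(self, frame):
        return [0.0]  # placeholder feature vector


class RebuildingGenerator:
    """Builds the extractor inside __getitem__, as the question's code does."""
    def __len__(self):
        return 5

    def __getitem__(self, index):
        extractor = FakeExtractor()  # constructed again for every batch
        return extractor.predict(index)


class InjectedGenerator:
    """Receives an already-built extractor and reuses it for every batch."""
    def __init__(self, extractor):
        self.extractor = extractor

    def __len__(self):
        return 5

    def __getitem__(self, index):
        return self.extractor.predict(index)


def count_builds(make_generator):
    """Iterate over all batches once and report how many extractors were built."""
    FakeExtractor.builds = 0
    gen = make_generator()
    for i in range(len(gen)):
        gen[i]
    return FakeExtractor.builds


rebuilt = count_builds(RebuildingGenerator)
shared = count_builds(lambda: InjectedGenerator(FakeExtractor()))
print(rebuilt, shared)  # 5 1
```

With a real ResNet152V2, each of those extra constructions means creating the network and loading its weights again, so the gap grows with the number of batches per epoch.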

You have used the generator correctly in your .fit call: model.fit(training_generator, ...) feeds your model the batches created by __getitem__().
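
Before starting a long training run, it can help to pull a few batches by hand through `len()` and indexing, which is roughly how a Sequence gets consumed. The sketch below assumes nothing about Keras itself: `dry_run` and `TinySequence` are hypothetical names, and `TinySequence` merely mimics the `([features, masks], labels)` structure returned by `__getitem__()` in the question, using plain lists instead of arrays.

```python
def dry_run(sequence, max_batches=2):
    """Index the first few batches by hand and report their sizes."""
    n_batches = len(sequence)  # fails here if __len__ is missing
    summaries = []
    for i in range(min(n_batches, max_batches)):
        inputs, labels = sequence[i]
        features, masks = inputs
        summaries.append((len(features), len(masks), len(labels)))
    return n_batches, summaries


class TinySequence:
    """Hypothetical stand-in that mimics the generator's output structure."""
    def __init__(self, n_samples, batch_size):
        self.n_samples = n_samples
        self.batch_size = batch_size

    def __len__(self):
        return self.n_samples // self.batch_size

    def __getitem__(self, index):
        features = [[0.0] * 4 for _ in range(self.batch_size)]
        masks = [[True] * 4 for _ in range(self.batch_size)]
        labels = [0] * self.batch_size
        return [features, masks], labels


print(dry_run(TinySequence(n_samples=70, batch_size=32)))
# (2, [(32, 32, 32), (32, 32, 32)])
```

Running the same kind of check against the real generator would surface structural problems, such as a missing method or mismatched batch sizes, before any training time is spent.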

Answer 2

The error I was getting was

raise NotImplementedError keras

Rather stupidly, I had forgotten to put the following function within the DataGenerator class:

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.floor(len(self.list_IDs) / self.batch_size))

The error went away after that.之后错误消失了。
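
A likely reason the missing method shows up as `NotImplementedError` is that the base class declares `__len__` but leaves it unimplemented, so a subclass that forgets it inherits a method that raises. The sketch below reproduces that with `BaseSequence`, a hypothetical stand-in for `keras.utils.Sequence` (the real class is not imported here). It also shows what the `floor` in `__len__` does when the sample count is not a multiple of the batch size. The figure of 10,050 videos is an assumption for illustration.

```python
import math


class BaseSequence:
    """Hypothetical stand-in for keras.utils.Sequence: subclasses must define __len__."""
    def __len__(self):
        raise NotImplementedError


class WithoutLen(BaseSequence):
    pass


class WithLen(BaseSequence):
    def __init__(self, n_samples, batch_size):
        self.n_samples = n_samples
        self.batch_size = batch_size

    def __len__(self):
        # Same arithmetic as the fix: only full batches are counted.
        return math.floor(self.n_samples / self.batch_size)


try:
    len(WithoutLen())
    outcome = "no error"
except NotImplementedError:
    outcome = "NotImplementedError"

n_samples, batch_size = 10_050, 32  # assumed numbers, not from the question
full_batches = len(WithLen(n_samples, batch_size))
dropped = n_samples - full_batches * batch_size

print(outcome)       # NotImplementedError
print(full_batches)  # 314
print(dropped)       # 2
```

Rounding up instead would keep those leftover samples, but `_generate_X` in the question allocates arrays with a fixed `batch_size`, so a short final batch would carry zero-filled rows unless that allocation is changed as well.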

obsolete_hegemony did give me an excellent suggestion to optimise my code and separate out the feature-extraction pre-processing!

Answer 1: accepted, 2022-04-19 20:05:36
Answer 2: 2022-04-20 12:39:35
