How to display CNN-LSTM prediction output on a video using OpenCV?

Question

Hi everyone,

I have a CNN-LSTM model trained in Keras. As input, I loaded sets of 15 frames per video, each frame 30x30 with a single channel, so each sample has shape (15, 30, 30, 1).

I extracted them from a total of 279 videos and stored them in one large tensor with dimensions (279, 15, 30, 30, 1).

X_data.shape = (279, 15, 30, 30, 1)
y_data.shape = (279,)

I'm working with two classes of videos (so the targets are 0 and 1).

The input layer of my time-distributed CNN (before my LSTM layer) is:

input_layer = Input(shape=(None, 30, 30, 1))

The samples were fed into my network and everything worked well, but now I need to run predictions on these videos, and I want to display the output on the video I'm classifying.

I wrote this to read the video and display the text:

vid = cv2.VideoCapture(video_path)

while(vid.isOpened()):
    ret, frame = vid.read()
    if ret == True:
        texto = predict_video(frame)
        frame = cv2.resize(frame,(750,500),interpolation=cv2.INTER_AREA)
        frame = cv2.putText(frame,str(texto),(0,130), cv2.FONT_HERSHEY_SIMPLEX, 2.5, (255, 0, 0), 2, cv2.LINE_AA)
        cv2.imshow('Video', frame)

        if cv2.waitKey(25) & 0xFF == ord('q'):
            break
    else:
        break

vid.release()
cv2.destroyAllWindows()

predict_video() generates the predicted output as text, as you can see:

def predict_video(frame):
    count_frames = 0
    frame_list = []

    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    frame = cv2.resize(frame,(30,30),interpolation=cv2.INTER_AREA)

    while count_frames < 15:
        frame_list.append(frame)
    count_frames = 0

    frame_set = np.array(frame_list)
    frame_set = frame_set.reshape(1, 15, 30, 30, 1)

    pred = model.predict(frame_set)
    pred_ = np.argmax(pred,axis=1) #i'm using the Model object from Keras

    if pred_ == 1:
        return 'Archery'
    elif pred_ == 0:
        return 'Basketball'

Because the input dimension of the CNN-LSTM is (None, 30, 30, 1), I need to call model.predict(sample) on a sample with dimensions (1, 15, 30, 30, 1). How can I run predictions on a video in real time, given that I want to predict not frame by frame but with a model based on sets of 15 frames?

The current predict_video() function freezes my computer.

Thanks for your attention!

Answer 1

Here is a piece of code you can use to put text on each frame:

cv2.putText(img, text, (textX, textY ), font, 1, (255, 255, 255), 2)

Here "img" is your frame, "text" is your output prediction, and "textX" and "textY" are the coordinates around which you want to center the text.

As for the other part of your question, where you want to predict on a set of 15 frames rather than on a single frame: what you can do is train a model in Keras by setting the batch size to 15 images, with the true label present for each frame. Once training is complete, the model will expect an input batch of 15 frames.

Then, in the while loop, add a check so that once 15 frames have been read you collect them into a tensor of dimension (15, 30, 30, 1).

This part of the code
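To make the text coordinates mentioned above concrete, here is a minimal sketch of centering a label around a point. cv2.putText takes the bottom-left corner of the text as its origin, so the text size from cv2.getTextSize is used to shift that origin. The helper names centered_origin and draw_centered are hypothetical and not part of the original answer.

```python
def centered_origin(center_x, center_y, text_w, text_h):
    """Return the bottom-left origin that centers a text box on (center_x, center_y)."""
    # Shift left by half the width and down by half the height,
    # because putText anchors the text at its bottom-left corner.
    return (center_x - text_w // 2, center_y + text_h // 2)


def draw_centered(img, text, center_x, center_y, scale=1.0, thickness=2):
    """Draw `text` on `img`, centered on the given point (hypothetical helper)."""
    import cv2  # imported here so centered_origin stays usable without OpenCV

    font = cv2.FONT_HERSHEY_SIMPLEX
    # getTextSize returns ((width, height), baseline) in pixels.
    (text_w, text_h), _baseline = cv2.getTextSize(text, font, scale, thickness)
    origin = centered_origin(center_x, center_y, text_w, text_h)
    cv2.putText(img, text, origin, font, scale, (255, 255, 255), thickness, cv2.LINE_AA)
    return img
```

In the question's loop this would be called as something like draw_centered(frame, str(texto), 375, 250) after the frame has been resized to 750x500.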

frame_list = []
while count_frames < 15:
    frame_list.append(frame)

should not be inside the function, because frame_list is reset to an empty list every time the function is called for a new frame; its scope is limited to the function. Put this code inside the frame-reading loop instead, and once the number of frames in frame_list reaches 15, call model.predict(batch), first expanding the dimensions to (1, 15, 30, 30, 1).
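The buffering described above could look roughly like the sketch below. It also avoids the freeze: in the question's predict_video(), count_frames is never incremented inside the while loop, so that loop never ends. The function classify_stream, the predict_fn parameter, and the LABELS dictionary are hypothetical names for illustration; the label mapping is the one used in the question, and predict_fn is assumed to behave like model.predict.

```python
from collections import deque

import numpy as np

SEQ_LEN = 15  # frames per sample, as in the question
LABELS = {0: 'Basketball', 1: 'Archery'}  # mapping taken from the question


def classify_stream(frames, predict_fn, seq_len=SEQ_LEN):
    """Yield one label per incoming frame (None until the first prediction).

    frames: iterable of 30x30 grayscale arrays (already converted and resized).
    predict_fn: callable that takes an array of shape (1, seq_len, 30, 30, 1)
        and returns class scores of shape (1, n_classes), e.g. model.predict.
    """
    buffer = deque(maxlen=seq_len)  # persists across frames, unlike a local list
    label = None
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == seq_len:
            batch = np.stack(list(buffer))               # (15, 30, 30)
            batch = batch[np.newaxis, ..., np.newaxis]   # (1, 15, 30, 30, 1)
            scores = predict_fn(batch)
            label = LABELS[int(np.argmax(scores, axis=1)[0])]
            buffer.clear()  # start collecting the next 15 frames
        yield label  # the last prediction stays on screen until the next one
```

In the video loop, each frame read by vid.read() would first go through cv2.cvtColor and cv2.resize as in the question, then be fed to this generator, and the yielded label would be drawn with cv2.putText whenever it is not None. Clearing the buffer gives one prediction per 15 frames; dropping the clear() call would instead give a sliding window that predicts on every frame once the buffer is full, at a higher compute cost.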

Answer 1
2, accepted, 2018-07-29 04:51:07