
OpenCV: reading frames from VideoCapture advances the video to bizarrely wrong location

(I will put a 500 reputation bounty on this question as soon as it's eligible - unless the question gets closed.)

Problem in one sentence

Reading frames from a VideoCapture advances the video much further than it's supposed to.

Explanation

I need to read and analyze frames from a 100 fps (according to cv2 and VLC media player) video between certain time-intervals. In the minimal example that follows I am trying to read all the frames for the first ten seconds of a three-minute video.

I am creating a cv2.VideoCapture object from which I read frames until the desired position in milliseconds is reached. In my actual code each frame is analyzed, but that fact is irrelevant in order to showcase the error.

Checking the current frame and millisecond position of the VideoCapture after reading the frames yields correct values, so the VideoCapture thinks it is at the right position - but it is not. Saving an image of the last read frame reveals that my iteration is grossly overshooting the destination time by over two minutes.

What's even more bizarre is that if I manually set the millisecond position of the capture with VideoCapture.set to 10 seconds (the same value VideoCapture.get returns after reading the frames) and save an image, the video is at (almost) the right position!

Demo video file

In case you want to run the MCVE, you need the demo.avi video file. You can download it HERE.

MCVE

This MCVE is carefully crafted and commented. Please leave a comment under the question if anything remains unclear.

If you are using OpenCV 3, you have to replace all instances of cv2.cv.CV_ with cv2. (The problem occurs in both versions for me.)
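Aside: rather than search-and-replace, the version difference can be smoothed over with a small compatibility shim. A sketch of the pattern (to keep it runnable without OpenCV installed, `types.SimpleNamespace` stands in for the cv2 module in the demo; the attribute names follow the OpenCV 2 vs. 3 naming split described above):

```python
from types import SimpleNamespace

def prop_ids(cv2_module):
    """Return capture property ids regardless of OpenCV version.

    OpenCV 3 exposes cv2.CAP_PROP_*, OpenCV 2 exposes cv2.cv.CV_CAP_PROP_*.
    """
    if hasattr(cv2_module, 'CAP_PROP_POS_MSEC'):        # OpenCV 3+
        return SimpleNamespace(
            POS_MSEC=cv2_module.CAP_PROP_POS_MSEC,
            POS_FRAMES=cv2_module.CAP_PROP_POS_FRAMES,
            FPS=cv2_module.CAP_PROP_FPS,
        )
    return SimpleNamespace(                             # OpenCV 2
        POS_MSEC=cv2_module.cv.CV_CAP_PROP_POS_MSEC,
        POS_FRAMES=cv2_module.cv.CV_CAP_PROP_POS_FRAMES,
        FPS=cv2_module.cv.CV_CAP_PROP_FPS,
    )

# Demo with a fake module carrying OpenCV 3's actual constant values
# (CAP_PROP_POS_MSEC == 0, CAP_PROP_POS_FRAMES == 1, CAP_PROP_FPS == 5):
fake_cv2 = SimpleNamespace(CAP_PROP_POS_MSEC=0, CAP_PROP_POS_FRAMES=1,
                           CAP_PROP_FPS=5)
props = prop_ids(fake_cv2)
print(props.POS_MSEC, props.POS_FRAMES, props.FPS)   # -> 0 1 5
```

With a real import, `props = prop_ids(cv2)` lets the rest of the script use `props.POS_MSEC` etc. unchanged on both versions.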

import cv2

# set up capture and print properties
print 'cv2 version = {}'.format(cv2.__version__)
cap = cv2.VideoCapture('demo.avi')
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('initial attributes: fps = {}, pos_msec = {}, pos_frames = {}'
      .format(fps, pos_msec, pos_frames))

# get first frame and save as picture
_, frame = cap.read()
cv2.imwrite('first_frame.png', frame)

# advance 10 seconds, that's 100*10 = 1000 frames at 100 fps
for _ in range(1000):
    _, frame = cap.read()
    # in the actual code, the frame is now analyzed

# save a picture of the current frame
cv2.imwrite('after_iteration.png', frame)

# print properties after iteration
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after reading: pos_msec = {}, pos_frames = {}'
      .format(pos_msec, pos_frames))

# assert that the capture (thinks it) is where it is supposed to be
# (assertions succeed)
assert pos_frames == 1000 + 1 # (+1: iteration started with second frame)
assert pos_msec == 10000 + 10

# manually set the capture to msec position 10010
# note that this should change absolutely nothing in theory
cap.set(cv2.cv.CV_CAP_PROP_POS_MSEC, 10010)

# print properties  again to be extra sure
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after setting msec pos manually: pos_msec = {}, pos_frames = {}'
      .format(pos_msec, pos_frames))

# save a picture of the next frame, should show the same clock as
# previously taken image - but does not
_, frame = cap.read()
cv2.imwrite('after_setting.png', frame)

MCVE output

The print statements produce the following output.

cv2 version = 2.4.9.1
initial attributes: fps = 100.0, pos_msec = 0.0, pos_frames = 0.0
attributes after reading: pos_msec = 10010.0, pos_frames = 1001.0
attributes after setting msec pos manually: pos_msec = 10010.0, pos_frames = 1001.0

As you can see, all properties have the expected values.

imwrite saves the following pictures.

first_frame.png

after_iteration.png

after_setting.png

You can see the problem in the second picture. The target of 9:26:15 (real-time clock in picture) is missed by over two minutes. Setting the target time manually (third picture) sets the video to (almost) the correct position.

What am I doing wrong and how do I fix it?

Tried so far

cv2 2.4.9.1 @ Ubuntu 16.04
cv2 2.4.13 @ Scientific Linux 7.3 (three computers)
cv2 3.1.0 @ Scientific Linux 7.3 (three computers)

Creating the capture with

cap = cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_FFMPEG)

and

cap = cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_GSTREAMER)

in OpenCV 3 (version 2 does not seem to have the apiPreference argument). Using cv2.CAP_GSTREAMER takes extremely long (about 2-3 minutes to run the MCVE), but both API preferences produce the same incorrect images.

When using ffmpeg directly to read frames (credit to this tutorial), the correct output images are produced.

import numpy as np
import subprocess as sp
import pylab

# video properties
path = './demo.avi'
resolution = (593, 792)
framesize = resolution[0]*resolution[1]*3

# set up pipe
FFMPEG_BIN = "ffmpeg"
command = [FFMPEG_BIN,
           '-i', path,
           '-f', 'image2pipe',
           '-pix_fmt', 'rgb24',
           '-vcodec', 'rawvideo', '-']
pipe = sp.Popen(command, stdout = sp.PIPE, bufsize=10**8)

# read first frame and save as image
raw_image = pipe.stdout.read(framesize)
image = np.frombuffer(raw_image, dtype='uint8')  # fromstring is deprecated
image = image.reshape(resolution[0], resolution[1], 3)
pylab.imshow(image)
pylab.savefig('first_frame_ffmpeg_only.png')
pipe.stdout.flush()

# forward 1000 frames
for _ in range(1000):
    raw_image = pipe.stdout.read(framesize)
    pipe.stdout.flush()

# save frame 1001
image = np.frombuffer(raw_image, dtype='uint8')  # fromstring is deprecated
image = image.reshape(resolution[0], resolution[1], 3)
pylab.imshow(image)
pylab.savefig('frame_1001_ffmpeg_only.png')

pipe.terminate()

This produces the correct result! (Correct timestamp 9:26:15)

frame_1001_ffmpeg_only.png
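As a sanity check on the pipe reads above: with `-pix_fmt rgb24` each raw frame is width × height × 3 bytes, so the read size and the data volume of the first ten seconds follow directly from the file's reported 792×593 resolution and 100 fps (pure arithmetic, no further assumptions):

```python
width, height, channels = 792, 593, 3   # rgb24: 3 bytes per pixel
framesize = width * height * channels   # bytes ffmpeg writes per raw frame
frames_10s = 100 * 10                   # frames in the first 10 s at 100 fps
total_bytes = framesize * frames_10s

print(framesize)     # 1408968 bytes per frame
print(total_bytes)   # 1408968000 bytes, i.e. ~1.4 GB of raw rgb24
```

This is why the pipe's `bufsize` is set generously and why reading exactly `framesize` bytes per `read()` keeps frame boundaries aligned.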

Additional information

In the comments I was asked for my cvconfig.h file. I only seem to have this file for cv2 version 3.1.0, under /opt/opencv/3.1.0/include/opencv2/cvconfig.h.

HERE is a paste of this file.

In case it helps, I was able to extract the following video information with VideoCapture.get.

brightness 0.0
contrast 0.0
convert_rgb 0.0
exposure 0.0
format 0.0
fourcc 1684633187.0
fps 100.0
frame_count 18000.0
frame_height 593.0
frame_width 792.0
gain 0.0
hue 0.0
mode 0.0
openni_baseline 0.0
openni_focal_length 0.0
openni_frame_max_depth 0.0
openni_output_mode 0.0
openni_registration 0.0
pos_avi_ratio 0.01
pos_frames 0.0
pos_msec 0.0
rectification 0.0
saturation 0.0

Your video file data contains just 1313 non-duplicate frames (i.e. between 7 and 8 frames per second of duration):

$ ffprobe -i demo.avi -loglevel fatal -show_streams -count_frames|grep frame
has_b_frames=0
r_frame_rate=100/1
avg_frame_rate=100/1
nb_frames=18000
nb_read_frames=1313        # !!!
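The interesting ratio can be pulled straight out of that ffprobe output; a small sketch parsing the pasted text (the embedded string reproduces the lines above, minus the trailing `# !!!` comment so each line is a clean `key=value` pair):

```python
ffprobe_out = """\
has_b_frames=0
r_frame_rate=100/1
avg_frame_rate=100/1
nb_frames=18000
nb_read_frames=1313"""

# Parse the key=value lines into a dict
info = dict(line.split('=') for line in ffprobe_out.splitlines())

declared = int(info['nb_frames'])       # frames the container claims to hold
decoded = int(info['nb_read_frames'])   # frames that actually carry data
duplicates = declared - decoded

print(declared, decoded, duplicates)    # -> 18000 1313 16687
```

The 16687 duplicates here match the `dup=16697` count ffmpeg reports during conversion once the 10 extra frames it appends are accounted for (16697 = 18010 − 1313).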

Converting the avi file with ffmpeg reports 16697 duplicate frames (for some reason 10 additional frames are added, and 16697 = 18010 - 1313).

$ ffmpeg -i demo.avi demo.mp4
...
frame=18010 fps=417 Lsize=3705kB time=03:00.08 bitrate=168.6kbits/s dup=16697
#                                                                   ^^^^^^^^^
...

BTW, the thus-converted video (demo.mp4) is devoid of the problem being discussed, i.e. OpenCV processes it correctly.

In this case the duplicate frames are not physically present in the avi file; instead, each duplicate frame is represented by an instruction to repeat the previous frame. This can be checked as follows:

$ ffplay -loglevel trace demo.avi
...
[ffplay_crop @ 0x7f4308003380] n:16 t:2.180000 pos:1311818.000000 x:0 y:0 x+w:792 y+h:592
[avi @ 0x7f4310009280] dts:574 offset:574 1/100 smpl_siz:0 base:1000000 st:0 size:81266
video: delay=0.130 A-V=0.000094
    Last message repeated 9 times
video: delay=0.130 A-V=0.000095
video: delay=0.130 A-V=0.000094
video: delay=0.130 A-V=0.000095
[avi @ 0x7f4310009280] dts:587 offset:587 1/100 smpl_siz:0 base:1000000 st:0 size:81646
[ffplay_crop @ 0x7f4308003380] n:17 t:2.320000 pos:1393538.000000 x:0 y:0 x+w:792 y+h:592
video: delay=0.140 A-V=0.000091
    Last message repeated 4 times
video: delay=0.140 A-V=0.000092
    Last message repeated 1 times
video: delay=0.140 A-V=0.000091
    Last message repeated 6 times
...

In the above log, frames with actual data are represented by the lines starting with "[avi @ 0xHHHHHHHHHHH]". The "video: delay=xxxxx A-V=yyyyy" messages indicate that the last frame must be displayed for xxxxx more seconds.
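Putting numbers on this: only 1313 of the 18000 declared frames carry picture data, so over the 3-minute (180 s) video the stream effectively yields about 7.3 data frames per second. That is exactly why counting cap.read() calls as if each were 10 ms of video overshoots so badly (arithmetic only, using the figures above):

```python
declared_frames = 18000   # container header: 100 fps * 180 s
data_frames = 1313        # frames with actual picture data (ffprobe nb_read_frames)
duration_s = 180          # 3-minute video

# Real data frames per second of video
effective_fps = data_frames / duration_s
print(round(effective_fps, 2))          # -> 7.29

# 1000 successful cap.read() calls therefore span roughly this much video:
seconds_covered = 1000 / effective_fps
print(round(seconds_covered, 1))        # -> 137.1, instead of the intended 10 s
```

Landing around the 137-second mark instead of the 10-second mark is consistent with the "over two minutes" overshoot seen in after_iteration.png.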

cv2.VideoCapture() skips such duplicate frames, reading only frames that have real data. Here is the corresponding (though slightly edited) code from the 2.4 branch of OpenCV (note, BTW, that ffmpeg is used underneath, which I verified by running python under gdb and setting a breakpoint on CvCapture_FFMPEG::grabFrame):

bool CvCapture_FFMPEG::grabFrame()
{
    ...
    int count_errs = 0;
    const int max_number_of_attempts = 1 << 9; // !!!
    ...
    // get the next frame
    while (!valid)
    {
        ...
        int ret = av_read_frame(ic, &packet);
        ...        
        // Decode video frame
        avcodec_decode_video2(video_st->codec, picture, &got_picture, &packet);
        // Did we get a video frame?
        if(got_picture)
        {
            //picture_pts = picture->best_effort_timestamp;
            if( picture_pts == AV_NOPTS_VALUE_ )
                picture_pts = packet.pts != AV_NOPTS_VALUE_ && packet.pts != 0 ? packet.pts : packet.dts;
            frame_number++;
            valid = true;
        }
        else
        {
            // So, if the next frame doesn't have picture data but is
            // merely a tiny instruction telling to repeat the previous
            // frame, then we get here, treat that situation as an error
        // and proceed unless the count of errors exceeds
        // max_number_of_attempts (1 << 9 = 512)!
            if (++count_errs > max_number_of_attempts)
                break;
        }
    }
    ...
}
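To see how this loop distorts the mapping between reads and video time, here is a toy Python simulation of the skipping behaviour, not OpenCV's actual code. It assumes a hypothetical packet stream with one data packet per 14 declared frames, roughly demo.avi's ratio of 18000 declared to 1313 data frames:

```python
def make_packets(declared=18000, data_every=14):
    """Build a fake packet stream: (has_picture, declared_frame_index)."""
    return [(i % data_every == 0, i) for i in range(declared)]

def grab_n_frames(packets, n):
    """Return the declared-frame index reached after n successful grabs."""
    got = 0
    last_index = 0
    for has_picture, index in packets:
        if has_picture:       # only packets with picture data count;
            got += 1          # repeat-instructions are skipped silently
            last_index = index
            if got == n:
                break
    return last_index

packets = make_packets()
idx = grab_n_frames(packets, 1000)      # the MCVE's 1000 reads
print(idx, idx / 100.0)                 # -> 13986 139.86
```

After 1000 "successful" grabs the simulated capture sits at declared frame 13986, i.e. about 140 seconds into a 100 fps timeline rather than 10, mirroring the MCVE's overshoot.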

In a nutshell: I reproduced your problem on an Ubuntu 12.04 machine with OpenCV 2.4.13, noticed that the codec used in your video (FourCC CVID) seems to be rather old (according to this post from 2011), and after converting the video to the codec MJPG (aka M-JPEG or Motion JPEG) your MCVE worked. Of course, Leon (or others) may post a fix for OpenCV, which may be the better solution for your case.

I initially tried the conversion using

ffmpeg -i demo.avi -vcodec mjpeg -an demo_mjpg.avi

and

avconv -i demo.avi -vcodec mjpeg -an demo_mjpg.avi

(both also on a 16.04 box). Interestingly, both produced "broken" videos. E.g., when jumping to frame 1000 using Avidemux, there is no real-time clock! Also, the converted videos were only about 1/6 of the original size, which is strange since M-JPEG is a very simple compression. (Each frame is JPEG-compressed independently.)

Using Avidemux to convert demo.avi to M-JPEG produced a video on which the MCVE worked. (I used the Avidemux GUI for the conversion.) The size of the converted video is about 3x the original size. Of course, it may also be possible to do the original recording using a codec that is supported better on Linux. If you need to jump to specific frames in the video in your application, M-JPEG may be the best option. Otherwise, H.264 compresses much better. Both are well-supported in my experience and the only codecs I have seen implemented directly on webcams (H.264 only on high-end ones).

As you said:

When using ffmpeg directly to read frames (credit to this tutorial) the correct output images are produced.

Is it normal? Because you define a framesize = resolution[0]*resolution[1]*3

and then reuse it when reading: pipe.stdout.read(framesize)

So in my opinion you have to update each:

_, frame = cap.read()

to

_, frame = cap.read(framesize)

Assuming the resolution is identical, the final code version will be:

import cv2

# set up capture and print properties
print 'cv2 version = {}'.format(cv2.__version__)
cap = cv2.VideoCapture('demo.avi')
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('initial attributes: fps = {}, pos_msec = {}, pos_frames = {}'
      .format(fps, pos_msec, pos_frames))

resolution = (593, 792) #here resolution 
framesize = resolution[0]*resolution[1]*3 #here framesize

# get first frame and save as picture
_, frame = cap.read( framesize ) #update to get one frame
cv2.imwrite('first_frame.png', frame)

# advance 10 seconds, that's 100*10 = 1000 frames at 100 fps
for _ in range(1000):
    _, frame = cap.read( framesize ) #update to get one frame
    # in the actual code, the frame is now analyzed

# save a picture of the current frame
cv2.imwrite('after_iteration.png', frame)

# print properties after iteration
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after iteration: pos_msec = {}, pos_frames = {}'
      .format(pos_msec, pos_frames))

# assert that the capture (thinks it) is where it is supposed to be
# (assertions succeed)
assert pos_frames == 1000 + 1 # (+1: iteration started with second frame)
assert pos_msec == 10000 + 10

# manually set the capture to msec position 10010
# note that this should change absolutely nothing in theory
cap.set(cv2.cv.CV_CAP_PROP_POS_MSEC, 10010)

# print properties  again to be extra sure
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after setting msec pos manually: pos_msec = {}, pos_frames = {}'
      .format(pos_msec, pos_frames))

# save a picture of the next frame, should show the same clock as
# previously taken image - but does not
_, frame = cap.read()
cv2.imwrite('after_setting.png', frame)
