OpenCV: reading frames from VideoCapture advances the video to bizarrely wrong location
(I will put a 500 reputation bounty on this question as soon as it's eligible - unless the question gets closed.)
Problem in one sentence
Reading frames from a VideoCapture advances the video much further than it's supposed to.
Explanation
I need to read and analyze frames from a 100 fps (according to cv2 and VLC media player) video between certain time-intervals. In the minimal example that follows I am trying to read all the frames for the first ten seconds of a three minute video.
I am creating a cv2.VideoCapture object from which I read frames until the desired position in milliseconds is reached. In my actual code each frame is analyzed, but that fact is irrelevant to showcasing the error.
Checking the current frame and millisecond position of the VideoCapture after reading the frames yields correct values, so the VideoCapture thinks it is at the right position - but it is not. Saving an image of the last read frame reveals that my iteration is grossly overshooting the destination time by over two minutes.
What's even more bizarre is that if I manually set the millisecond position of the capture with VideoCapture.set to 10 seconds (the same value VideoCapture.get returns after reading the frames) and save an image, the video is at (almost) the right position!
Demo video file
In case you want to run the MCVE, you need the demo.avi video file. You can download it HERE.
MCVE
This MCVE is carefully crafted and commented. Please leave a comment under the question if anything remains unclear.
If you are using OpenCV 3 you have to replace all instances of cv2.cv.CV_ with cv2. (The problem occurs in both versions for me.)
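To avoid editing the constants by hand, one option is a tiny lookup helper (hypothetical, not part of the original post) that tries the OpenCV 3 spelling first and falls back to the OpenCV 2 one. The module object is passed in explicitly, so it works with whichever cv2 you have imported:

```python
def cap_prop(cv2_module, name):
    """Resolve a VideoCapture property id under either API.

    OpenCV 3+ exposes cv2.CAP_PROP_<NAME>, while OpenCV 2 uses
    cv2.cv.CV_CAP_PROP_<NAME>. Raises AttributeError if the name is
    unknown to both APIs.
    """
    try:
        return getattr(cv2_module, 'CAP_PROP_' + name)        # OpenCV 3+
    except AttributeError:
        return getattr(cv2_module.cv, 'CV_CAP_PROP_' + name)  # OpenCV 2

# Usage with a real cv2 module:
#   import cv2
#   fps = cap.get(cap_prop(cv2, 'FPS'))
```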
import cv2
# set up capture and print properties
print 'cv2 version = {}'.format(cv2.__version__)
cap = cv2.VideoCapture('demo.avi')
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('initial attributes: fps = {}, pos_msec = {}, pos_frames = {}'
.format(fps, pos_msec, pos_frames))
# get first frame and save as picture
_, frame = cap.read()
cv2.imwrite('first_frame.png', frame)
# advance 10 seconds, that's 100*10 = 1000 frames at 100 fps
for _ in range(1000):
    _, frame = cap.read()
    # in the actual code, the frame is now analyzed
# save a picture of the current frame
cv2.imwrite('after_iteration.png', frame)
# print properties after iteration
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after iteration: pos_msec = {}, pos_frames = {}'
.format(pos_msec, pos_frames))
# assert that the capture (thinks it) is where it is supposed to be
# (assertions succeed)
assert pos_frames == 1000 + 1 # (+1: iteration started with second frame)
assert pos_msec == 10000 + 10
# manually set the capture to msec position 10010
# note that this should change absolutely nothing in theory
cap.set(cv2.cv.CV_CAP_PROP_POS_MSEC, 10010)
# print properties again to be extra sure
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after setting msec pos manually: pos_msec = {}, pos_frames = {}'
.format(pos_msec, pos_frames))
# save a picture of the next frame, should show the same clock as
# previously taken image - but does not
_, frame = cap.read()
cv2.imwrite('after_setting.png', frame)
MCVE output
The print statements produce the following output.
cv2 version = 2.4.9.1
initial attributes: fps = 100.0, pos_msec = 0.0, pos_frames = 0.0
attributes after reading: pos_msec = 10010.0, pos_frames = 1001.0
attributes after setting msec pos manually: pos_msec = 10010.0, pos_frames = 1001.0
As you can see, all properties have the expected values. imwrite saves the following pictures.
first_frame.png
after_iteration.png
after_setting.png
You can see the problem in the second picture. The target of 9:26:15 (real-time clock in picture) is missed by over two minutes. Setting the target time manually (third picture) sets the video to (almost) the correct position.
What am I doing wrong and how do I fix it?
Tried so far
cv2 2.4.9.1 @ Ubuntu 16.04
cv2 2.4.13 @ Scientific Linux 7.3 (three computers)
cv2 3.1.0 @ Scientific Linux 7.3 (three computers)
Creating the capture with
cap = cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_FFMPEG)
and
cap = cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_GSTREAMER)
in OpenCV 3 (version 2 does not seem to have the apiPreference argument). Using cv2.CAP_GSTREAMER takes extremely long (about 2-3 minutes to run the MCVE), but both API preferences produce the same incorrect images.
When using ffmpeg directly to read frames (credit to this tutorial) the correct output images are produced.
import numpy as np
import subprocess as sp
import pylab
# video properties
path = './demo.avi'
resolution = (593, 792)
framesize = resolution[0]*resolution[1]*3
# set up pipe
FFMPEG_BIN = "ffmpeg"
command = [FFMPEG_BIN,
'-i', path,
'-f', 'image2pipe',
'-pix_fmt', 'rgb24',
'-vcodec', 'rawvideo', '-']
pipe = sp.Popen(command, stdout = sp.PIPE, bufsize=10**8)
# read first frame and save as image
raw_image = pipe.stdout.read(framesize)
image = np.fromstring(raw_image, dtype='uint8')
image = image.reshape(resolution[0], resolution[1], 3)
pylab.imshow(image)
pylab.savefig('first_frame_ffmpeg_only.png')
pipe.stdout.flush()
# forward 1000 frames
for _ in range(1000):
    raw_image = pipe.stdout.read(framesize)
    pipe.stdout.flush()
# save frame 1001
image = np.fromstring(raw_image, dtype='uint8')
image = image.reshape(resolution[0], resolution[1], 3)
pylab.imshow(image)
pylab.savefig('frame_1001_ffmpeg_only.png')
pipe.terminate()
This produces the correct result! (Correct timestamp 9:26:15)
frame_1001_ffmpeg_only.png:
Additional information
In the comments I was asked for my cvconfig.h file. I only seem to have this file for cv2 version 3.1.0, under /opt/opencv/3.1.0/include/opencv2/cvconfig.h. HERE is a paste of this file.
In case it helps, I was able to extract the following video information with VideoCapture.get.
brightness 0.0
contrast 0.0
convert_rgb 0.0
exposure 0.0
format 0.0
fourcc 1684633187.0
fps 100.0
frame_count 18000.0
frame_height 593.0
frame_width 792.0
gain 0.0
hue 0.0
mode 0.0
openni_baseline 0.0
openni_focal_length 0.0
openni_frame_max_depth 0.0
openni_output_mode 0.0
openni_registration 0.0
pos_avi_ratio 0.01
pos_frames 0.0
pos_msec 0.0
rectification 0.0
saturation 0.0
Your video file data contains just 1313 non-duplicate frames (i.e. between 7 and 8 frames per second of duration):
$ ffprobe -i demo.avi -loglevel fatal -show_streams -count_frames|grep frame
has_b_frames=0
r_frame_rate=100/1
avg_frame_rate=100/1
nb_frames=18000
nb_read_frames=1313 # !!!
Converting the avi file with ffmpeg reports 16697 duplicate frames (for some reason 10 additional frames are added, and 16697 = 18010 - 1313).
$ ffmpeg -i demo.avi demo.mp4
...
frame=18010 fps=417 Lsize=3705kB time=03:00.08 bitrate=168.6kbits/s dup=16697
# ^^^^^^^^^
...
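For what it's worth, the counters reported by ffprobe and ffmpeg above are mutually consistent:

```python
# Frame bookkeeping for demo.avi, using only the numbers reported above.
container_frames = 18000   # nb_frames from ffprobe (what the index claims)
real_frames      = 1313    # nb_read_frames (frames with actual picture data)
converted_frames = 18010   # frame= counter printed by ffmpeg while converting
duplicates       = 16697   # dup= counter printed by ffmpeg

# Every frame ffmpeg wrote that is not a real frame is a duplicate:
assert converted_frames - real_frames == duplicates

# ffmpeg padded 10 frames beyond what the container index reports:
assert converted_frames - container_frames == 10
```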
BTW, the thus-converted video (demo.mp4) is devoid of the problem being discussed, that is, OpenCV processes it correctly.
In this case the duplicate frames are not physically present in the avi file; instead, each duplicate frame is represented by an instruction to repeat the previous frame. This can be checked as follows:
$ ffplay -loglevel trace demo.avi
...
[ffplay_crop @ 0x7f4308003380] n:16 t:2.180000 pos:1311818.000000 x:0 y:0 x+w:792 y+h:592
[avi @ 0x7f4310009280] dts:574 offset:574 1/100 smpl_siz:0 base:1000000 st:0 size:81266
video: delay=0.130 A-V=0.000094
Last message repeated 9 times
video: delay=0.130 A-V=0.000095
video: delay=0.130 A-V=0.000094
video: delay=0.130 A-V=0.000095
[avi @ 0x7f4310009280] dts:587 offset:587 1/100 smpl_siz:0 base:1000000 st:0 size:81646
[ffplay_crop @ 0x7f4308003380] n:17 t:2.320000 pos:1393538.000000 x:0 y:0 x+w:792 y+h:592
video: delay=0.140 A-V=0.000091
Last message repeated 4 times
video: delay=0.140 A-V=0.000092
Last message repeated 1 times
video: delay=0.140 A-V=0.000091
Last message repeated 6 times
...
In the above log, frames with actual data are represented by the lines starting with "[avi @ 0xHHHHHHHHHHH]". The "video: delay=xxxxx A-V=yyyyy" messages indicate that the last frame must be displayed for xxxxx more seconds.
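The same thing can be observed from Python alone by recording POS_MSEC after every successful read and looking at the gaps between consecutive timestamps. The helpers below are a sketch (not from the original post); for this file the gaps should average roughly 137 ms (180 s over 1313 real frames), matching the delay=0.130/0.140 values in the ffplay log, rather than the 10 ms a true 100 fps stream would show.

```python
def frame_gaps_msec(pos_msec_values):
    """Gaps between consecutive POS_MSEC readings, in milliseconds."""
    return [b - a for a, b in zip(pos_msec_values, pos_msec_values[1:])]

def effective_fps(pos_msec_values):
    """Frame rate implied by the observed timestamps (0.0 if too few)."""
    gaps = frame_gaps_msec(pos_msec_values)
    if not gaps:
        return 0.0
    return 1000.0 / (sum(gaps) / len(gaps))

# Collecting the timestamps with OpenCV would look like (not run here):
#   cap = cv2.VideoCapture('demo.avi')
#   ts = []
#   while True:
#       ok, _ = cap.read()
#       if not ok:
#           break
#       ts.append(cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC))
#   # effective_fps(ts) comes out near 7.3 for demo.avi, not 100
```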
cv2.VideoCapture() skips such duplicate frames, reading only frames that have real data. Here is the corresponding (though slightly edited) code from the 2.4 branch of OpenCV (note, BTW, that ffmpeg is used underneath, which I verified by running python under gdb and setting a breakpoint on CvCapture_FFMPEG::grabFrame):
bool CvCapture_FFMPEG::grabFrame()
{
    ...
    int count_errs = 0;
    const int max_number_of_attempts = 1 << 9; // !!!
    ...
    // get the next frame
    while (!valid)
    {
        ...
        int ret = av_read_frame(ic, &packet);
        ...
        // Decode video frame
        avcodec_decode_video2(video_st->codec, picture, &got_picture, &packet);

        // Did we get a video frame?
        if (got_picture)
        {
            //picture_pts = picture->best_effort_timestamp;
            if (picture_pts == AV_NOPTS_VALUE_)
                picture_pts = packet.pts != AV_NOPTS_VALUE_ && packet.pts != 0 ? packet.pts : packet.dts;
            frame_number++;
            valid = true;
        }
        else
        {
            // So, if the next frame doesn't have picture data but is
            // merely a tiny instruction telling to repeat the previous
            // frame, then we get here, treat that situation as an error
            // and proceed unless the count of errors exceeds 512 (1 << 9)!!!
            if (++count_errs > max_number_of_attempts)
                break;
        }
    }
    ...
}
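Given this skipping behavior, counting reads is simply the wrong way to seek: 1000 reads consume 1000 real frames, which for this file is more than two minutes of video. A workaround (a sketch, not part of the original answer) is to loop on the capture's reported millisecond position instead. The helper is duck-typed, so any object with get/read methods works; with a real capture you would pass cv2.CAP_PROP_POS_MSEC (or cv2.cv.CV_CAP_PROP_POS_MSEC) as the property id.

```python
def read_until_msec(cap, target_msec, pos_msec_prop=0):
    """Read frames from `cap` until its reported POS_MSEC position
    reaches `target_msec`; return the last frame read (None if none).

    `pos_msec_prop` should be cv2.CAP_PROP_POS_MSEC; its numeric value
    is 0 in the OpenCV versions discussed here, hence the default.
    Because the loop polls the capture's own clock instead of counting
    reads, duplicate-frame containers no longer cause an overshoot.
    """
    frame = None
    while cap.get(pos_msec_prop) < target_msec:
        ok, f = cap.read()
        if not ok:  # end of stream or decode failure
            break
        frame = f
    return frame

# With a real capture (assuming demo.avi is present):
#   cap = cv2.VideoCapture('demo.avi')
#   last = read_until_msec(cap, 10000, cv2.cv.CV_CAP_PROP_POS_MSEC)
```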
In a nutshell: I reproduced your problem on an Ubuntu 12.04 machine with OpenCV 2.4.13, noticed that the codec used in your video (FourCC CVID) seems to be rather old (according to this post from 2011), and after converting the video to the codec MJPG (aka M-JPEG or Motion JPEG) your MCVE worked. Of course, Leon (or others) may post a fix for OpenCV, which may be the better solution for your case.
I initially tried the conversion using
ffmpeg -i demo.avi -vcodec mjpeg -an demo_mjpg.avi
and
avconv -i demo.avi -vcodec mjpeg -an demo_mjpg.avi
(both also on a 16.04 box). Interestingly, both produced "broken" videos. E.g., when jumping to frame 1000 using Avidemux, there is no real-time clock! Also, the converted videos were only about 1/6 of the original size, which is strange since M-JPEG is a very simple compression. (Each frame is JPEG-compressed independently.)
Using Avidemux to convert demo.avi to M-JPEG produced a video on which the MCVE worked. (I used the Avidemux GUI for the conversion.) The size of the converted video is about 3x the original size. Of course, it may also be possible to do the original recording using a codec that is better supported on Linux. If you need to jump to specific frames in the video in your application, M-JPEG may be the best option. Otherwise, H.264 compresses much better. Both are well-supported in my experience, and they are the only codecs I have seen implemented directly on webcams (H.264 only on high-end ones).
As you said:
When using ffmpeg directly to read frames (credit to this tutorial) the correct output images are produced.
That works because you define a framesize = resolution[0]*resolution[1]*3 and then reuse it when reading: pipe.stdout.read(framesize).
So in my opinion you have to update each
_, frame = cap.read()
to
_, frame = cap.read(framesize)
Assuming the resolution is identical, the final code version will be:
import cv2
# set up capture and print properties
print 'cv2 version = {}'.format(cv2.__version__)
cap = cv2.VideoCapture('demo.avi')
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('initial attributes: fps = {}, pos_msec = {}, pos_frames = {}'
.format(fps, pos_msec, pos_frames))
resolution = (593, 792) #here resolution
framesize = resolution[0]*resolution[1]*3 #here framesize
# get first frame and save as picture
_, frame = cap.read( framesize ) #update to get one frame
cv2.imwrite('first_frame.png', frame)
# advance 10 seconds, that's 100*10 = 1000 frames at 100 fps
for _ in range(1000):
    _, frame = cap.read( framesize ) #update to get one frame
    # in the actual code, the frame is now analyzed
# save a picture of the current frame
cv2.imwrite('after_iteration.png', frame)
# print properties after iteration
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after iteration: pos_msec = {}, pos_frames = {}'
.format(pos_msec, pos_frames))
# assert that the capture (thinks it) is where it is supposed to be
# (assertions succeed)
assert pos_frames == 1000 + 1 # (+1: iteration started with second frame)
assert pos_msec == 10000 + 10
# manually set the capture to msec position 10010
# note that this should change absolutely nothing in theory
cap.set(cv2.cv.CV_CAP_PROP_POS_MSEC, 10010)
# print properties again to be extra sure
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after setting msec pos manually: pos_msec = {}, pos_frames = {}'
.format(pos_msec, pos_frames))
# save a picture of the next frame, should show the same clock as
# previously taken image - but does not
_, frame = cap.read()
cv2.imwrite('after_setting.png', frame)