Object Tracking in H.264 compressed video

Posted 2013-11-13 18:57:48 · Tags: c++, video, h.264, beagleboard

I am working on a project that requires me to detect and track a human in live video from a webcam connected to a Beagleboard xM. I have completed this task using OpenCV in the pixel domain. The results on the board are very accurate but extremely slow. Many people have suggested that I leave the pixel domain and perform the same task on H.264/MPEG-4 compressed video, since this would greatly reduce the computational overhead. I have read many research papers but have not found any software platform or library that I can use to analyze and process H.264 compressed video. I would be grateful if someone could suggest a library for H.264 compressed-video analysis and guide me further.

Thanks and regards.

3 answers

I'm not sure how practical this really is (I've never tried to do it), but my guess is that what they're referring to is looking for a group of macroblocks that all have (nearly) identical motion vectors.
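A minimal sketch of that grouping idea, assuming the per-macroblock motion vectors have already been extracted into a row-major grid (extracting them from the bitstream is not shown). `MotionVector`, `labelMotionRegions`, `tolerance` and `minMotion` are hypothetical names for illustration. The code flood-fills 4-connected macroblocks whose vectors differ by at most `tolerance` in each component, and treats near-zero vectors as static background, which only makes sense when the camera is not panning.

```cpp
#include <cassert>
#include <cstdlib>
#include <queue>
#include <vector>

// Hypothetical per-macroblock motion vector (units do not matter here).
struct MotionVector {
    int dx;
    int dy;
};

// Labels connected regions of macroblocks with nearly identical motion.
// Blocks with |dx| + |dy| < minMotion are background (label 0); each moving
// region gets a label 1, 2, ... and *regionCount receives the number of regions.
std::vector<int> labelMotionRegions(const std::vector<MotionVector>& mvs,
                                    int cols, int rows, int tolerance,
                                    int minMotion, int* regionCount) {
    assert(static_cast<int>(mvs.size()) == cols * rows);
    std::vector<int> labels(mvs.size(), -1);  // -1 = not visited yet
    auto isMoving = [&](int i) {
        return std::abs(mvs[i].dx) + std::abs(mvs[i].dy) >= minMotion;
    };
    auto similar = [&](int a, int b) {
        return std::abs(mvs[a].dx - mvs[b].dx) <= tolerance &&
               std::abs(mvs[a].dy - mvs[b].dy) <= tolerance;
    };
    int next = 0;
    for (int start = 0; start < cols * rows; ++start) {
        if (labels[start] != -1) continue;
        if (!isMoving(start)) { labels[start] = 0; continue; }
        labels[start] = ++next;  // seed a new region
        std::queue<int> frontier;
        frontier.push(start);
        while (!frontier.empty()) {  // breadth-first flood fill
            int cur = frontier.front();
            frontier.pop();
            int x = cur % cols, y = cur / cols;
            const int offsets[4][2] = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
            for (const auto& o : offsets) {
                int nx = x + o[0], ny = y + o[1];
                if (nx < 0 || ny < 0 || nx >= cols || ny >= rows) continue;
                int n = ny * cols + nx;
                if (labels[n] == -1 && isMoving(n) && similar(cur, n)) {
                    labels[n] = next;
                    frontier.push(n);
                }
            }
        }
    }
    if (regionCount) *regionCount = next;
    return labels;
}
```

Because each block is compared with its neighbor rather than with a region average, vectors can drift gradually across a large region; a real tracker would probably need something more robust than this.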

For example, let's assume you have a camera that's not panning, and the picture shows a car driving across the screen. Looking at the motion vectors, you should see a (roughly) car-shaped group of macroblocks that all have similar motion vectors (denoting the motion of the car). Then, rather than searching the entire picture for your object of interest, you can examine that group in isolation and try to identify it. Likewise, if the camera were panning with the car, you'd have a car-shaped group with small motion vectors, while most of the background would have similar motion vectors pointing opposite to the car's movement.
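For the panning case, one common trick (my assumption, not something the answer specifies) is to estimate the camera motion as the component-wise median of all macroblock vectors and then flag blocks whose residual motion is large. This only works if the background covers more than half of the frame, so that the median lands on a background vector. `estimateGlobalMotion` and `foregroundMask` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical per-macroblock motion vector.
struct MotionVector {
    int dx;
    int dy;
};

// Median of the values (upper median for even counts); takes a copy.
static int medianOf(std::vector<int> values) {
    assert(!values.empty());
    auto mid = values.begin() + values.size() / 2;
    std::nth_element(values.begin(), mid, values.end());
    return *mid;
}

// Estimates global (camera) motion as the component-wise median of all
// macroblock vectors, assuming the background dominates the frame.
MotionVector estimateGlobalMotion(const std::vector<MotionVector>& mvs) {
    std::vector<int> xs, ys;
    for (const auto& mv : mvs) {
        xs.push_back(mv.dx);
        ys.push_back(mv.dy);
    }
    return {medianOf(xs), medianOf(ys)};
}

// Marks blocks whose motion differs from the global motion by more than
// threshold (L1 distance); with a panning camera these are candidate object blocks.
std::vector<bool> foregroundMask(const std::vector<MotionVector>& mvs, int threshold) {
    const MotionVector g = estimateGlobalMotion(mvs);
    std::vector<bool> mask;
    mask.reserve(mvs.size());
    for (const auto& mv : mvs) {
        int residual = std::abs(mv.dx - g.dx) + std::abs(mv.dy - g.dy);
        mask.push_back(residual > threshold);
    }
    return mask;
}
```

The resulting mask marks candidate object blocks; grouping them into connected regions would be the next step.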

Note, however, that this is likely to be imprecise at best. For example, suppose our mythical car is driving in front of a brick building, with its headlights illuminating some of the bricks. In this case, the motion vector for a brick in one picture might easily point not at the same brick in the previous picture, but at whichever brick in the previous picture happened to be lit about the same. The bricks are so alike that the closest match will depend more on the illumination than on the brick itself.
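The brick problem comes from how motion estimation works: the encoder picks whichever reference block is cheapest to code, not the block showing the same physical object. The 1-D sketch below uses a plain sum of absolute differences (SAD) as the matching cost, which is a simplification (real H.264 encoders search in 2-D and typically add a rate term to the cost), and the pixel values are made up for illustration. `sad` and `bestMatchOffset` are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <vector>

// Sum of absolute differences between cur[curStart, curStart + len)
// and prev[prevStart, prevStart + len).
int sad(const std::vector<int>& prev, int prevStart,
        const std::vector<int>& cur, int curStart, int len) {
    int total = 0;
    for (int i = 0; i < len; ++i)
        total += std::abs(cur[curStart + i] - prev[prevStart + i]);
    return total;
}

// Full search over offsets in [-range, range]; returns the offset (a 1-D
// "motion vector") of the lowest-SAD block in prev. On ties the first
// (most negative) offset found is kept.
int bestMatchOffset(const std::vector<int>& prev, const std::vector<int>& cur,
                    int blockStart, int len, int range) {
    assert(blockStart >= 0 && blockStart + len <= static_cast<int>(cur.size()));
    int bestOffset = 0;
    int bestCost = INT_MAX;
    for (int off = -range; off <= range; ++off) {
        int p = blockStart + off;
        if (p < 0 || p + len > static_cast<int>(prev.size())) continue;
        int cost = sad(prev, p, cur, blockStart, len);
        if (cost < bestCost) {
            bestCost = cost;
            bestOffset = off;
        }
    }
    return bestOffset;
}
```

With prev = {50,50,50,50,10,10,100,100,100,100,10,10} (brick A, mortar, brick B, mortar) and cur = {98,98,98,98,10,10,60,60,60,60,10,10} (brick A now lit by the headlights), `bestMatchOffset(prev, cur, 0, 4, 6)` returns 6 rather than 0: brick A did not move, but its "motion vector" points at brick B.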

You may eventually be able to parse the H.264 stream and determine that it contains an object, but this will not be the kind of "object tracking" you're looking for. OpenCV is excellent software, and this is what it does best. Have you considered scaling the video down to a lower resolution so that OpenCV can analyze it more easily?
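With OpenCV itself, downscaling is a one-liner such as `cv::resize(frame, small, cv::Size(), 0.5, 0.5, cv::INTER_AREA)`. To show what that step does without depending on OpenCV, here is a plain C++ sketch of a 2x2 box-filter downscale on an 8-bit grayscale buffer (`downscale2x` is a hypothetical name). Halving both dimensions leaves a quarter of the pixels for every later stage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Halves width and height of a row-major 8-bit grayscale image by averaging
// each 2x2 block (box filter). An odd trailing row or column is dropped.
std::vector<std::uint8_t> downscale2x(const std::vector<std::uint8_t>& src,
                                      int width, int height) {
    assert(static_cast<int>(src.size()) == width * height);
    const int outW = width / 2, outH = height / 2;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(outW) * outH);
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            const int sx = 2 * x, sy = 2 * y;
            int sum = src[sy * width + sx] + src[sy * width + sx + 1] +
                      src[(sy + 1) * width + sx] + src[(sy + 1) * width + sx + 1];
            dst[y * outW + x] = static_cast<std::uint8_t>((sum + 2) / 4);  // rounded mean
        }
    }
    return dst;
}
```

For a 640x480 frame this produces 320x240, i.e. 76,800 pixels instead of 307,200.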

I think you are greatly overestimating the computing power of this $45 computer. Object recognition and tracking are VERY hard, computationally speaking. I would start by measuring how many frames per second your board can track, and optimize from there. Find out where your bottlenecks are; you may be better off processing raw video than having to decode H.264 video first. Then again, raw video takes a LOT of RAM, and processing it takes a LOT of CPU.
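A minimal sketch of the "measure first" advice using only `std::chrono`. `ThroughputStats` and `measureThroughput` are hypothetical names, and `processFrame` stands in for whatever per-frame work you want to time (capture, color conversion, detection, tracking).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

struct ThroughputStats {
    int frames;
    double seconds;
    // Frames per second; 0 if the elapsed time was too short to measure.
    double fps() const { return seconds > 0.0 ? frames / seconds : 0.0; }
};

// Calls processFrame(i) for i in [0, frameCount) and measures the
// elapsed wall-clock time with a monotonic clock.
ThroughputStats measureThroughput(const std::function<void(int)>& processFrame,
                                  int frameCount) {
    assert(frameCount > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frameCount; ++i) processFrame(i);
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = end - start;
    return {frameCount, elapsed.count()};
}
```

Timing each stage with its own call, rather than the whole pipeline at once, is what points you at the bottleneck. Run it on the board itself, since desktop numbers will not carry over.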

Minimize decoding overhead, and minimize RAM usage by scaling the video down before analysis; but in the end, you're asking a LOT of a 1 GHz, 32-bit ARM processor.
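To put numbers on "a LOT of RAM", here is a small calculator for uncompressed frame sizes. The 640x480 resolution and 30 fps in the note below are illustrative assumptions, not figures from the question. YUV 4:2:0 is the layout H.264 decoders typically output, and BGR24 is OpenCV's usual color layout; `rawFrameBytes` and `rawBytesPerSecond` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class PixelFormat { Gray8, Yuv420p, Bgr24 };

// Bytes needed to hold one uncompressed frame in the given layout.
std::uint64_t rawFrameBytes(std::uint64_t width, std::uint64_t height, PixelFormat fmt) {
    assert(width > 0 && height > 0);
    switch (fmt) {
        case PixelFormat::Gray8:
            return width * height;
        case PixelFormat::Yuv420p: {
            // Full-resolution luma plus two chroma planes subsampled 2x each way.
            std::uint64_t chroma = ((width + 1) / 2) * ((height + 1) / 2);
            return width * height + 2 * chroma;
        }
        case PixelFormat::Bgr24:
            return width * height * 3;
    }
    return 0;  // unreachable for valid enum values
}

// Bytes per second that must be moved and processed at a given frame rate.
std::uint64_t rawBytesPerSecond(std::uint64_t width, std::uint64_t height,
                                PixelFormat fmt, std::uint64_t fps) {
    return rawFrameBytes(width, height, fmt) * fps;
}
```

At 640x480, one BGR24 frame is 921,600 bytes (460,800 as YUV 4:2:0), so 30 fps of BGR24 is 27,648,000 bytes per second; a 320x240 grayscale frame is only 76,800 bytes.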

FFmpeg is a very old library that is no longer supported nowadays. It has very limited capabilities for processing and object tracking in H.264 compressed video, and most of its commands are outdated. The best approach would be to study H.264 thoroughly and then try to implement your own API in a language such as Java or C#.
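As a hedged illustration of what the first step of implementing your own H.264 handling might look like (in C++, matching the question's tag), the sketch below splits an Annex B byte stream into NAL units and decodes each one-byte NAL header into `nal_ref_idc` and `nal_unit_type`. It assumes a raw Annex B stream (for example a `.h264` elementary stream, not an MP4 container). Emulation-prevention bytes guarantee that `00 00 01` does not occur inside a NAL unit, so a byte scan is enough to find the boundaries. It does not parse slice data: reaching the motion vectors would additionally require CAVLC/CABAC entropy decoding, which is far more work. `NalUnitInfo` and `scanAnnexB` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct NalUnitInfo {
    std::size_t offset;  // index of the NAL header byte in the stream
    int nalRefIdc;       // 2 bits: 0 means not used for reference
    int nalUnitType;     // 5 bits: e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS
};

// Scans an H.264 Annex B byte stream for 0x000001 start codes (a 4-byte
// 0x00000001 start code contains the same pattern) and decodes each
// NAL header byte: forbidden_zero_bit(1) | nal_ref_idc(2) | nal_unit_type(5).
std::vector<NalUnitInfo> scanAnnexB(const std::vector<std::uint8_t>& data) {
    std::vector<NalUnitInfo> units;
    std::size_t i = 0;
    while (i + 3 < data.size()) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            const std::uint8_t header = data[i + 3];
            assert((header & 0x80) == 0);  // forbidden_zero_bit must be 0
            units.push_back({i + 3, (header >> 5) & 0x3, header & 0x1F});
            i += 4;  // skip the start code and the header byte
        } else {
            ++i;
        }
    }
    return units;
}
```

Counting NAL types this way (how often IDR slices appear, for instance) is a cheap sanity check on a camera's stream before investing in a full slice parser.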