
Decode a single (H264) packet dumped by ffmpeg from a mp4 file

I used ffmpeg to dump a packet representing a single frame from an h264 video inside an mp4 file:

ffmpeg -i video.mp4 -c copy -vframes 1 -map 0:v:0 -f data frame.bin

The data inside frame.bin looks fine: it appears to consist of exactly the same bytes as the first chunk/packet (I'm not sure which term is correct) in the mdat atom.

Now I want to decode that frame. Since I know the codec that was used to create that packet (h264), I thought I could simply prepare a codec context, load all that data into a packet and then use the traditional avcodec_send_packet(codecContext, packet) followed by avcodec_receive_frame() combo.

Unfortunately, the call to avcodec_send_packet fails and I receive the following error:

(-1094995529) Invalid data found when processing input

Since the first 4 bytes of the packet data are the size of the packet itself, I tried skipping those bytes before passing the buffer to the packet, but that also failed.

Am I skipping some step or doing something wrong? Is what I am trying to do even possible? (please say yes :)

ITU-T Rec. H.264 & Annex B

Recommendation H.264 is a video codec standard defined by the International Telecommunication Union, Telecommunication Standardization Sector (ITU-T). It is available free of charge and can be downloaded from their website.

The standard defines a bytestream format, whose lowest level of abstraction is the NALU (Network Abstraction Layer Unit).

32 types of NALUs can exist, although about 11 are reserved or unused. Some carry video slice data, some don't. Two NALU types will be important later in this discussion: SPS (Sequence Parameter Set) and PPS (Picture Parameter Set). Both are required to decode a video slice, and provide important information about the stream, such as its size and interpretation of the raw data.
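As a small illustration of why there are 32 types: the type is a 5-bit field in the one-byte NALU header. The sketch below splits that header byte into its three fields. The function name `parse_nal_header` and the `NAL_TYPE_NAMES` table are made up for this example and list only a few common types.

```python
from typing import Dict

# A few common nal_unit_type values (not the full list from the standard).
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}


def parse_nal_header(first_byte: int) -> Dict[str, int]:
    """Split the one-byte H.264 NALU header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # must be 0
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # 2 bits
        "nal_unit_type": first_byte & 0x1F,              # 5 bits, so 32 types
    }


if __name__ == "__main__":
    for b in (0x67, 0x68, 0x65):
        hdr = parse_nal_header(b)
        name = NAL_TYPE_NAMES.get(hdr["nal_unit_type"], "other")
        print(hex(b), hdr, name)
```

Header bytes 0x67 and 0x68 are what SPS and PPS NALUs very often start with, which makes them easy to spot in a hex dump.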

H.264 leaves undefined how these NALUs are transported and framed. However, it does describe one possible scheme, in the Standard's own Annex B. This scheme, for want of a better name, is generally referred to as Annex B.

The scheme consists of prefixing each NALU with an easy-to-synchronize-to start code that cannot occur within a NALU: a 3- or 4-byte pattern, 00 00 01 or 00 00 00 01. The rest of the NALU then follows. This scheme is popular in hardware and/or streaming situations for three reasons: it allows acquiring bit-lock and byte-alignment easily; it sends the SPS/PPS “in-band” periodically, and thus allows one to tune into the stream at a random point to begin decoding; and it has the interesting property that between NALUs one can validly send an arbitrary number of 0 bits or bytes.
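A minimal sketch of how such a stream can be split back into NALUs, assuming the whole stream is in memory. The function name `split_annexb` is hypothetical. It searches for the 3-byte pattern 00 00 01 (which also matches the tail of the 4-byte form) and strips trailing zero bytes, relying on the rule that a NALU never ends in a 00 byte.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> List[bytes]:
    """Return the NALUs found in an Annex B byte stream, start codes removed."""
    nalus = []
    i = stream.find(START_CODE)
    while i != -1:
        start = i + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Trailing zeros are either padding between NALUs or the leading
        # 00 of a following 4-byte start code; they are not NALU data.
        nalu = stream[start:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        i = nxt
    return nalus


if __name__ == "__main__":
    demo = (b"\x00\x00\x00\x01\x67\xAA"        # 4-byte start code
            b"\x00\x00\x01\x68\xBB"            # 3-byte start code
            b"\x00\x00\x00\x00\x01\x65\xCC")   # extra zero padding
    for n in split_annexb(demo):
        print(n.hex())
```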

ISO/IEC 14496 MPEG-4 & AVCC

MPEG-4 is a multi-part family of standards for audio-video coding and storage, made by a joint group of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) called the Moving Picture Experts Group (MPEG). Only a few parts of the MPEG-4 family are relevant here:

  • MPEG-4 Part 10 / Advanced Video Coding (AVC), technically identical to ITU-T H.264. Free of charge.
  • MPEG-4 Part 12, ISO Base Media File Format (BMFF), defines a generic binary container file format that can be specialized. Free of charge.
  • MPEG-4 Part 14 (MP4), which specializes Part 12 for video in general and defines the .mp4 file extension and format. This part is very expensive (88 Swiss francs) and not available to the public.
  • MPEG-4 Part 15, which defines how NALU-structured video data such as Part 10/H.264 video is stored in the Part 12 ISO BMFF. This part is extremely expensive (198 Swiss francs) and not available to the public, but together with Parts 14, 12 and 10 it forms the basis of the commonly-used .mp4 container with H.264-coded video.

AVCC

Unfortunately, Part 15 is also the part that defines a new scheme for framing NALUs. This scheme extracts all SPS/PPS NALUs into an “out-of-band” structure called AVCC, and replaces the start code prefix in front of each NALU with an (almost always) 4-byte number representing the size, in bytes, of the following NALU.

This scheme is popular for fast- and random-seeking through video data, and by gathering all video decoder configuration data (SPS/PPS) in one standardized place, one can configure the video decoder once at the beginning and thereafter not worry about unexpected surprises like a dynamic change in the size of the video frame (which Annex B allows).
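To make the framing difference concrete, here is a minimal sketch that rewrites one length-prefixed sample (such as the dumped packet) into Annex B framing. The function name `avcc_to_annexb` is hypothetical, and the 4-byte big-endian length is an assumption; the real width is declared in the out-of-band configuration.

```python
def avcc_to_annexb(sample: bytes, length_size: int = 4) -> bytes:
    """Replace each big-endian length prefix with a 00 00 00 01 start code."""
    out = bytearray()
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length field")
        nalu_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if pos + nalu_len > len(sample):
            raise ValueError("NALU length exceeds sample size")
        out += b"\x00\x00\x00\x01"
        out += sample[pos:pos + nalu_len]
        pos += nalu_len
    return bytes(out)


if __name__ == "__main__":
    # Two NALUs: a 2-byte one and a 3-byte one (payload bytes are arbitrary).
    sample = b"\x00\x00\x00\x02\x09\xF0" + b"\x00\x00\x00\x03\x65\x88\x84"
    print(avcc_to_annexb(sample).hex(" "))
```

Note that each prefix gives the size of one NALU, not of the whole packet, and a packet may hold several NALUs. The converted bytes also still lack the SPS/PPS, so on their own they are generally not enough for a decoder that has not been given that configuration some other way.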

Fortunately, hints about AVCC's structure exist online, as does code to translate between AVCC and Annex B.
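As a rough sketch of that structure, the code below reads the length-field width and the SPS/PPS NALUs from the configuration record as it is commonly documented (a version byte, three profile/level bytes, a length-size byte, then counted lists of 16-bit-length-prefixed SPS and PPS). The function names are hypothetical, and any fields that may follow the PPS list are ignored.

```python
from typing import List, Tuple


def _read_sets(buf: bytes, pos: int, count: int) -> Tuple[List[bytes], int]:
    """Read `count` parameter sets, each prefixed by a 16-bit big-endian length."""
    sets = []
    for _ in range(count):
        if pos + 2 > len(buf):
            raise ValueError("truncated parameter set length")
        n = int.from_bytes(buf[pos:pos + 2], "big")
        pos += 2
        if pos + n > len(buf):
            raise ValueError("truncated parameter set")
        sets.append(buf[pos:pos + n])
        pos += n
    return sets, pos


def parse_avcc_config(extradata: bytes) -> Tuple[int, List[bytes], List[bytes]]:
    """Return (nalu_length_size, sps_list, pps_list) from an AVCC record."""
    if len(extradata) < 7 or extradata[0] != 1:
        raise ValueError("not an AVCC configuration record")
    length_size = (extradata[4] & 0x03) + 1   # low 2 bits: lengthSizeMinusOne
    num_sps = extradata[5] & 0x1F             # low 5 bits: SPS count
    sps, pos = _read_sets(extradata, 6, num_sps)
    if pos >= len(extradata):
        raise ValueError("missing PPS count")
    pps, pos = _read_sets(extradata, pos + 1, extradata[pos])
    return length_size, sps, pps


if __name__ == "__main__":
    record = bytes([1, 0x64, 0x00, 0x1F, 0xFF, 0xE1,
                    0x00, 0x02, 0x67, 0x64,       # one 2-byte SPS (dummy)
                    0x01, 0x00, 0x02, 0x68, 0xEE])  # one 2-byte PPS (dummy)
    print(parse_avcc_config(record))
```

The SPS and PPS returned here are what a decoder fed with Annex B data expects to see in-band, each behind its own start code, before the first slice.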

Your needs

You seem to need AVCC -> Annex B conversion. This can be done with FFmpeg's bitstream filter, h264_mp4toannexb:

ffmpeg -i INPUT.mp4 -codec copy -bsf:v h264_mp4toannexb OUTPUT.ts
