
How does an H.264 stream decoder decide which type of stream is being fed to it by the H.264 encoder?

There are two stream formats supported, or at least recommended, by the ITU-T H.264 documentation: RTP packets and Annex B (raw byte sequence).

My question is this: assume the encoder is capable of sending streaming data in both formats and can switch between them at any point while streaming (correct me if this is not the case). How and when does the H.264 decoder come to know whether it needs to parse the data according to the RTP format or as Annex B raw byte sequence data?

Is there any standard protocol or mechanism for this?

What will happen if there is packet loss and the encoder switches the way it streams data, i.e. from RTP to Annex B or vice versa? The decoder would presumably still assume the data is being streamed in the old format.

Kindly clarify the above.

Generally, in most cases, H.264 encoders produce packets in NAL (Network Abstraction Layer) form. Each NALU (NAL unit) consists of a NAL header and an RBSP (Raw Byte Sequence Payload). Like H.264 encoders, most decoders are capable of understanding NALUs (not necessarily RTP). The NAL header is 1 byte in size.
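The 1-byte NAL header mentioned above has a fixed bit layout defined by the H.264 specification (forbidden_zero_bit, nal_ref_idc, nal_unit_type). A minimal sketch of decoding it:

```python
# Sketch: unpacking the 1-byte NAL unit header.
# Layout: forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)

def parse_nal_header(first_byte: int) -> dict:
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # must be 0 in a valid stream
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # 0 means not used for reference
        "nal_unit_type": first_byte & 0x1F,              # e.g. 5 = IDR slice, 7 = SPS, 8 = PPS
    }

# 0x67 is a typical SPS NAL header byte (nal_ref_idc = 3, nal_unit_type = 7).
hdr = parse_nal_header(0x67)
```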

There are two RTP packetization methods for NAL units. One method allows NAL fragmentation; the other does not allow fragmenting the NALU. In both methods, the RTP header is followed by the NALU. If both the encoder and decoder are implemented to understand the RTP header as well, they should parse the headers first, since the headers are fixed in size, and then check the RTP and/or NAL headers to decide how to continue parsing.
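Since the RTP header is fixed in size, parsing it is straightforward. A sketch of unpacking the 12-byte fixed RTP header (as defined in RFC 3550) that precedes the NALU:

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the 12-byte fixed RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": (b0 >> 6) & 0x03,    # always 2 for RTP
        "padding": (b0 >> 5) & 0x01,
        "extension": (b0 >> 4) & 0x01,
        "csrc_count": b0 & 0x0F,
        "marker": (b1 >> 7) & 0x01,
        "payload_type": b1 & 0x7F,      # a dynamic PT (96-127) is typical for H.264
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }
```

The NALU payload then starts at offset 12 (plus 4 bytes per CSRC entry, if `csrc_count` is nonzero).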

For more details, see RFC 6184, RTP Payload Format for H.264 Video.

In summary, RTP and NAL are just headers, and it comes down to parsing the RTP or NAL header before decoding the actual video data. It is better to signal to the decoder the mode (RTP or NAL) in which the data is streamed. That makes the decoder's life easier and avoids the mistake of treating a packet wrongly.

In the case of packet loss, it comes down to the decoder's resiliency approach. There is no standardized approach for packet (NALU) loss; some decoders do provide error concealment for packet-loss scenarios.

More Details Added:

You need to have both header (RTP and NAL) parsing implementations on the decoder side. As said above, it is better to have a signalling mechanism to indicate the mode in which packets are sent to the decoder. Since the NAL header appears in a packet in either case (it exists in both RTP and Annex B data), you should search for the NAL start code first. Once the decoder finds the start code in a packet, check the number of bytes (x) consumed up to that point. If x is at least the RTP header size, start parsing in RTP mode from the beginning of the packet. If RTP parsing goes well (validated by checking some of the RTP fields against the data in hand), the decoder can conclude that packets are being received in RTP mode. This approach is valid for the non-fragmented RTP packetization method.
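The detection heuristic above can be sketched roughly as follows. This is a minimal illustration, not a robust implementation: the function name is hypothetical, and the only RTP field validated here is the version (which is always 2); a real decoder would check more fields, as the answer suggests.

```python
RTP_FIXED_HEADER_SIZE = 12  # RFC 3550 fixed header, assuming no CSRC entries
START_CODES = (b"\x00\x00\x00\x01", b"\x00\x00\x01")  # Annex B start codes

def _rtp_version_ok(packet: bytes) -> bool:
    # The RTP version field (top two bits of the first byte) is always 2.
    return len(packet) >= RTP_FIXED_HEADER_SIZE and (packet[0] >> 6) & 0x03 == 2

def guess_stream_mode(packet: bytes) -> str:
    for sc in START_CODES:
        x = packet.find(sc)  # bytes consumed before the start code
        if x < 0:
            continue
        if x >= RTP_FIXED_HEADER_SIZE and _rtp_version_ok(packet):
            return "rtp"     # start code sits beyond a plausible RTP header
        return "annexb"      # start code at/near the front: Annex B framing
    # No start code found at all: RTP payloads carry the NALU without a
    # start code, so fall back to validating the RTP header fields.
    return "rtp" if _rtp_version_ok(packet) else "unknown"
```

For example, a buffer beginning with `\x00\x00\x00\x01` is classified as Annex B, while a buffer whose first byte has version bits 2 and whose start code (if any) lies past offset 12 is treated as RTP.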
