How to pack H264 encoded by Android MediaCodec into RTP packets

Question

How can I properly pack an H264 byte stream into RTP packets so that I can receive the frames with FFmpeg?

When I start the FFmpeg receiver, it pumps out a lot of errors like these:

Invalid UE golomb code
[h264 @ 0xd63060] pps_id 3199971767 out of range
[h264 @ 0xd63060] slice type 32 too large at -1
[h264 @ 0xd63060] decode_slice_header error
[h264 @ 0xd63060] non-existing PPS 0 referenced
[h264 @ 0xd63060] decode_slice_header error
[h264 @ 0xd63060] no frame!
[h264 @ 0xd63060] decode_slice_header error
[h264 @ 0xd63060] Unknown NAL code: 0 (0 bits)
[h264 @ 0xd63060] no frame!
[h264 @ 0xd63060] non-existing PPS 0 referenced

Here is the SDP file I use:

c=IN IP4 192.168.2.30
t=0 0
m=video 51372 RTP/AVP 96
a=rtpmap:96 H264/90000
a=recvonly

The pps_id error is curious: it's as if it is looking for the next PPS but can't find it, even though I tried embedding the PPS into each NALU.
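A likely contributor to the "non-existing PPS 0 referenced" errors is that the receiver has no SPS/PPS until one happens to arrive in-band. RFC 6184 lets the SDP carry them in sprop-parameter-sets (comma-separated base64 of each NAL unit, without start code), and it only allows FU-A packets with packetization-mode=1; when the parameter is absent the mode defaults to 0. The SDP above also lacks the v=, o= and s= lines that RFC 4566 requires. The sketch below builds such an SDP. The class name and the sample SPS/PPS bytes are hypothetical; real values would come from the encoder's codec-config output with the start codes removed, and java.util.Base64 assumes Java 8 or Android API 26+.

```java
import java.util.Base64;

class SdpBuilder {
    // Builds a receive-side SDP for H.264 over RTP (RFC 6184).
    // sps and pps are raw NAL units WITHOUT the 00 00 00 01 start code.
    static String buildSdp(String ip, int port, byte[] sps, byte[] pps) {
        Base64.Encoder b64 = Base64.getEncoder();
        return "v=0\r\n"
             + "o=- 0 0 IN IP4 " + ip + "\r\n"
             + "s=H264 stream\r\n"
             + "c=IN IP4 " + ip + "\r\n"
             + "t=0 0\r\n"
             + "m=video " + port + " RTP/AVP 96\r\n"
             + "a=rtpmap:96 H264/90000\r\n"
             // packetization-mode=1 (non-interleaved) is the mode in which FU-A is allowed
             + "a=fmtp:96 packetization-mode=1; sprop-parameter-sets="
             + b64.encodeToString(sps) + "," + b64.encodeToString(pps) + "\r\n"
             + "a=recvonly\r\n";
    }

    public static void main(String[] args) {
        byte[] sps = {0x67, 0x42, 0x00, 0x1f};                // hypothetical, truncated SPS
        byte[] pps = {0x68, (byte) 0xce, 0x3c, (byte) 0x80};  // hypothetical PPS
        System.out.print(buildSdp("192.168.2.30", 51372, sps, pps));
    }
}
```

With the parameter sets in the SDP, the decoder can resolve pps_id before the first in-band PPS arrives.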

I've been reading RFC 6184 and trying to understand it, but I feel I still don't quite understand how H264 and RTP interact. Currently I'm trying to encode pixels from a camera and stream 1920x1080 H264-encoded frames over RTP across the network, where they are received and decoded by FFmpeg. I'm assembling the RTP and FU-A headers in Java and fragmenting NALUs that are too large for the MTU.

I've been watching the stream closely in Wireshark; here is the output for my first packet:

Real-Time Transport Protocol
10.. .... = Version: RFC 1889 Version (2)
..0. .... = Padding: False
...0 .... = Extension: False
.... 0000 = Contributing source identifiers count: 0
1... .... = Marker: True
Payload type: DynamicRTP-Type-96 (96)
Sequence number: 0
Timestamp: 2727179012
Synchronization Source identifier: 0x00000000 (0)
H.264
NAL unit header or first byte of the payload
    0... .... = F bit: No bit errors or other syntax violations
    .00. .... = Nal_ref_idc (NRI): 0
    ...0 0000 = Type: Undefined (0)
H264 NAL Unit Payload

I don't understand why the first payload has a NALU type of 0. Nevertheless, here is my second packet:

Real-Time Transport Protocol
10.. .... = Version: RFC 1889 Version (2)
..0. .... = Padding: False
...0 .... = Extension: False
.... 0000 = Contributing source identifiers count: 0
0... .... = Marker: False
Payload type: DynamicRTP-Type-96 (96)
Sequence number: 1
Timestamp: 2727179019
Synchronization Source identifier: 0x00000000 (0)
H.264
FU identifier
    0... .... = F bit: No bit errors or other syntax violations
    .11. .... = Nal_ref_idc (NRI): 3
    ...1 1100 = Type: Fragmentation unit A (FU-A) (28)
FU Header
    1... .... = Start bit: the first packet of FU-A picture
    .0.. .... = End bit: Not the last packet of FU-A picture
    ..0. .... = Forbidden bit: 0
    ...0 0101 = Nal_unit_type: Coded slice of an IDR picture (5)
H264 NAL Unit Payload
    0000 0000  0000 0000  0000 0000  0000 0001  0110 0101  1011 1000  0000 0100  0000 010. = first_mb_in_slice: 3000762881
    .... ...1 = slice_type: P (P slice) (0)
    0011 1... = pic_parameter_set_id: 6
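The payload bytes in this dump begin with 0000 0000 0000 0000 0000 0000 0000 0001 0110 0101, that is, the Annex B start code 00 00 00 01 followed by the real NAL header 0x65, which is why Wireshark's slice-header fields above are nonsense. A leftover start code most likely also explains the first packet: Wireshark reads its first payload byte, 0x00, as the NAL header, giving type 0 and NRI 0. The header layout is forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), nal_unit_type (5 bits); a minimal decoding sketch (hypothetical class name):

```java
class NalHeader {
    // nal_ref_idc: bits 6-5 of the first NAL byte
    static int nalRefIdc(byte header) { return (header >> 5) & 0x03; }

    // nal_unit_type: low 5 bits of the first NAL byte
    static int nalUnitType(byte header) { return header & 0x1F; }

    public static void main(String[] args) {
        // A payload that still starts with the Annex B start code 00 00 00 01
        byte[] withStartCode = {0x00, 0x00, 0x00, 0x01, 0x65};
        System.out.println(nalUnitType(withStartCode[0]) + " " + nalRefIdc(withStartCode[0])); // prints "0 0"
        System.out.println(nalUnitType(withStartCode[4]) + " " + nalRefIdc(withStartCode[4])); // prints "5 3"
    }
}
```

The same decoding turns 0x61, the header in the later non-IDR dump, into type 1 with NRI 3.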

So I think the previous packet was an I-frame? Here is a fragment between the start and end fragments:

Real-Time Transport Protocol
10.. .... = Version: RFC 1889 Version (2)
..0. .... = Padding: False
...0 .... = Extension: False
.... 0000 = Contributing source identifiers count: 0
0... .... = Marker: False
Payload type: DynamicRTP-Type-96 (96)
Sequence number: 1
Timestamp: 2727179019
Synchronization Source identifier: 0x00000000 (0)
H.264
FU identifier
    0... .... = F bit: No bit errors or other syntax violations
    .11. .... = Nal_ref_idc (NRI): 3
    ...1 1100 = Type: Fragmentation unit A (FU-A) (28)
FU Header
    0... .... = Start bit: Not the first packet of FU-A picture
    .0.. .... = End bit: Not the last packet of FU-A picture
    ..0. .... = Forbidden bit: 0
    ...0 0101 = Nal_unit_type: Coded slice of an IDR picture (5)

And of course, here is the last packet of the supposed I-frame:

Real-Time Transport Protocol
10.. .... = Version: RFC 1889 Version (2)
..0. .... = Padding: False
...0 .... = Extension: False
.... 0000 = Contributing source identifiers count: 0
1... .... = Marker: True
Payload type: DynamicRTP-Type-96 (96)
Sequence number: 1
Timestamp: 2727179019
Synchronization Source identifier: 0x00000000 (0)
H.264
FU identifier
    0... .... = F bit: No bit errors or other syntax violations
    .11. .... = Nal_ref_idc (NRI): 3
    ...1 1100 = Type: Fragmentation unit A (FU-A) (28)
FU Header
    0... .... = Start bit: Not the first packet of FU-A picture
    .1.. .... = End bit: the last packet of FU-A picture
    ..0. .... = Forbidden bit: 0
    ...0 0101 = Nal_unit_type: Coded slice of an IDR picture (5)
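The FU indicator and FU header bits in these three dumps match RFC 6184: the indicator keeps F and NRI from the original NAL header and sets type 28, and the FU header carries the S and E bits plus the original nal_unit_type. The problem is what follows them: RFC 6184 does not repeat the NAL header byte inside FU-A payloads, yet the start fragment's payload begins with 00 00 00 01 65. Note also that all three fragments carry sequence number 1. A sketch computing the two bytes (hypothetical helper names):

```java
class FuHeaders {
    static final int FU_A = 28;

    // FU indicator: F and NRI from the original NAL header, type 28 (FU-A)
    static byte fuIndicator(byte nalHeader) {
        return (byte) ((nalHeader & 0xE0) | FU_A);
    }

    // FU header: S bit, E bit, reserved bit 0, then the original nal_unit_type
    static byte fuHeader(byte nalHeader, boolean start, boolean end) {
        int h = nalHeader & 0x1F;
        if (start) h |= 0x80;
        if (end) h |= 0x40;
        return (byte) h;
    }

    public static void main(String[] args) {
        byte idr = 0x65; // NRI 3, type 5 (IDR slice)
        System.out.printf("%02X %02X %02X %02X%n",
                fuIndicator(idr), fuHeader(idr, true, false),
                fuHeader(idr, false, false), fuHeader(idr, false, true));
        // prints 7C 85 05 45: indicator, start, middle and end fragment headers
    }
}
```

Because the receiver rebuilds the NAL header from these two bytes, the FU-A payload should begin with the byte right after 0x65.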

Now here is my packet for the next bytes the encoder gave me:

Real-Time Transport Protocol
10.. .... = Version: RFC 1889 Version (2)
..0. .... = Padding: False
...0 .... = Extension: False
.... 0000 = Contributing source identifiers count: 0
0... .... = Marker: False
Payload type: DynamicRTP-Type-96 (96)
Sequence number: 2
Timestamp: 2727179089
Synchronization Source identifier: 0x00000000 (0)
H.264
FU identifier
    0... .... = F bit: No bit errors or other syntax violations
    .11. .... = Nal_ref_idc (NRI): 3
    ...1 1100 = Type: Fragmentation unit A (FU-A) (28)
FU Header
    1... .... = Start bit: the first packet of FU-A picture
    .0.. .... = End bit: Not the last packet of FU-A picture
    ..0. .... = Forbidden bit: 0
    ...0 0001 = Nal_unit_type: Coded slice of a non-IDR picture (1)
H264 NAL Unit Payload
    0000 0000  0000 0000  0000 0000  0000 0001  0110 0001  1110 0000  0010 0000  0001 100. = first_mb_in_slice: 2968522763
    .... ...0  0111 .... = slice_type: B (B slice) (6)
    .... 0001  110. .... = pic_parameter_set_id: 13

This part confuses me: when the camera is stationary, the encoder gives me smaller and smaller NALUs with undefined types, and I'm not entirely sure why. Anyway, the packet below gets sent to FFmpeg as one whole NALU.

Real-Time Transport Protocol
10.. .... = Version: RFC 1889 Version (2)
..0. .... = Padding: False
...0 .... = Extension: False
.... 0000 = Contributing source identifiers count: 0
1... .... = Marker: True
Payload type: DynamicRTP-Type-96 (96)
Sequence number: 36
Timestamp: 2727180258
Synchronization Source identifier: 0x00000000 (0)
H.264
NAL unit header or first byte of the payload
    0... .... = F bit: No bit errors or other syntax violations
    .00. .... = Nal_ref_idc (NRI): 0
    ...0 0000 = Type: Undefined (0)
H264 NAL Unit Payload

I'm using the Android MediaCodec encoder; here is the code where I configure it:

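mediaCodec = MediaCodec.createByCodecName("OMX.Nvidia.h264.encoder"); // device-specific hardware encoder
mediaFormat = MediaFormat.createVideoFormat("video/avc", 1920, 1080);
mediaFormat.setInteger(MediaFormat.KEY_BIT_RATE, 125000);           // bits per second
mediaFormat.setInteger(MediaFormat.KEY_FRAME_RATE, 30);              // frames per second
// Camera frames are fed through an input Surface rather than ByteBuffers
mediaFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
mediaFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 0);        // seconds between key frames; 0 requests all key frames (not every encoder honors it)
mediaFormat.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, 1920 * 1080);
mediaCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
inputSurface = mediaCodec.createInputSurface();                      // must be called after configure() and before start()
mediaCodec.start();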

Is the encoder giving me whole access units or only NALUs?
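Per the MediaCodec documentation, an encoder output buffer for video normally holds a single compressed frame; in practice the AVC encoder writes it as an Annex B byte stream (NAL units separated by 00 00 01 or 00 00 00 01 start codes), and the SPS and PPS arrive first in a buffer flagged BUFFER_FLAG_CODEC_CONFIG. Since one buffer can hold more than one NAL unit, a sketch like this one (hypothetical class name, assuming the output buffer has been copied into a byte[]) splits it into NAL units with the start codes removed before packetizing:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AnnexBSplitter {
    // Splits an Annex B buffer into NAL units, dropping the 00 00 01 / 00 00 00 01 start codes.
    static List<byte[]> split(byte[] data) {
        List<byte[]> nals = new ArrayList<>();
        int start = -1; // first byte of the NAL unit being scanned; -1 before the first start code
        int i = 0;
        while (i + 2 < data.length) {
            if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
                if (start >= 0) {
                    // With a 4-byte start code, the extra leading 0x00 belongs to the start code
                    int end = (i > start && data[i - 1] == 0) ? i - 1 : i;
                    nals.add(Arrays.copyOfRange(data, start, end));
                }
                i += 3;
                start = i;
            } else {
                i++;
            }
        }
        if (start >= 0 && start < data.length) {
            nals.add(Arrays.copyOfRange(data, start, data.length)); // last NAL runs to the end
        }
        return nals;
    }

    public static void main(String[] args) {
        // Hypothetical buffer: SPS and PPS behind 4-byte start codes, a slice behind a 3-byte one
        byte[] buf = {0, 0, 0, 1, 0x67, 0x42, 0, 0, 0, 1, 0x68, (byte) 0xCE, 0, 0, 1, 0x65, (byte) 0x88};
        for (byte[] nal : split(buf)) {
            System.out.println("type " + (nal[0] & 0x1F) + ", " + nal.length + " bytes");
        }
    }
}
```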

Here is my logic:

- If the frame is larger than the MTU, it is fragmented:
  - When I send the starting FU-A header, I set the start bit to 1.
  - When I send the last fragment of the frame, I set the marker bit in the RTP header to 1 and the end bit in the FU-A header to 1.
  - FU-A headers between the start and end fragments have both the start and end bits set to 0.
  - The marker bit is always 0 except on the last packet.
- If the NALU fits in the MTU, the whole frame is sent in one packet.
- With each NALU sent, I increment the sequence number in the RTP header.
- With each NALU sent, I get a new timestamp for the RTP header.
- Before fragmenting a NALU, I save its NALU type and insert it into the FU-A header.
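The fragmentation logic above can be sketched in Java with two changes taken from the answer and RFC 6184: the NAL unit is passed in with its start code already stripped, and its header byte is not copied into FU-A payloads, because the FU indicator and FU header replace it. The class name and the maxPayload parameter are hypothetical; maxPayload stands for the MTU minus the IP, UDP and 12-byte RTP headers (1400 below is an assumed value). Sequence numbers, timestamps and the marker bit live in the RTP header and are left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

class FuAPacketizer {
    static final int FU_A = 28;

    // Turns one NAL unit (start code already stripped) into RTP payloads per RFC 6184:
    // a single NAL unit packet if it fits, otherwise a run of FU-A fragments.
    static List<byte[]> packetize(byte[] nal, int maxPayload) {
        if (maxPayload <= 2) throw new IllegalArgumentException("maxPayload must exceed the 2 FU-A header bytes");
        List<byte[]> out = new ArrayList<>();
        if (nal.length <= maxPayload) {
            out.add(nal); // single NAL unit packet: the payload is the NAL unit itself, header included
            return out;
        }
        byte indicator = (byte) ((nal[0] & 0xE0) | FU_A); // F + NRI copied from the NAL header
        int type = nal[0] & 0x1F;                          // original nal_unit_type goes into the FU header
        int chunk = maxPayload - 2;                        // room left after FU indicator + FU header
        int pos = 1;                                       // skip the NAL header byte: FU-A does not repeat it
        while (pos < nal.length) {
            int len = Math.min(chunk, nal.length - pos);
            int fuHeader = type;
            if (pos == 1) fuHeader |= 0x80;                // S bit on the first fragment
            if (pos + len == nal.length) fuHeader |= 0x40; // E bit on the last fragment
            byte[] pkt = new byte[len + 2];
            pkt[0] = indicator;
            pkt[1] = (byte) fuHeader;
            System.arraycopy(nal, pos, pkt, 2, len);
            out.add(pkt);
            pos += len;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] nal = new byte[4000]; // hypothetical IDR slice
        nal[0] = 0x65;
        System.out.println(packetize(nal, 1400).size() + " packets"); // prints "3 packets"
    }
}
```

Each returned payload still needs its own RTP header, and the sequence number has to advance for every payload, not only for every NALU.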

I feel like I'm close, but it's clearly not working with any RTP receiver. I'd appreciate any thoughts or ideas on the matter.

Thanks,

Answer 1

I finally managed to work it out: my packets were not configured properly.

- I must increment the sequence number per packet.
- I must set the timestamp per NALU instead of per packet.
- I must strip the 00 00 00 01 start-code prefix from each NALU and send only the bytes from index 4 onward.
- The bitwise operations in my headers were incorrect.

I can even start FFmpeg in the middle of the stream, and it works!
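The answer does not show its corrected header code, so the following is only a sketch, assuming RFC 3550's fixed 12-byte header and payload type 96 from the SDP. It writes the fields with ByteBuffer instead of hand-shifted bytes, converts MediaCodec's BufferInfo.presentationTimeUs to the 90 kHz clock declared by a=rtpmap:96 H264/90000, and gives every packet of one frame the same timestamp, which is what RFC 6184 asks for within an access unit (it coincides with "per NALU" when the encoder emits one NALU per frame). For comparison, the timestamps in the question's dumps advance by only 7 and then 70 between the first three NALUs, while a 90 kHz clock at 30 fps advances by about 3000 per frame. All names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class RtpHeader {
    // Fixed 12-byte RTP header (RFC 3550); ByteBuffer writes big-endian (network order) by default.
    static byte[] header(boolean marker, int payloadType, int seq, long timestamp, int ssrc) {
        ByteBuffer b = ByteBuffer.allocate(12);
        b.put((byte) 0x80);                                            // V=2, P=0, X=0, CC=0
        b.put((byte) ((marker ? 0x80 : 0x00) | (payloadType & 0x7F))); // M bit + 7-bit payload type
        b.putShort((short) seq);                                       // 16-bit sequence number, wraps at 65536
        b.putInt((int) timestamp);                                     // low 32 bits of the RTP timestamp
        b.putInt(ssrc);
        return b.array();
    }

    // MediaCodec's presentationTimeUs (microseconds) on the 90 kHz RTP clock, truncated to 32 bits.
    static long rtpTimestamp(long presentationTimeUs) {
        return (presentationTimeUs * 90_000L / 1_000_000L) & 0xFFFFFFFFL;
    }

    // Prepends headers to all payloads of ONE frame: the sequence number advances per packet,
    // every packet shares the frame's timestamp, and only the last packet carries the marker bit.
    static List<byte[]> buildPackets(List<byte[]> payloads, int firstSeq, long timestamp, int ssrc) {
        List<byte[]> packets = new ArrayList<>();
        for (int i = 0; i < payloads.size(); i++) {
            boolean last = i == payloads.size() - 1;
            byte[] hdr = header(last, 96, firstSeq + i, timestamp, ssrc);
            byte[] payload = payloads.get(i);
            byte[] pkt = new byte[hdr.length + payload.length];
            System.arraycopy(hdr, 0, pkt, 0, hdr.length);
            System.arraycopy(payload, 0, pkt, hdr.length, payload.length);
            packets.add(pkt);
        }
        return packets;
    }
}
```

RFC 3550 also recommends random initial values for the sequence number and timestamp and a randomly chosen SSRC, rather than the 0 seen in the dumps.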


Answer 1, 2016-10-10 18:16:40
