FFmpeg parse NALs from H264 bitstream

I'm able to use FFmpeg to encode a dummy frame into an H264 bitstream. What I'd additionally like to do is extract the individual NALs from the bitstream.

From lots of hunting around, it seems that an AVCodecParserContext with av_parser_parse2 is the way to do it. I can see the functionality is there in h264_parser.c, but I can't work out how to hook it up. Or does the parser only deal in frames, so that an AVBitStreamFilter or something else is needed?

In my example below I am successfully encoding to H264 and transmitting the results over RTP. ffplay is able to receive and display the H264 RTP packets, so I'm confident the H264 encoding is working correctly.

#include <cstdio>    // getchar
#include <ctime>
#include <iomanip>
#include <iostream>
#include <stdexcept> // std::runtime_error
#include <string>
#include <sstream>

#include "strutils.h"

extern "C"
{
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavformat/avio.h>
#include <libavutil/imgutils.h>
#include <libswscale/swscale.h>
#include <libavutil/time.h>
}

#define WIDTH 640
#define HEIGHT 480
#define FRAMES_PER_SECOND 30
#define RTP_OUTPUT_FORMAT "rtp"
#define RTP_URL "rtp://127.0.0.1:5024"
#define ERROR_LEN 128
#define codecID AVCodecID::AV_CODEC_ID_H264 // AVCodecID::AV_CODEC_ID_VP8;

SwsContext* _swsContext;
AVCodec* _codec;
AVCodecContext* _codecCtx;
AVFormatContext* _formatContext;
AVStream* _rtpOutStream;
char _errorLog[ERROR_LEN];
AVCodecParserContext* _codecParserCtx;

int main()
{
  std::cout << "FFmpeg Encoder and RTP Stream Test" << std::endl;

  av_log_set_level(AV_LOG_DEBUG);

  // Initialise codec context.
  _codec = avcodec_find_encoder(codecID);
  if (_codec == NULL) {
    throw std::runtime_error("Could not find codec for ID " + std::to_string(codecID) + ".");
  }

  _codecCtx = avcodec_alloc_context3(_codec);
  if (!_codecCtx) {
    std::cerr << "Failed to initialise codec context." << std::endl;
  }

  _codecCtx->width = WIDTH;
  _codecCtx->height = HEIGHT;
  //_codecCtx->bit_rate = 500000;
  _codecCtx->time_base.den = FRAMES_PER_SECOND;
  _codecCtx->time_base.num = 1;
  //_codecCtx->gop_size = 10;
  //_codecCtx->max_b_frames = 1;
  _codecCtx->pix_fmt = AVPixelFormat::AV_PIX_FMT_YUV420P;

  int res = avcodec_open2(_codecCtx, _codec, NULL);
  if (res < 0) {
    std::cerr << "Failed to open codec: " << av_make_error_string(_errorLog, ERROR_LEN, res) << std::endl;
  }

  // Set up a parser to extract NALs from the H264 bit stream.
  // Note this is not needed for sending the RTP (I need to separate the NALs for another reason).
  _codecParserCtx = av_parser_init(codecID);
  if (!_codecParserCtx) {
    std::cerr << "Failed to initialise codec parser." << std::endl;
  }

  // Initialise RTP output stream.
  AVOutputFormat* fmt = av_guess_format(RTP_OUTPUT_FORMAT, NULL, NULL);
  if (!fmt) {
    std::cerr << "Failed to guess output format for " << RTP_OUTPUT_FORMAT << "." << std::endl;
  }

  res = avformat_alloc_output_context2(&_formatContext, fmt, fmt->name, RTP_URL);
  if (res < 0) {
    std::cerr << "Failed to allocate output context: " << av_make_error_string(_errorLog, ERROR_LEN, res) << std::endl;
  }

  _rtpOutStream = avformat_new_stream(_formatContext, _codec);
  if (!_rtpOutStream) {
    std::cerr << "Failed to allocate output stream." << std::endl;
  }

  res = avio_open(&_formatContext->pb, _formatContext->url, AVIO_FLAG_WRITE);
  if (res < 0) {
    std::cerr << "Failed to open RTP output context for writing: " << av_make_error_string(_errorLog, ERROR_LEN, res) << std::endl;
  }

  res = avcodec_parameters_from_context(_rtpOutStream->codecpar, _codecCtx);
  if (res < 0) {
    std::cerr << "Failed to copy codec parameters to stream: " << av_make_error_string(_errorLog, ERROR_LEN, res) << std::endl;
  }

  res = avformat_write_header(_formatContext, NULL);
  if (res < 0) {
    std::cerr << "Failed to write output header: " << av_make_error_string(_errorLog, ERROR_LEN, res) << std::endl;
  }

  av_dump_format(_formatContext, 0, RTP_URL, 1);

  // Set a dummy frame with a YUV420 image.
  AVFrame* frame = av_frame_alloc();
  frame->format = AVPixelFormat::AV_PIX_FMT_YUV420P;
  frame->width = WIDTH;
  frame->height = HEIGHT;
  frame->pts = 0;

  res = av_frame_get_buffer(frame, 0);
  if (res < 0) {
    std::cerr << "Failed on av_frame_get_buffer: " << av_make_error_string(_errorLog, ERROR_LEN, res) << std::endl;
  }

  res = av_frame_make_writable(frame);
  if (res < 0) {
    std::cerr << "Failed on av_frame_make_writable: " << av_make_error_string(_errorLog, ERROR_LEN, res) << std::endl;
  }

  for (int y = 0; y < HEIGHT; y++) {
    for (int x = 0; x < WIDTH; x++) {
      frame->data[0][y * frame->linesize[0] + x] = x + y + 1 * 3;
    }
  }

  for (int y = 0; y < HEIGHT / 2; y++) {
    for (int x = 0; x < WIDTH / 2; x++) {
      frame->data[1][y * frame->linesize[1] + x] = 128 + y + 2;
      frame->data[2][y * frame->linesize[2] + x] = 64 + y + 5;
    }
  }

  std::cout << "press any key to start the stream..." << std::endl;
  getchar();

  // Start the loop to encode the static dummy frame and output on the RTP stream.
  AVPacket* pkt = av_packet_alloc();
  uint8_t* data{ nullptr };
  int dataSize;

  while (true) {
    int sendres = avcodec_send_frame(_codecCtx, frame);
    if (sendres != 0) {
      std::cerr << "avcodec_send_frame error: " << av_make_error_string(_errorLog, ERROR_LEN, sendres) << std::endl;
    }

    // Read encoded packets.
    int ret = 0;
    while (ret >= 0) {

      ret = avcodec_receive_packet(_codecCtx, pkt);

      if (ret == AVERROR(EAGAIN)) {
        // Encoder needs more data.
        break;
      }
      else if (ret < 0) {
        std::cerr << "Failed to encode frame: " << av_make_error_string(_errorLog, ERROR_LEN, ret) << std::endl;
        break;
      }
      else {
        std::cout << "Encoded packet pts " << pkt->pts << ", size " << pkt->size << "." << std::endl;
        std::cout << toHex(pkt->data, pkt->data + pkt->size) << std::endl;

        int pktOffset = 0;

        // TODO: Find a way to separate the NALs from the Annex B H264 byte stream in the AVPacket data.
        //AVBitStreamFilter 
        
        while (pkt->size > pktOffset) {
          int bytesRead = av_parser_parse2(_codecParserCtx, _codecCtx, &data, &dataSize, pkt->data + pktOffset, pkt->size - pktOffset, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);

          if (bytesRead == 0) {
            std::cout << "Failed to parse data from packet." << std::endl;
            break;
          }
          else if (bytesRead < 0) {
            std::cerr << "av_parser_parse2 error: " << av_make_error_string(_errorLog, ERROR_LEN, bytesRead) << std::endl;
            break;
          }
          else {
            std::cout << "Codec parser bytes read " << bytesRead << "." << std::endl;
            pktOffset += bytesRead;
          }
        }
      }

      // Write the encoded packet to the RTP stream.
      int sendRes = av_write_frame(_formatContext, pkt);
      if (sendRes < 0) {
        std::cerr << "Failed to write frame to output stream: " << av_make_error_string(_errorLog, ERROR_LEN, sendRes) << std::endl;
        break;
      }

      std::cout << "press any key to continue..." << std::endl;
      getchar();
    }

    av_usleep(1000000 / FRAMES_PER_SECOND);

    frame->pts++;
  }

  av_packet_free(&pkt);
  av_frame_free(&frame);
  av_parser_close(_codecParserCtx);
  avcodec_close(_codecCtx);
  avcodec_free_context(&_codecCtx);
  avformat_free_context(_formatContext);

  return 0;
}

The output for the first few frames is below (apologies for the size); the first available frame contains 4 separate NALs. Since the H264 byte stream is in Annex B format, the NALs can be easily extracted by hand, the delimiter being either 00000001 or 000001. If possible I'd rather use the proper FFmpeg way of parsing instead of re-inventing the wheel.

FFmpegCppEncodingTest\x64\Debug>FFmpegCppEncodingTest.exe
FFmpeg Encoder and RTP Stream Test
[libx264 @ 00000252de7e2180] using mv_range_thread = 24
[libx264 @ 00000252de7e2180] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 00000252de7e2180] profile High, level 3.0, 4:2:0, 8-bit
[rtp @ 00000252df9d4c80] No default whitelist set
[udp @ 00000252df9d46c0] No default whitelist set
[udp @ 00000252df501440] No default whitelist set
Output #0, rtp, to 'rtp://127.0.0.1:5024':
  Metadata:
    encoder         : Lavf58.49.100
    Stream #0:0, 0, 1/90000: Video: h264 (libx264), 1 reference frame, yuv420p, 640x480 (0x0), 0/1, q=-1--1, 90k tbn
press any key to start the stream...

[libx264 @ 00000252de7e2180] frame=   0 QP=23.20 NAL=3 Slice:I Poc:0   I:1200 P:0    SKIP:0    size=8262 bytes
Encoded packet pts 0, size 8262.

Codec parser bytes read 8262, data size 0.
nal: .
[rtp @ 00000252df4ef8c0] Sending NAL 7 of len 24 M=0
[rtp @ 00000252df4ef8c0] Sending NAL 8 of len 6 M=0
[rtp @ 00000252df4ef8c0] Sending NAL 6 of len 673 M=0
[rtp @ 00000252df4ef8c0] Sending NAL 5 of len 7545 M=1
[rtp @ 00000252df4ef8c0] NAL size 7545 > 1460
press any key to continue...

[libx264 @ 00000252de7e2180] frame=   1 QP=23.21 NAL=2 Slice:P Poc:8   I:52   P:101  SKIP:1047 size=482 bytes
Encoded packet pts 4, size 482.
00000001419a246c45fffa5d37ecc1d1a2448b600952115b6c1b9880c2721414b9ad381385c2d2db0c0fc3041714814ca3fbfc1c85bb5cf888b1442cdf4fd78bed0ff15512df4949c046d5ed117d4ca9c2fe8e16d8b2a0ee8e8ca9ef07d709242427eec2e62e7c5ddd87a9cc7c7fa3c97bcf657971f49b92b8be0b5ec4d9de8d8abe9aa061abf1d193ca02fe38a4c37e5ca55fac90c7e3d20a050d0684cf50614872855915c4e51caffc4e16e50cfee7c2f92c574efd752e2493c2cb07447541446f498625f89c0396f244bd0674dac31a45e98cbdd7f1f447bf8c84b2c288e3693bcfc2c1a4c0789fdce4fa71181f99a2911c044284c0e9c801e9e7417fc7a65a6f02f8482bd969a8776ecbfff27f823c294ed9e28c56c1816d0f40d2c009f83beb246f5f22e39375fae6db239e9d560e8370f61653ec068631bfe84c2ba6376d1435ca231555a828d724ac0a38fc7986b92997c1a18940bc569d2c652b836b6d368c84ff7ebee187f31f84e6289aa7987ffe660ea59897174f5266bbb471b3ec50070d29b08ca8c92b8c2987da5e80448e99667627e55996a00c56753f9fc65fa75d742e5e15d89ecb007496045027a101244ea4f27792ef3210023196008043fa7e1ca05aa3b1e4a8a6ac5e384440cb5d11d9ec2d1117473875947c2f1aacc37c
Failed to parse data from packet.
[rtp @ 00000252df4ef8c0] Sending NAL 1 of len 478 M=1
press any key to continue...

[libx264 @ 00000252de7e2180] frame=   2 QP=26.00 NAL=2 Slice:B Poc:4   I:0    P:102  SKIP:1098 size=91 bytes
Encoded packet pts 2, size 91.
00000001419e42789bff42138e8cab7ce34f0aaf2f3fb0c41aac77dad7803c8a422c3668a09d337695ffad27dd3d2a1499cf8812c8873f3308741b44759e97059270a4f8678646dfa543ae4da163dacc33a85b2694e7e3c052a861
Codec parser bytes read 91, data size 0.
nal: .
[rtp @ 00000252df4ef8c0] Sending NAL 1 of len 87 M=1
press any key to continue...

[libx264 @ 00000252de7e2180] frame=   3 QP=28.00 NAL=0 Slice:B Poc:2   I:0    P:74   SKIP:1126 size=82 bytes
Encoded packet pts 1, size 82.
00000001019e6174457f4a9778ce66461da66e887d240ad470ec49fe325654c49141af33481787c812ab8d6e27331c0203d4fe099ef254623da56868fdac9a5e5f4e08ec8ef08390748186902972dbe37080
Failed to parse data from packet.
[rtp @ 00000252df4ef8c0] Sending NAL 1 of len 78 M=1
press any key to continue...

Since you are interested in finding NAL units, which are delimited by start codes, you could use find_start_code (https://www.ffmpeg.org/doxygen/trunk/h264dec_8h_source.html#l00822). That function is internal to libavcodec and is not exported, so I would probably copy the FFmpeg code into your project.
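
If copying FFmpeg's code is not an option, a start-code scan is short enough to write by hand. The following is a minimal sketch, not FFmpeg's find_start_code: NalSpan and splitAnnexB are hypothetical names, and it assumes the buffer holds complete NALs in Annex B format, as the encoder packets in the question do. It looks for the 3-byte pattern 00 00 01, which also matches the tail of the 4-byte start code, and then drops the zero bytes left in front of it. That is safe because the last byte of an H264 NAL unit is never 0x00.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One NAL unit inside a caller-owned buffer, start code excluded.
struct NalSpan {
  const uint8_t* data;
  size_t size;
};

// Split an Annex B buffer into NAL units (hypothetical helper, not an FFmpeg API).
std::vector<NalSpan> splitAnnexB(const uint8_t* buf, size_t len) {
  std::vector<NalSpan> nals;
  size_t nalStart = 0;
  bool inNal = false;

  // Close the NAL that began at nalStart and ends before index `end`.
  auto closeNal = [&](size_t end) {
    // Zero bytes right before a start code (or at the end of the buffer)
    // are padding or the leading zero of a 4-byte start code.
    while (end > nalStart && buf[end - 1] == 0) {
      --end;
    }
    if (end > nalStart) {
      nals.push_back({buf + nalStart, end - nalStart});
    }
  };

  size_t i = 0;
  while (i + 3 <= len) {
    if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
      if (inNal) {
        closeNal(i);
      }
      nalStart = i + 3; // first byte after the start code is the NAL header
      inNal = true;
      i += 3;
    } else {
      ++i;
    }
  }
  if (inNal) {
    closeNal(len);
  }
  return nals;
}
```

In the question's loop this could be called as splitAnnexB(pkt->data, pkt->size) right after avcodec_receive_packet succeeds. The returned spans point into the packet's buffer, so they are only valid until the packet is unreferenced. For the 482-byte packet in the log it should yield a single 478-byte NAL, the same length the rtp muxer reports.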
