Compute PTS and DTS correctly to sync audio and video (ffmpeg, C++)

I am trying to mux H.264 encoded data and G.711 PCM data into a MOV multimedia container. I create an AVPacket from the encoded data, and initially the PTS and DTS values of the video/audio frames equal AV_NOPTS_VALUE, so I compute the DTS from the current wall-clock time. My code:

bool AudioVideoRecorder::WriteVideo(const unsigned char *pData, size_t iDataSize, bool const bIFrame) {
    .....................................
    .....................................
    .....................................
    AVPacket pkt = {0};
    av_init_packet(&pkt);
    int64_t dts = av_gettime();
    dts = av_rescale_q(dts, (AVRational){1, 1000000}, m_pVideoStream->time_base);
    int duration = 90000 / VIDEO_FRAME_RATE;
    if(m_prevVideoDts > 0LL) {
        duration = dts - m_prevVideoDts;
    }
    m_prevVideoDts = dts;

    pkt.pts = AV_NOPTS_VALUE;
    pkt.dts = m_currVideoDts;
    m_currVideoDts += duration;
    pkt.duration = duration;
    if(bIFrame) {
        pkt.flags |= AV_PKT_FLAG_KEY;
    }
    pkt.stream_index = m_pVideoStream->index;
    pkt.data = (uint8_t*) pData;
    pkt.size = iDataSize;

    int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);

    if(ret < 0) {
        LogErr("Writing video frame failed.");
        return false;
    }

    Log("Writing video frame done.");

    av_free_packet(&pkt);
    return true;
}

bool AudioVideoRecorder::WriteAudio(const unsigned char *pEncodedData, size_t iDataSize) {
    .................................
    .................................
    .................................
    AVPacket pkt = {0};
    av_init_packet(&pkt);

    int64_t dts = av_gettime();
    dts = av_rescale_q(dts, (AVRational){1, 1000000}, (AVRational){1, 90000});
    int duration = AUDIO_STREAM_DURATION; // 20
    if(m_prevAudioDts > 0LL) {
        duration = dts - m_prevAudioDts;
    }
    m_prevAudioDts = dts;
    pkt.pts = AV_NOPTS_VALUE;
    pkt.dts = m_currAudioDts;
    m_currAudioDts += duration;
    pkt.duration = duration;

    pkt.stream_index = m_pAudioStream->index;
    pkt.flags |= AV_PKT_FLAG_KEY;
    pkt.data = (uint8_t*) pEncodedData;
    pkt.size = iDataSize;

    int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
    if(ret < 0) {
        LogErr("Writing audio frame failed: %d", ret);
        return false;
    }

    Log("Writing audio frame done.");

    av_free_packet(&pkt);
    return true;
}

And I add the streams like this:

AVStream* AudioVideoRecorder::AddMediaStream(enum AVCodecID codecID) {
    ................................
    .................................   
    pStream = avformat_new_stream(m_pFormatCtx, codec);
    if (!pStream) {
        LogErr("Could not allocate stream.");
        return NULL;
    }
    pStream->id = m_pFormatCtx->nb_streams - 1;
    pCodecCtx = pStream->codec;
    pCodecCtx->codec_id = codecID;

    switch(codec->type) {
    case AVMEDIA_TYPE_VIDEO:
        pCodecCtx->bit_rate = VIDEO_BIT_RATE;
        pCodecCtx->width = PICTURE_WIDTH;
        pCodecCtx->height = PICTURE_HEIGHT;
        pStream->time_base = (AVRational){1, 90000};
        pStream->avg_frame_rate = (AVRational){90000, 1};
        pStream->r_frame_rate = (AVRational){90000, 1}; // though the frame rate is variable and around 15 fps
        pCodecCtx->pix_fmt = STREAM_PIX_FMT;
        m_pVideoStream = pStream;
        break;

    case AVMEDIA_TYPE_AUDIO:
        pCodecCtx->sample_fmt = AV_SAMPLE_FMT_S16;
        pCodecCtx->bit_rate = AUDIO_BIT_RATE;
        pCodecCtx->sample_rate = AUDIO_SAMPLE_RATE;
        pCodecCtx->channels = 1;
        m_pAudioStream = pStream;
        break;

    default:
        break;
    }

    /* Some formats want stream headers to be separate. */
    if (m_pOutputFmt->flags & AVFMT_GLOBALHEADER)
        m_pFormatCtx->flags |= CODEC_FLAG_GLOBAL_HEADER;

    return pStream;
}

There are several problems with this calculation:

  1. The video lags behind the audio, and the lag increases over time.

  2. Suppose an audio frame arrives (WriteAudio(..)) a little late, say by 3 seconds. Then the late frame should start playing with a 3-second delay, but it doesn't; the delayed frame is played back to back with the previous frame.

  3. Sometimes I record for ~40 seconds, but the reported file duration is more like 2 minutes. Audio/video plays for only about 40 seconds, the rest of the file contains nothing, and the seek bar jumps to the end immediately after 40 seconds (tested in VLC).

EDIT:

According to Ronald S. Bultje's suggestion, what I've understood:

m_pAudioStream->time_base = (AVRational){1, 9000}; // actually no need to set this, since 9000 is already the default for audio, as you said
m_pVideoStream->time_base = (AVRational){1, 9000};

should be set, so that the audio and video streams are now in the same time base units.

And for video:

...................
...................

int64_t dts = av_gettime(); // get current time in microseconds
dts *= 9000; 
dts /= 1000000; // 1 second = 10^6 microseconds
pkt.pts = AV_NOPTS_VALUE; // is it okay?
pkt.dts = dts;
// and no need to set pkt.duration, right?

And for audio (exactly the same as video, right?):

...................
...................

int64_t dts = av_gettime(); // get current time in microseconds
dts *= 9000; 
dts /= 1000000; // 1 second = 10^6 microseconds
pkt.pts = AV_NOPTS_VALUE; // is it okay?
pkt.dts = dts;
// and no need to set pkt.duration, right?

And I think they now effectively share the same currDts, right? Please correct me if I am wrong anywhere or missing anything.

Also, if I want to use a video stream time base of (AVRational){1, frameRate} and an audio stream time base of (AVRational){1, sampleRate}, what should the correct code look like?
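For reference, here is a minimal sketch of what that could look like (my illustration, not from the original code; the m_startTime member is hypothetical, and videoPkt/audioPkt stand for the packets built in WriteVideo/WriteAudio). The idea is to rescale microseconds since one shared wall-clock origin into each stream's own time base with av_rescale_q():

int64_t now = av_gettime();              // wall clock, in microseconds
if (m_startTime < 0) {
    m_startTime = now;                   // one shared origin for ALL streams
}
int64_t elapsed = now - m_startTime;

// video packet: stream time base is (AVRational){1, frameRate}
videoPkt.dts = av_rescale_q(elapsed, (AVRational){1, 1000000},
                            m_pVideoStream->time_base);

// audio packet: stream time base is (AVRational){1, sampleRate}
audioPkt.dts = av_rescale_q(elapsed, (AVRational){1, 1000000},
                            m_pAudioStream->time_base);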

EDIT 2.0:

m_pAudioStream->time_base = (AVRational){1, VIDEO_FRAME_RATE};
m_pVideoStream->time_base = (AVRational){1, VIDEO_FRAME_RATE};

And

bool AudioVideoRecorder::WriteAudio(const unsigned char *pEncodedData, size_t iDataSize) {
    ...........................
    ......................
    AVPacket pkt = {0};
    av_init_packet(&pkt);

    int64_t dts = av_gettime() / 1000; // convert to milliseconds
    dts = dts * VIDEO_FRAME_RATE;
    if(m_dtsOffset < 0) {
        m_dtsOffset = dts;
    }

    pkt.pts = AV_NOPTS_VALUE;
    pkt.dts = (dts - m_dtsOffset);

    pkt.stream_index = m_pAudioStream->index;
    pkt.flags |= AV_PKT_FLAG_KEY;
    pkt.data = (uint8_t*) pEncodedData;
    pkt.size = iDataSize;

    int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
    if(ret < 0) {
        LogErr("Writing audio frame failed: %d", ret);
        return false;
    }

    Log("Writing audio frame done.");

    av_free_packet(&pkt);
    return true;
}

bool AudioVideoRecorder::WriteVideo(const unsigned char *pData, size_t iDataSize, bool const bIFrame) {
    ........................................
    .................................
    AVPacket pkt = {0};
    av_init_packet(&pkt);
    int64_t dts = av_gettime() / 1000;
    dts = dts * VIDEO_FRAME_RATE;
    if(m_dtsOffset < 0) {
        m_dtsOffset = dts;
    }
    pkt.pts = AV_NOPTS_VALUE;
    pkt.dts = (dts - m_dtsOffset);

    if(bIFrame) {
        pkt.flags |= AV_PKT_FLAG_KEY;
    }
    pkt.stream_index = m_pVideoStream->index;
    pkt.data = (uint8_t*) pData;
    pkt.size = iDataSize;

    int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);

    if(ret < 0) {
        LogErr("Writing video frame failed.");
        return false;
    }

    Log("Writing video frame done.");

    av_free_packet(&pkt);
    return true;
}

Is this last change okay? The video and audio seem synced now. The only problem is that the audio plays without any delay, regardless of whether a packet arrived late. Like:

packet arrival: 1 2 3 4 ... (then the next frame arrives after 3 sec) ... 5

audio played: 1 2 3 4 (no delay) 5

EDIT 3.0:

Zeroed audio sample data:

uint8_t *pSilentData = (uint8_t *) av_malloc(iDataSize); // buffer of raw G.711 bytes, same size as a real packet
memset(pSilentData, 0, iDataSize);                       // fill with zeros

pkt.data = pSilentData;
pkt.size = iDataSize;

// ... av_interleaved_write_frame() as before ...

av_freep(&pSilentData);

Is this okay? But after writing this into the file container, there is a repeated clicking ("dot dot") noise when the media is played. What's the problem?

EDIT 4.0:

Well, for µ-law audio the zero value is represented as 0xff. So:

memset(pSilentData, 0xff, iDataSize);

solves my problem.
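A quick way to see why (my illustration, not part of the original post): run the standard G.711 µ-law expansion on both byte values. 0xff decodes to a linear sample of 0, while 0x00 decodes to a near full-scale negative sample, which is why zero-filled buffers produce audible clicks.

#include <stdint.h>

// Standard G.711 µ-law expansion (bias 0x84); bytes are stored complemented.
static int16_t ulaw_to_linear(uint8_t ulaw) {
    ulaw = ~ulaw;
    int t = (((ulaw & 0x0F) << 3) + 0x84) << ((ulaw & 0x70) >> 4);
    return (int16_t)((ulaw & 0x80) ? (0x84 - t) : (t - 0x84));
}

// ulaw_to_linear(0xff) ==      0  -> true silence
// ulaw_to_linear(0x00) == -32124  -> loud negative spike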

Answer (Ronald S. Bultje):

Timestamps (such as dts) should be in AVStream.time_base units. You're requesting a video timebase of 1/90000 and a default audio timebase (1/9000), but you're using a timebase of 1/100000 to write dts values. I'm also not sure it's guaranteed that the requested timebases are maintained during header writing; your muxer might change the values and expect you to deal with the new ones.

So code like this:

int64_t dts = av_gettime();
dts = av_rescale_q(dts, (AVRational){1, 1000000}, (AVRational){1, 90000});
int duration = AUDIO_STREAM_DURATION; // 20
if(m_prevAudioDts > 0LL) {
    duration = dts - m_prevAudioDts;
}

won't work. Change that to something that uses the audio stream's time base, and don't set the duration unless you know what you're doing. (Same for video.)
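As a minimal sketch of "use the stream's time base" (my illustration, not part of the answer; elapsed_us stands for microseconds since a shared start time): write the header first, then rescale against whatever time base the stream actually ended up with, since the muxer may have replaced the one you requested.

// avformat_write_header() may adjust pStream->time_base, so trust only
// the value that is actually in the stream after the header is written.
if (avformat_write_header(m_pFormatCtx, NULL) < 0) {
    LogErr("Writing header failed.");
}

// later, per packet:
pkt.dts = av_rescale_q(elapsed_us, (AVRational){1, 1000000},
                       m_pAudioStream->time_base);  // the muxer's final choice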

m_prevAudioDts = dts;
pkt.pts = AV_NOPTS_VALUE;
pkt.dts = m_currAudioDts;
m_currAudioDts += duration;
pkt.duration = duration;

This looks creepy, especially combined with the similar video code. The problem here is that the first packet of each stream will have a timestamp of zero, regardless of the inter-packet delay between the streams. You need one parent currDts shared between all streams, otherwise your streams will be perpetually out of sync.

[edit]

So, regarding your edit: if you have audio gaps, I think you need to insert silence (zeroed audio sample data) for the duration of the gap.
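A minimal sketch of that idea (my illustration; m_nextAudioDts is a hypothetical member tracking where the previous audio packet ended, and pkt is the real packet about to be written with its duration set), assuming µ-law mono with a stream time base of (AVRational){1, AUDIO_SAMPLE_RATE}, so one tick equals one sample equals one byte:

int64_t gap = dts - m_nextAudioDts;             // ticks == samples == bytes here
if (m_nextAudioDts >= 0 && gap > 0) {
    AVPacket silence = {0};
    av_init_packet(&silence);
    silence.data = (uint8_t *) av_malloc((size_t) gap);
    silence.size = (int) gap;
    memset(silence.data, 0xff, (size_t) gap);   // 0xff == µ-law digital silence
    silence.pts = silence.dts = m_nextAudioDts; // start where the gap starts
    silence.duration = (int) gap;
    silence.stream_index = m_pAudioStream->index;
    silence.flags |= AV_PKT_FLAG_KEY;
    av_interleaved_write_frame(m_pFormatCtx, &silence);
    av_freep(&silence.data);
}
m_nextAudioDts = dts + pkt.duration;            // end of the real packet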
