Synchronizing audio/video in MP4 using the FFmpeg AutoGen library

Question

I'm currently having trouble keeping my audio and video streams in sync.

These are the AVCodecContexts I'm using:

For video:

AVCodec* videoCodec = ffmpeg.avcodec_find_encoder(AVCodecID.AV_CODEC_ID_H264);
AVCodecContext* videoCodecContext = ffmpeg.avcodec_alloc_context3(videoCodec);
videoCodecContext->bit_rate = 400000;
videoCodecContext->width = 1280;
videoCodecContext->height = 720;
videoCodecContext->gop_size = 12;
videoCodecContext->max_b_frames = 1;
videoCodecContext->pix_fmt = videoCodec->pix_fmts[0];
videoCodecContext->codec_id = videoCodec->id;
videoCodecContext->codec_type = videoCodec->type;
videoCodecContext->time_base = new AVRational
{
    num = 1,
    den = 30
};

For audio:

AVCodec* audioCodec = ffmpeg.avcodec_find_encoder(AVCodecID.AV_CODEC_ID_AAC);
AVCodecContext* audioCodecContext = ffmpeg.avcodec_alloc_context3(audioCodec);
audioCodecContext->bit_rate = 1280000;
audioCodecContext->sample_rate = 48000;
audioCodecContext->channels = 2;
audioCodecContext->channel_layout = ffmpeg.AV_CH_LAYOUT_STEREO;
audioCodecContext->frame_size = 1024;
audioCodecContext->sample_fmt = audioCodec->sample_fmts[0];
audioCodecContext->profile = ffmpeg.FF_PROFILE_AAC_LOW;
audioCodecContext->codec_id = audioCodec->id;
audioCodecContext->codec_type = audioCodec->type;

When writing the video frames, I set the PTS as follows:

outputFrame->pts = frameIndex;  // The current index of the image frame being written

I then encode the frame with avcodec_encode_video2(). After that, I call the following to set the timestamps:

ffmpeg.av_packet_rescale_ts(&packet, videoCodecContext->time_base, videoStream->time_base);

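This plays perfectly.

However, when I do the same for the audio, the video plays in slow motion: the audio plays first, and then the video continues without sound.

I can't find an example of how to set the pts/dts values for video and audio in an MP4 file. Any example would be a great help!

Also, I write the video frames first, and then (once they have all been written) I write the audio. I have updated this question with the adjusted values suggested in the comments.

I have uploaded a test video to show my results here: http://www.filedropper.com/test_124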

Answer 1

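PS: Take a look at this article/tutorial on A/V sync with FFmpeg. It may help you if the suggestions below don't work.

1) About the video and audio timestamps...

Instead of using the current frameIndex as the timestamp and rescaling it later, skip the rescaling if possible.

The alternative is to make sure the PTS values (in outputFrame->pts) are created correctly in the first place, using the video's frames per second (FPS). To do this...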

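For each video frame: outputFrame->pts = (frameIndex * 1000) / FPS;
(For a 30 FPS video, frame 1 has a time of 0, and by frame 30 the "clock" has reached 1 second. 1000 / 30 gives each video frame a presentation interval of 33.333 ms. When frameIndex is 30 we get 33.333 x 30 = 1000 ms, or 1 second, which confirms 30 frames per second.)

For each audio frame: outputFrame->pts = (frameIndex * 1024 * 1000) / 48000;
(Since a 48 kHz AAC frame has a duration of 21.333 ms, the timestamp increases by that amount per frame. The formula is (1024 PCM samples / SampleRate) x 1000 ms per second, multiplied by the frame index.)

Both formulas multiply before dividing because, with integer operands, 1000 / 30 truncates to 33 and 1024 / 48000 truncates to 0. Millisecond timestamps also only make sense when the time base is 1/1000.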
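As a minimal sketch of the two formulas above, the helpers below compute millisecond timestamps with integer arithmetic. The class and method names are hypothetical and are not part of FFmpeg.AutoGen. The sketch also assumes the encoder time_base is 1/1000, so that one PTS unit equals one millisecond. The order of operations matters: in C# integer arithmetic, 1024 / 48000 evaluates to 0 and 1000 / 30 to 33, so the multiplication is done first.

```csharp
using System;

// Hypothetical helpers, not part of FFmpeg.AutoGen.
// Assumption: the codec time_base is 1/1000 (one PTS unit = 1 ms).
public static class MillisecondTimestamps
{
    // Video: frame N is presented at N * 1000 / fps milliseconds.
    // Multiplying before dividing keeps the truncation error below 1 ms
    // instead of letting it accumulate on every frame.
    public static long VideoPtsMs(long frameIndex, int fps)
    {
        return frameIndex * 1000 / fps;
    }

    // Audio: each frame carries samplesPerFrame samples per channel
    // (1024 for AAC), so frame N starts at N * samplesPerFrame / sampleRate seconds.
    public static long AudioPtsMs(long frameIndex, int samplesPerFrame, int sampleRate)
    {
        return frameIndex * samplesPerFrame * 1000 / sampleRate;
    }
}
```

For instance, MillisecondTimestamps.VideoPtsMs(30, 30) gives 1000 and MillisecondTimestamps.AudioPtsMs(375, 1024, 48000) gives 8000, since 375 AAC frames at 48 kHz cover exactly 8 seconds.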

2) About the audio settings...

Bit rate:
audioCodecContext->bit_rate = 64000; seems odd if your sample_rate is 48000 Hz (and I assume your bit depth is 16 bits per sample?).

Try 96000 or 128000 as the lowest starting value.

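Frame size:

int AVCodecContext::frame_size means "number of samples per channel in an audio frame".

Considering the above quote from the docs, note that MPEG AAC doesn't work "per channel" (the data for both the L and R channels is contained within each frame). Each AAC frame holds 1024 PCM samples.

Instead of audioCodecContext->frame_size = 88200; you could try = 1024; for the size.

Profile:
I noticed you are using MAIN for the AAC profile. I'm used to seeing Low Complexity in videos. I checked a few random MP4 files from various sources on my hard drive and could not find one using the "Main" profile. As a last resort, it wouldn't hurt to test "Low Complexity".

Try using audioCodecContext->profile = ffmpeg.FF_PROFILE_AAC_LOW;

PS: Check whether this possible AAC issue applies to you (it depends on your FFmpeg version).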

Answer 2

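Solved the problem. I added a new function that sets the video/audio packet timestamps after the frame PTS has been set.

The video PTS is just the usual increment (+1 per frame), while the audio PTS is set as follows: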

outputFrame->pts = ffmpeg.av_rescale_q(m_audioFrameSampleIncrement, new AVRational { num = 1, den = 48000 }, m_audioCodecContext->time_base);

m_audioFrameSampleIncrement += outputFrame->nb_samples;
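The idea behind the two lines above is a running count of samples written so far, expressed in units of 1/sample_rate. A minimal sketch of that counter on its own is shown below. The AudioPtsCounter class is hypothetical, and it assumes the audio codec time_base is 1/sample_rate, in which case the counter value can be used as the frame PTS without any rescaling.

```csharp
using System;

// Hypothetical helper, not part of FFmpeg.AutoGen.
// Assumption: the audio codec time_base is 1/sample_rate, so a PTS is
// simply the number of samples (per channel) written before this frame.
public sealed class AudioPtsCounter
{
    private long _samplesWritten;

    // Returns the PTS for the next frame, then advances the counter
    // by the number of samples that frame contains.
    public long Next(int nbSamples)
    {
        long pts = _samplesWritten;
        _samplesWritten += nbSamples;
        return pts;
    }
}
```

With 1024-sample AAC frames this yields 0, 1024, 2048 and so on. Because the counter advances by the actual nb_samples of each frame, a shorter final frame does not throw off the timestamps.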

After a frame has been encoded, I call the new function:

private static void SetPacketProperties(ref AVPacket packet, AVCodecContext* codecContext, AVStream* stream)
{
    packet.pts = ffmpeg.av_rescale_q_rnd(packet.pts, codecContext->time_base, stream->time_base, AVRounding.AV_ROUND_NEAR_INF | AVRounding.AV_ROUND_PASS_MINMAX);
    packet.dts = ffmpeg.av_rescale_q_rnd(packet.dts, codecContext->time_base, stream->time_base, AVRounding.AV_ROUND_NEAR_INF | AVRounding.AV_ROUND_PASS_MINMAX);
    packet.duration = (int)ffmpeg.av_rescale_q(packet.duration, codecContext->time_base, stream->time_base);
    packet.stream_index = stream->index;
}
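To make the rescaling step concrete, here is a small sketch of the arithmetic that converts a timestamp from one time base to another. It only approximates what av_rescale_q does for non-negative inputs (round to nearest) and ignores overflow handling. The TimestampSketch class is hypothetical, and the stream time base of 1/15360 used in the note below is an assumed value for illustration, since the muxer chooses the real one.

```csharp
using System;

// Hypothetical helper for illustration only; real code should call
// ffmpeg.av_rescale_q, which also guards against overflow.
public static class TimestampSketch
{
    // Converts a non-negative value from time base srcNum/srcDen
    // to time base dstNum/dstDen, rounding to the nearest integer.
    public static long Rescale(long value, int srcNum, int srcDen, int dstNum, int dstDen)
    {
        long multiplier = (long)srcNum * dstDen;
        long divisor = (long)srcDen * dstNum;
        return (value * multiplier + divisor / 2) / divisor;
    }
}
```

Under these assumptions, video frame 30 in a 1/30 time base and audio sample 48000 in a 1/48000 time base both rescale to 15360, which is the same one-second mark in the stream time base. That shared mark is what keeps the two streams aligned.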

Answer 1
1 2016-07-06 18:01:20

Answer 2
1 Accepted 2016-07-12 10:31:35