
Audio and video track synchronization issue when using MediaCodec and MediaMuxer for MP4 files

I would like to produce an MP4 file by multiplexing audio from the mic (by overriding didGetAudioData) and video from the camera (by overriding onPreviewFrame). However, I ran into an audio/video synchronization problem: the video plays faster than the audio. I wonder whether the problem is related to incompatible configurations or to presentationTimeUs; could someone guide me on how to fix it? My configurations are shown below.

Video configuration

formatVideo = MediaFormat.createVideoFormat(MIME_TYPE_VIDEO, 640, 360);
formatVideo.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420SemiPlanar);
formatVideo.setInteger(MediaFormat.KEY_BIT_RATE, 2000000);
formatVideo.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
formatVideo.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 5);

The video presentationTimeUs (PTS) is obtained as follows:

if(generateIndex == 0) {
    videoAbsolutePtsUs = 132;
    StartVideoAbsolutePtsUs = System.nanoTime() / 1000L;
}else {
    CurrentVideoAbsolutePtsUs = System.nanoTime() / 1000L;
    videoAbsolutePtsUs =132+ CurrentVideoAbsolutePtsUs-StartVideoAbsolutePtsUs;
}
generateIndex++;

Audio configuration

format = MediaFormat.createAudioFormat(MIME_TYPE, 48000 /*sample rate*/, 1 /*channel count*/); // third argument is the channel count, not an AudioFormat channel mask
format.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
format.setInteger(MediaFormat.KEY_SAMPLE_RATE,48000);
format.setInteger(MediaFormat.KEY_CHANNEL_COUNT,1);
format.setInteger(MediaFormat.KEY_BIT_RATE,64000);

The audio presentationTimeUs (PTS) is obtained as follows:

if(generateIndex == 0) {
   audioAbsolutePtsUs = 132;
   StartAudioAbsolutePtsUs = System.nanoTime() / 1000L;
}else {
   CurrentAudioAbsolutePtsUs = System.nanoTime() / 1000L;
   audioAbsolutePtsUs =CurrentAudioAbsolutePtsUs - StartAudioAbsolutePtsUs;
}

generateIndex++;
audioAbsolutePtsUs = getJitterFreePTS(audioAbsolutePtsUs, audioInputLength / 2);

long startPTS = 0;
long totalSamplesNum = 0;
private long getJitterFreePTS(long bufferPts, long bufferSamplesNum) {
    long correctedPts = 0;
    long bufferDuration = (1000000 * bufferSamplesNum) / 48000;
    bufferPts -= bufferDuration; // accounts for the delay of acquiring the audio buffer
    if (totalSamplesNum == 0) {
        // reset
        startPTS = bufferPts;
        totalSamplesNum = 0;
    }
    correctedPts = startPTS +  (1000000 * totalSamplesNum) / 48000;
    if(bufferPts - correctedPts >= 2*bufferDuration) {
        // reset
        startPTS = bufferPts;
        totalSamplesNum = 0;
        correctedPts = startPTS;
    }
    totalSamplesNum += bufferSamplesNum;
    return correctedPts;
}

Was my issue caused by applying the jitter function to audio only? If so, how could I apply a jitter function to the video? I also tried to find the correct audio and video presentationTimeUs from https://android.googlesource.com/platform/cts/+/jb-mr2-release/tests/tests/media/src/android/media/cts/EncodeDecodeTest.java , but EncodeDecodeTest only provides a video PTS. That is the reason my implementation uses System.nanoTime() for both audio and video. If I want to use the video presentationTimeUs approach from EncodeDecodeTest, how do I construct a compatible audio presentationTimeUs? Thanks for the help!
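For reference, EncodeDecodeTest derives its video PTS purely from the frame index rather than from the wall clock, and an analogous audio PTS can be derived from the number of PCM samples already queued to the encoder. Below is a minimal sketch of that idea, assuming 30 fps video and 48 kHz mono 16-bit audio as in the configurations above; computeVideoPts and computeAudioPts are hypothetical helper names, not part of the CTS test.

// Video PTS from the frame index (same style as computePresentationTime in EncodeDecodeTest).
private static long computeVideoPts(int frameIndex) {
    return 132 + frameIndex * 1000000L / 30;        // 30 = frame rate
}

// Audio PTS from the total number of PCM samples already fed to the encoder.
private static long computeAudioPts(long totalSamples) {
    return 132 + totalSamples * 1000000L / 48000;   // 48000 = sample rate
}

// Usage for each audio buffer of 'bytesRead' bytes of 16-bit mono PCM:
//   long pts = computeAudioPts(totalSamples);
//   totalSamples += bytesRead / 2;                 // 2 bytes per sample, 1 channel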

Below is how I queue a YUV frame into the video MediaCodec, for reference. The audio part is identical except for the presentationTimeUs calculation.

int videoInputBufferIndex;
int videoInputLength;
long videoAbsolutePtsUs;
long StartVideoAbsolutePtsUs, CurrentVideoAbsolutePtsUs;

int put_v =0;
int get_v =0;
int generateIndex = 0;

public void setByteBufferVideo(byte[] buffer, boolean isUsingFrontCamera, boolean Input_endOfStream){
    if(Build.VERSION.SDK_INT >=18){
        try{

            endOfStream = Input_endOfStream;
            if(!Input_endOfStream){
                ByteBuffer[] inputBuffers = mVideoCodec.getInputBuffers();
                videoInputBufferIndex = mVideoCodec.dequeueInputBuffer(-1);

                if (VERBOSE) {
                    Log.w(TAG,"[put_v]:"+(put_v)+"; videoInputBufferIndex = "+videoInputBufferIndex+"; endOfStream = "+endOfStream);
                }

                if(videoInputBufferIndex>=0) {
                    ByteBuffer inputBuffer = inputBuffers[videoInputBufferIndex];
                    inputBuffer.clear();

                    inputBuffer.put(mNV21Convertor.convert(buffer));
                    videoInputLength = buffer.length;

                    if(generateIndex == 0) {
                        videoAbsolutePtsUs = 132;
                        StartVideoAbsolutePtsUs = System.nanoTime() / 1000L;
                    }else {
                        CurrentVideoAbsolutePtsUs = System.nanoTime() / 1000L;
                        videoAbsolutePtsUs =132+ CurrentVideoAbsolutePtsUs - StartVideoAbsolutePtsUs;
                    }

                    generateIndex++;

                    if (VERBOSE) {
                        Log.w(TAG, "[put_v]:"+(put_v)+"; videoAbsolutePtsUs = " + videoAbsolutePtsUs + "; CurrentVideoAbsolutePtsUs = "+CurrentVideoAbsolutePtsUs);
                    }

                    if (videoInputLength == AudioRecord.ERROR_INVALID_OPERATION) {
                        Log.w(TAG, "[put_v]ERROR_INVALID_OPERATION");
                    } else if (videoInputLength == AudioRecord.ERROR_BAD_VALUE) {
                        Log.w(TAG, "[put_v]ERROR_ERROR_BAD_VALUE");
                    }
                    if (endOfStream) {
                        Log.w(TAG, "[put_v]:"+(put_v++)+"; [get] receive endOfStream");
                        mVideoCodec.queueInputBuffer(videoInputBufferIndex, 0, videoInputLength, videoAbsolutePtsUs, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                    } else {
                        Log.w(TAG, "[put_v]:"+(put_v++)+"; receive videoInputLength :" + videoInputLength);
                        mVideoCodec.queueInputBuffer(videoInputBufferIndex, 0, videoInputLength, videoAbsolutePtsUs, 0);
                    }
                }
            }
        }catch (Exception x) {
            x.printStackTrace();
        }
    }
}

How I solved this in my application was by setting the PTS of all video and audio frames against a shared "sync clock" (shared here also means it is thread-safe) that starts when the first video frame (with its own PTS of 0) becomes available. So if audio recording starts sooner than video, the audio data is discarded (it does not go into the encoder) until the video starts; if it starts later, then the first audio PTS will be relative to the start of the entire video.
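A minimal sketch of such a shared sync clock might look like the following; the class and method names (SyncClock, start, getPtsUs) are illustrative assumptions rather than the exact code behind this answer.

// Hypothetical shared sync clock: started once by the video thread on the first frame,
// then queried by both audio and video threads to stamp encoder input buffers.
class SyncClock {
    private volatile long startNs = -1;

    // Called by the video thread when the first video frame is available.
    synchronized void start() {
        if (startNs < 0) {
            startNs = System.nanoTime();
        }
    }

    // PTS in microseconds relative to the first video frame,
    // or -1 if video has not started yet (the audio path drops its buffers in that case).
    long getPtsUs() {
        long s = startNs;
        return (s < 0) ? -1 : (System.nanoTime() - s) / 1000L;
    }
}

Both encoders then stamp each input buffer with getPtsUs() instead of keeping their own System.nanoTime() baselines, and the audio path simply skips buffers while getPtsUs() returns -1.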

Of course you are free to allow the audio to start first, but players will usually skip or wait for the first video frame anyway. Also be careful: encoded audio frames will arrive "out of order", and MediaMuxer will fail with an error sooner or later. My solution was to queue them all: sort them by PTS when a new one comes in, then write everything that is older than 500 ms (relative to the newest one) to MediaMuxer, but only frames with a PTS higher than the latest written frame. Ideally this means data is written smoothly to MediaMuxer with a 500 ms delay. Worst case, you will lose a few audio frames.
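A rough sketch of that reordering queue, under the same assumptions (the PendingSample holder, the offer method and the fixed 500 ms window are illustrative, not the exact code used here), could look like this:

// Assumes java.util.*, java.nio.ByteBuffer, android.media.MediaCodec, android.media.MediaMuxer.
// Hypothetical reordering buffer in front of MediaMuxer: samples are held for up to
// 500 ms so late-arriving frames can still be written in PTS order; anything with a
// PTS not higher than the last written one is dropped.
static class PendingSample {
    int trackIndex;
    ByteBuffer data;
    MediaCodec.BufferInfo info;
}

private final List<PendingSample> pending = new ArrayList<>();
private long lastWrittenPtsUs = Long.MIN_VALUE;
private static final long WINDOW_US = 500_000;   // 500 ms reordering window

synchronized void offer(PendingSample sample, MediaMuxer muxer) {
    pending.add(sample);
    pending.sort(Comparator.comparingLong(s -> s.info.presentationTimeUs));
    long newestPtsUs = pending.get(pending.size() - 1).info.presentationTimeUs;
    Iterator<PendingSample> it = pending.iterator();
    while (it.hasNext()) {
        PendingSample p = it.next();
        if (newestPtsUs - p.info.presentationTimeUs < WINDOW_US) {
            break;   // everything from here on is still inside the 500 ms window
        }
        if (p.info.presentationTimeUs > lastWrittenPtsUs) {
            muxer.writeSampleData(p.trackIndex, p.data, p.info);
            lastWrittenPtsUs = p.info.presentationTimeUs;
        }
        it.remove();   // written, or dropped because a newer PTS was already written
    }
}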
