
FFmpeg + OpenAL - playback streaming sound from video won't work

I am decoding an Ogg video (Theora and Vorbis as codecs) and want to show it on the screen (using Ogre 3D) while playing its sound. I can decode the image stream just fine, and the video plays perfectly at the correct frame rate, etc.

However, I cannot get the sound to play at all with OpenAL.

Edit: I managed to make the playing sound at least somewhat resemble the actual audio in the video. Updated the sample code.

Edit 2: I was able to get "almost" correct sound now. I had to set OpenAL to use AL_FORMAT_STEREO_FLOAT32 (after initializing the extension) instead of just STEREO16. Now the sound is "only" extremely high pitched and stuttering, but at the correct speed.

Here is how I decode audio packets (in a background thread; the equivalent works just fine for the image stream of the video file):

//------------------------------------------------------------------------------
int decodeAudioPacket(  AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
                        FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
    // Decode audio frame
    int got_frame = 0;
    int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
    if (decoded < 0) 
    {
        p_videoInfo.error = "Error decoding audio frame.";
        return decoded;
    }

    // Frame is complete, store it in audio frame queue
    if (got_frame)
    {
        int bufferSize = av_samples_get_buffer_size(NULL, p_audioCodecContext->channels, p_frame->nb_samples, 
                                                    p_audioCodecContext->sample_fmt, 0);

        int64_t duration = p_frame->pkt_duration;
        int64_t dts = p_frame->pkt_dts;

        if (staticOgreLog)
        {
            staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: " 
                    + boost::lexical_cast<std::string>(bufferSize) + " / "
                    + boost::lexical_cast<std::string>(duration) + " / "
                    + boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
        }

        // Create the audio frame
        AudioFrame* frame = new AudioFrame();
        frame->dataSize = bufferSize;
        frame->data = new uint8_t[bufferSize];
        if (p_frame->channels == 2)
        {
            memcpy(frame->data, p_frame->data[0], bufferSize >> 1);
            memcpy(frame->data + (bufferSize >> 1), p_frame->data[1], bufferSize >> 1);
        }
        else
        {
            memcpy(frame->data, p_frame->data[0], bufferSize);
        }
        double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
        frame->lifeTime = duration * timeBase;

        p_player->addAudioFrame(frame);
    }

    return decoded;
}
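For reference, AudioFrame is a simple container; its exact definition isn't included here, but a minimal sketch based on the fields the code above uses (data, dataSize, lifeTime) would look roughly like this:

// Hypothetical reconstruction - the actual definition is not shown in this post.
struct AudioFrame
{
    uint8_t* data;      // raw sample bytes handed to OpenAL later
    int      dataSize;  // size of data in bytes
    double   lifeTime;  // how long this frame plays, in seconds

    ~AudioFrame() { delete[] data; }
};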

So, as you can see, I decode the frame and memcpy it into my own struct, AudioFrame. Now, when the sound is played, I use these audio frames like this:

    int numBuffers = 4;
    ALuint buffers[4];
    alGenBuffers(numBuffers, buffers);
    ALenum success = alGetError();
    if(success != AL_NO_ERROR)
    {
        CONSOLE_LOG("Error on alGenBuffers : " + Ogre::StringConverter::toString(success) + alGetString(success));
        return;
    }

    // Fill a number of data buffers with audio from the stream
    std::vector<AudioFrame*> audioBuffers;
    std::vector<unsigned int> audioBufferSizes;
    unsigned int numReturned = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffers, audioBuffers, audioBufferSizes);

    // Assign the data buffers to the OpenAL buffers
    for (unsigned int i = 0; i < numReturned; ++i)
    {
        alBufferData(buffers[i], _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);

        success = alGetError();
        if(success != AL_NO_ERROR)
        {
            CONSOLE_LOG("Error on alBufferData : " + Ogre::StringConverter::toString(success) + alGetString(success)
                            + " size: " + Ogre::StringConverter::toString(audioBufferSizes[i]));
            return;
        }
    }

    // Queue the buffers into OpenAL
    alSourceQueueBuffers(_source, numReturned, buffers);
    success = alGetError();
    if(success != AL_NO_ERROR)
    {
        CONSOLE_LOG("Error queuing streaming buffers: " + Ogre::StringConverter::toString(success) + alGetString(success));
        return;
    }
    alSourcePlay(_source);

The format and frequency I give to OpenAL are AL_FORMAT_STEREO_FLOAT32 (it is a stereo sound stream, and I did initialize the FLOAT32 extension) and 48000 (which is the sample rate of the audio stream's AVCodecContext).
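Initializing the extension amounts to checking for AL_EXT_float32 and resolving the format enum at runtime. A minimal sketch of what that check can look like (AL_FORMAT_STEREO_FLOAT32 is defined by the AL_EXT_float32 extension):

// Check for the float32 extension before using float formats;
// alGetEnumValue resolves the extension's format enum at runtime.
ALenum streamingFormat = 0;
if (alIsExtensionPresent("AL_EXT_float32"))
{
    streamingFormat = alGetEnumValue("AL_FORMAT_STEREO_FLOAT32");
}
else
{
    // Without the extension, samples would have to be converted to S16 instead.
}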

And during playback, I do the following to refill OpenAL's buffers:

ALint numBuffersProcessed;

// Check if OpenAL is done with any of the queued buffers
alGetSourcei(_source, AL_BUFFERS_PROCESSED, &numBuffersProcessed);
if(numBuffersProcessed <= 0)
    return;

// Fill a number of data buffers with audio from the stream
std::vector<AudioFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numFilled = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffersProcessed, audioBuffers, audioBufferSizes);

// Assign the data buffers to the OpenAL buffers
ALuint buffer;
for (unsigned int i = 0; i < numFilled; ++i)
{
    // Pop the oldest queued buffer from the source, 
    // fill it with the new data, then re-queue it
    alSourceUnqueueBuffers(_source, 1, &buffer);

    ALenum success = alGetError();
    if(success != AL_NO_ERROR)
    {
        CONSOLE_LOG("Error Unqueuing streaming buffers: " + Ogre::StringConverter::toString(success));
        return;
    }

    alBufferData(buffer, _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);

    success = alGetError();
    if(success != AL_NO_ERROR)
    {
        CONSOLE_LOG("Error on re- alBufferData: " + Ogre::StringConverter::toString(success));
        return;
    }

    alSourceQueueBuffers(_source, 1, &buffer);

    success = alGetError();
    if(success != AL_NO_ERROR)
    {
        CONSOLE_LOG("Error re-queuing streaming buffers: " + Ogre::StringConverter::toString(success) + " "
                    + alGetString(success));
        return;
    }
}

// Make sure the source is still playing, 
// and restart it if needed.
ALint playStatus;
alGetSourcei(_source, AL_SOURCE_STATE, &playStatus);
if(playStatus != AL_PLAYING)
    alSourcePlay(_source);

As you can see, I do quite heavy error checking. But I do not get any errors, neither from OpenAL nor from FFmpeg.

Edit: What I hear somewhat resembles the actual audio from the video, but VERY high pitched and stuttering VERY much. Also, it seems to be playing on top of TV noise. Very strange. Plus, it is playing much slower than the correct audio would.

Edit 2: After using AL_FORMAT_STEREO_FLOAT32, the sound plays at the correct speed, but is still very high pitched and stuttering (though less than before).

The video itself is not broken; it can be played fine in any player. OpenAL can also play *.wav files just fine in the same application, so it is working as well.

Any ideas what could be wrong here or how to do this correctly?

My only guess is that somehow, FFmpeg's decode function does not produce data OpenAL can read. But this is as far as the FFmpeg decoding example goes, so I don't know what is missing. As I understand it, avcodec_decode_audio4 decodes the frame to raw data, and OpenAL should be able to work with raw data (or rather, doesn't work with anything else).
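A quick back-of-the-envelope check of the data rates is consistent with those symptoms:

// 48 kHz stereo float32 audio:
//   48000 samples/s * 2 channels * 4 bytes/sample = 384000 bytes/s
// Told the buffers are STEREO16, OpenAL consumes the same bytes at
//   48000 * 2 * 2 = 192000 bytes/s,
// so each buffer lasts twice as long as it should: playback runs slow,
// and the misinterpreted samples come out as noise.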

So, I finally figured out how to do it. Gee, what a mess. It was a hint from a user on the libav-users mailing list that put me on the correct path.

Here are my mistakes:

  1. Using the wrong format in the alBufferData function. I used AL_FORMAT_STEREO16 (as that is what every single streaming example with OpenAL uses). I should have used AL_FORMAT_STEREO_FLOAT32, as the video I stream is Ogg and Vorbis is decoded to floating-point samples. Also, using swr_convert to convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16 just crashes. No idea why.

  2. Not using swr_convert to convert the decoded audio frames to the target format. After I had tried to use swr_convert to convert from FLTP to S16 and it simply crashed without a reason given, I assumed it was broken. But after figuring out my first mistake, I tried again, converting from FLTP to FLT (non-planar), and then it worked! So OpenAL uses an interleaved format, not a planar one. Good to know (see the layout sketch after this list).
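To illustrate the difference that conversion bridges, here is a schematic of the two stereo sample layouts (illustration only, not code from the player):

// Planar stereo (AV_SAMPLE_FMT_FLTP): one buffer per channel
//   frame->extended_data[0] = L0 L1 L2 L3 ...
//   frame->extended_data[1] = R0 R1 R2 R3 ...
//
// Interleaved stereo (AV_SAMPLE_FMT_FLT): one buffer, channels alternating
//   frame->extended_data[0] = L0 R0 L1 R1 L2 R2 ...
//
// OpenAL consumes the interleaved layout, so FLTP must be converted to FLT.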

So here is the decodeAudioPacket function that is working for me with an Ogg video and a Vorbis audio stream:

int decodeAudioPacket(  AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
                        SwrContext* p_swrContext, uint8_t** p_destBuffer, int p_destLinesize,
                        FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
    // Decode audio frame
    int got_frame = 0;
    int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
    if (decoded < 0) 
    {
        p_videoInfo.error = "Error decoding audio frame.";
        return decoded;
    }

    if(decoded <= p_packet.size)
    {
        /* Move the unread data to the front and clear the end bits */
        int remaining = p_packet.size - decoded;
        memmove(p_packet.data, &p_packet.data[decoded], remaining);
        av_shrink_packet(&p_packet, remaining);
    }

    // Frame is complete, store it in audio frame queue
    if (got_frame)
    {
        // Convert from planar float (FLTP) to interleaved float (FLT).
        // swr_convert returns the number of samples converted per channel.
        int outputSamples = swr_convert(p_swrContext, 
                                        p_destBuffer, p_destLinesize, 
                                        (const uint8_t**)p_frame->extended_data, p_frame->nb_samples);

        int bufferSize = av_get_bytes_per_sample(AV_SAMPLE_FMT_FLT) * p_videoInfo.audioNumChannels
                            * outputSamples;

        int64_t duration = p_frame->pkt_duration;
        int64_t dts = p_frame->pkt_dts;

        if (staticOgreLog)
        {
            staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: " 
                    + boost::lexical_cast<std::string>(bufferSize) + " / "
                    + boost::lexical_cast<std::string>(duration) + " / "
                    + boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
        }

        // Create the audio frame
        AudioFrame* frame = new AudioFrame();
        frame->dataSize = bufferSize;
        frame->data = new uint8_t[bufferSize];
        memcpy(frame->data, p_destBuffer[0], bufferSize);
        double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
        frame->lifeTime = duration * timeBase;

        p_player->addAudioFrame(frame);
    }

    return decoded;
}

And here is how I initialize the context and the destination buffer:

// Initialize SWR context
SwrContext* swrContext = swr_alloc_set_opts(NULL, 
            audioCodecContext->channel_layout, AV_SAMPLE_FMT_FLT, audioCodecContext->sample_rate,
            audioCodecContext->channel_layout, audioCodecContext->sample_fmt, audioCodecContext->sample_rate, 
            0, NULL);
int result = swr_init(swrContext);

// Create destination sample buffer
uint8_t** destBuffer = NULL;
int destBufferLinesize;
av_samples_alloc_array_and_samples( &destBuffer,
                                    &destBufferLinesize,
                                    videoInfo.audioNumChannels,
                                    2048,
                                    AV_SAMPLE_FMT_FLT,
                                    0);
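The matching teardown isn't shown here; a minimal sketch, assuming the names above:

// Cleanup (assumed, not part of the original code): free the sample buffer
// allocated by av_samples_alloc_array_and_samples, then the resampling context.
if (destBuffer)
{
    av_freep(&destBuffer[0]); // frees the sample data plane
    av_freep(&destBuffer);    // frees the array of plane pointers
}
swr_free(&swrContext);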
