
Media Foundation Audio/Video capturing to MPEG4FileSink produces incorrect duration

I am working on a media streaming application using the Media Foundation framework. I've used some samples from the internet and from Anton Polinger's book. Unfortunately, after saving the streams into an mp4 file, the file's metadata is corrupted: it has an incorrect duration (derived from my PC's uptime, 30 hours for instance) and a wrong bitrate. After a long struggle I fixed this for a single stream (video or audio), but when I try to record both audio and video the problem returns. Something is wrong with my topology, but I can't understand what; perhaps there are some experts here?

I get the audio and video sources, wrap them into an IMFCollection, and create an aggregate source with MFCreateAggregateSource.
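For reference, the aggregation step looks roughly like this (a sketch; pVideoSource and pAudioSource are hypothetical IMFMediaSource pointers already obtained from the capture devices):

Com::IMFCollectionPtr pCollection;
HRESULT hr = MFCreateCollection(&pCollection);
THROW_ON_FAIL(hr, "Unable to create source collection");

// Add both capture sources to the collection that backs the aggregate source.
hr = pCollection->AddElement(pVideoSource);
THROW_ON_FAIL(hr, "Unable to add video source to collection");

hr = pCollection->AddElement(pAudioSource);
THROW_ON_FAIL(hr, "Unable to add audio source to collection");

// Create a single media source exposing the streams of both devices.
Com::IMFMediaSourcePtr pAggregateSource;
hr = MFCreateAggregateSource(pCollection, &pAggregateSource);
THROW_ON_FAIL(hr, "Unable to create aggregate source");

I create a source node for each stream in the aggregate source: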

Com::IMFTopologyNodePtr TopologyBuilder::CreateSourceNode(Com::IMFStreamDescriptorPtr streamDescriptor)
{
    HRESULT hr = S_OK;
    Com::IMFTopologyNodePtr pNode;
    // Create the topology node, indicating that it must be a source node.
    hr = MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pNode);
    THROW_ON_FAIL(hr, "Unable to create topology node for source");

    // Associate the node with the source by passing in a pointer to the media source,
    // and indicating that it is the source
    hr = pNode->SetUnknown(MF_TOPONODE_SOURCE, _sourceDefinition->GetMediaSource());
    THROW_ON_FAIL(hr, "Unable to set source as object for topology node");

    // Set the node presentation descriptor attribute of the node by passing
    // in a pointer to the presentation descriptor
    hr = pNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, _sourceDefinition->GetPresentationDescriptor());
    THROW_ON_FAIL(hr, "Unable to set MF_TOPONODE_PRESENTATION_DESCRIPTOR to node");

    // Set the node stream descriptor attribute by passing in a pointer to the stream
    // descriptor
    hr = pNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, streamDescriptor);
    THROW_ON_FAIL(hr, "Unable to set MF_TOPONODE_STREAM_DESCRIPTOR to node");

    return pNode;
}
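
CreateSourceNode is called once per stream; a sketch of how that enumeration might look, assuming pPresentationDescriptor is the aggregate source's presentation descriptor (the exact driver code is not shown in my excerpt):

DWORD streamCount = 0;
hr = pPresentationDescriptor->GetStreamDescriptorCount(&streamCount);
THROW_ON_FAIL(hr, "Unable to get stream descriptor count");

for (DWORD i = 0; i < streamCount; i++)
{
    BOOL selected = FALSE;
    Com::IMFStreamDescriptorPtr pStreamDescriptor;
    // Each selected stream descriptor gets its own source node in the topology.
    hr = pPresentationDescriptor->GetStreamDescriptorByIndex(i, &selected, &pStreamDescriptor);
    THROW_ON_FAIL(hr, "Unable to get stream descriptor by index");

    if (selected)
    {
        hr = _pTopology->AddNode(CreateSourceNode(pStreamDescriptor));
        THROW_ON_FAIL(hr, "Unable to add source node to topology");
    }
}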

After that I connect each source to a transform (H264 encoder and AAC encoder) and to the MPEG4FileSink:

void TopologyBuilder::CreateFileSinkOutputNode(PCWSTR filePath)
{
    HRESULT hr = S_OK;
    DWORD sink_count;

    Com::IMFByteStreamPtr byte_stream;
    Com::IMFTransformPtr transform;

    LPCWSTR lpcwstrFilePath = filePath;
    hr = MFCreateFile(
        MF_ACCESSMODE_WRITE, MF_OPENMODE_FAIL_IF_NOT_EXIST, MF_FILEFLAGS_NONE,
        lpcwstrFilePath, &byte_stream);
    THROW_ON_FAIL(hr, L"Unable to create and open file");

// Video stream
    Com::IMFMediaTypePtr in_mf_video_media_type = _sourceDefinition->GetCurrentVideoMediaType();

    Com::IMFMediaTypePtr out_mf_media_type = CreateMediaType(MFMediaType_Video, MFVideoFormat_H264);
    hr = CopyType(in_mf_video_media_type, out_mf_media_type);
    THROW_ON_FAIL(hr, L"Unable to copy type parameters");

    if (GetSubtype(in_mf_video_media_type) != MEDIASUBTYPE_H264)
    {
        transform.Attach(CreateAndInitCoderMft(MFT_CATEGORY_VIDEO_ENCODER, out_mf_media_type));
        THROW_ON_NULL(transform);
    }

    if (transform)
    {
        Com::IMFMediaTypePtr transformMediaType;
        hr = transform->GetOutputCurrentType(0, &transformMediaType);
        THROW_ON_FAIL(hr, L"Unable to get current output type");

        UINT32 pcbBlobSize = 0;
        hr = transformMediaType->GetBlobSize(MF_MT_MPEG_SEQUENCE_HEADER, &pcbBlobSize);
        THROW_ON_FAIL(hr, L"Unable to get blob size of MF_MT_MPEG_SEQUENCE_HEADER");

        std::vector<UINT8> blob(pcbBlobSize);
        hr = transformMediaType->GetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), static_cast<UINT32>(blob.size()), NULL);
        THROW_ON_FAIL(hr, L"Unable to get blob MF_MT_MPEG_SEQUENCE_HEADER");

        hr = out_mf_media_type->SetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), static_cast<UINT32>(blob.size()));
        THROW_ON_FAIL(hr, L"Unable to set blob of MF_MT_MPEG_SEQUENCE_HEADER");
    }

    // Audio stream
    Com::IMFMediaTypePtr out_mf_audio_media_type;
    Com::IMFTransformPtr transformAudio;
    Com::IMFMediaTypePtr mediaTypeTmp = _sourceDefinition->GetCurrentAudioMediaType();
    Com::IMFMediaTypePtr in_mf_audio_media_type;
    if (mediaTypeTmp != NULL)
    {
        std::unique_ptr<MediaTypesFactory> factory(new MediaTypesFactory());
        if (!IsMediaTypeSupportedByAacEncoder(mediaTypeTmp))
        {
            UINT32 channels;
            hr = mediaTypeTmp->GetUINT32(MF_MT_AUDIO_NUM_CHANNELS, &channels);
            THROW_ON_FAIL(hr, L"Unable to get MF_MT_AUDIO_NUM_CHANNELS from source media type");
            in_mf_audio_media_type = factory->CreatePCM(factory->DEFAULT_SAMPLE_RATE, channels);
        }
        else
        {
            in_mf_audio_media_type.Attach(mediaTypeTmp.Detach());
        }

        out_mf_audio_media_type = factory->CreateAAC(in_mf_audio_media_type, factory->HIGH_ENCODED_BITRATE);
        GUID subType = GetSubtype(in_mf_audio_media_type);
        if (subType != MFAudioFormat_AAC)
        {
            // add AAC encoder
            transformAudio.Attach(CreateAndInitCoderMft(MFT_CATEGORY_AUDIO_ENCODER, out_mf_audio_media_type));
        }
    }

    Com::IMFMediaSinkPtr pFileSink;
    hr = MFCreateMPEG4MediaSink(byte_stream, out_mf_media_type, out_mf_audio_media_type, &pFileSink);
    THROW_ON_FAIL(hr, L"Unable to create mpeg4 media sink");

    Com::IMFTopologyNodePtr pOutputNodeVideo;
    hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to create output node");

    hr = pFileSink->GetStreamSinkCount(&sink_count);
    THROW_ON_FAIL(hr, L"Unable to get stream sink count from mediasink");

    if (sink_count == 0)
    {
        THROW_ON_FAIL(E_UNEXPECTED, L"Sink count should be greater than 0");
    }

    Com::IMFStreamSinkPtr stream_sink_video;
    hr = pFileSink->GetStreamSinkByIndex(0, &stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to get stream sink by index");

    hr = pOutputNodeVideo->SetObject(stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");

    hr = _pTopology->AddNode(pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to add file sink output node");

    pOutputNodeVideo = AddEncoderIfNeed(_pTopology, transform, in_mf_video_media_type, pOutputNodeVideo);

    _outVideoNodes.push_back(pOutputNodeVideo);

    Com::IMFTopologyNodePtr pOutputNodeAudio;

    if (in_mf_audio_media_type != NULL)
    {
        hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to create output node");

        Com::IMFStreamSinkPtr stream_sink_audio;
        hr = pFileSink->GetStreamSinkByIndex(1, &stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to get stream sink by index");

        hr = pOutputNodeAudio->SetObject(stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");

        hr = _pTopology->AddNode(pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to add file sink output node");

        if (transformAudio)
        {
            Com::IMFTopologyNodePtr outputTransformNodeAudio;
            AddTransformNode(_pTopology, transformAudio, pOutputNodeAudio, &outputTransformNodeAudio);

            _outAudioNode = outputTransformNodeAudio;
        }
        else
        {
            _outAudioNode = pOutputNodeAudio;
        }
    }
}
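
The nodes are then wired together with IMFTopologyNode::ConnectOutput; a sketch of the wiring for the video branch, assuming pSourceNodeVideo and pTransformNodeVideo are the nodes created earlier (the audio branch is wired the same way):

// Source node feeds the encoder transform node...
hr = pSourceNodeVideo->ConnectOutput(0, pTransformNodeVideo, 0);
THROW_ON_FAIL(hr, L"Unable to connect source node to transform node");

// ...and the transform node feeds the stream sink output node.
hr = pTransformNodeVideo->ConnectOutput(0, pOutputNodeVideo, 0);
THROW_ON_FAIL(hr, L"Unable to connect transform node to output node");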

When the output type is applied to the audio transform, it has 15 attributes instead of 8, including MF_MT_AVG_BITRATE, which as I understand it should only be applied to video. In my case it is 192000, which differs from the MF_MT_AVG_BITRATE of the video stream.
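To compare the two types I dump their attributes with a small hypothetical helper (a sketch, not part of the project code):

void DumpMediaTypeAttributes(IMFMediaType *pType)
{
    UINT32 count = 0;
    if (FAILED(pType->GetCount(&count)))
        return;

    for (UINT32 i = 0; i < count; i++)
    {
        GUID key;
        PROPVARIANT value;
        PropVariantInit(&value);
        // Map 'key' to a readable name (e.g. MF_MT_AVG_BITRATE) and log 'value'.
        if (SUCCEEDED(pType->GetItemByIndex(i, &key, &value)))
        {
            PropVariantClear(&value);
        }
    }
}

My AAC media type is created by this method: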

HRESULT MediaTypesFactory::CopyAudioTypeBasicAttributes(IMFMediaType * in_media_type, IMFMediaType * out_mf_media_type) {
    HRESULT hr = S_OK;
    static const GUID AUDIO_MAJORTYPE = MFMediaType_Audio;
    static const GUID AUDIO_SUBTYPE = MFAudioFormat_PCM;

    out_mf_media_type->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, AUDIO_BITS_PER_SAMPLE);

    WAVEFORMATEX *in_wfx = NULL;
    UINT32 wfx_size = 0;

    // The WAVEFORMATEX is allocated by Media Foundation and must be freed
    // with CoTaskMemFree when we are done with it.
    hr = MFCreateWaveFormatExFromMFMediaType(in_media_type, &in_wfx, &wfx_size);
    if (FAILED(hr) || in_wfx == NULL)
        return hr;

    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, in_wfx->nSamplesPerSec);
    DEBUG_ON_FAIL(hr);

    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, in_wfx->nChannels);
    DEBUG_ON_FAIL(hr);

    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, in_wfx->nAvgBytesPerSec);
    DEBUG_ON_FAIL(hr);

    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, in_wfx->nBlockAlign);
    DEBUG_ON_FAIL(hr);

    CoTaskMemFree(in_wfx);
    return hr;
}

It would be awesome if somebody could help me or explain where I am wrong. Thanks.

In my project CaptureManager I faced a similar problem while writing code for recording live video from many web cams into one file. After a long time researching Media Foundation I found two important facts:

1. Live sources (web cams and microphones) do not start from 0. According to the specification, samples from them should start from a time stamp of zero (see Live Sources: "The first sample should have a time stamp of zero.") - but live sources actually set the current system time.

2. I see from your code that you use a Media Session - an object with the IMFMediaSession interface, which I assume you create with the MFCreateMediaSession function. This function creates the default version of the session, which is optimized for playing media from a file, whose samples start from 0 by default.

In my view, the main problem is that the default Media Session does not check the time stamps of the media samples from the source, because samples from a media file start from zero or from the StartPosition. However, live sources do not start from 0 - they should, or must, but do not.

So, my advice: write a class implementing IMFTransform which will be a "proxy" transform between the source and the encoder. This "proxy" transform must fix the time stamps of the media samples from the live source: when it receives the first media sample from the live source, it saves the actual time stamp of that first sample as a reference time and sets the first sample's time stamp to zero; the time stamps of all subsequent media samples from this live source must have this reference time subtracted before being set back on the samples. Also, check your code for the calling of IMFFinalizableMediaSink.
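A minimal sketch of the rebasing logic such a proxy transform would apply to each sample (this is not CaptureManager's actual code; m_rtFirst is a hypothetical LONGLONG member of the transform class initialized to -1, and the surrounding IMFTransform boilerplate is omitted):

// Rebase a live source sample so the stream starts at time stamp zero.
HRESULT RebaseSampleTime(IMFSample *pSample)
{
    LONGLONG rtSample = 0;
    HRESULT hr = pSample->GetSampleTime(&rtSample);
    if (FAILED(hr))
        return hr;

    if (m_rtFirst < 0)
        m_rtFirst = rtSample; // remember the reference time of the first sample

    // All subsequent samples are shifted by the same reference time.
    return pSample->SetSampleTime(rtSample - m_rtFirst);
}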

Regards.

MP4 metadata might under some conditions be initialized incorrectly (e.g. like this), however in the scenario you described the problem is likely to be in the payload data and not in the way you set up the pipeline in the first place.

The decoders and converters typically pass the time stamps of samples through, copying them from input to output, so they do not indicate a failure if something is wrong - you still get output that makes sense written into the file. The sink might have issues processing your data if you have sample time problems, very long recordings, or an overflow bug, esp. in the case of rates expressed with large numerators/denominators. What is important is what sample times the sources produce.

You might want to try shorter recordings, and also video-only and audio-only recordings, which might help identify the stream that supplies the data leading to the problem.

Additionally, you might want to inspect the resulting MP4 file's atoms/boxes to identify whether the header boxes have incorrect data or the data itself is stamped incorrectly, on which track, and how exactly (e.g. starts okay and then has weird gaps in the middle).
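A quick way to get started is dumping the top-level boxes of the recording; a minimal sketch, assuming a hypothetical file name (the overall duration shown by players comes from the mvhd box and per-track durations from the mdhd boxes, which dedicated MP4 box viewers decode in full):

#include <cstdint>
#include <cstdio>

int main()
{
    FILE *f = fopen("capture.mp4", "rb"); // hypothetical file name
    if (!f) return 1;

    uint8_t hdr[8];
    while (fread(hdr, 1, 8, f) == 8)
    {
        // Box header: 32-bit big-endian size followed by a four-character type.
        uint32_t size = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
                        ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
        printf("%c%c%c%c  %u bytes\n", hdr[4], hdr[5], hdr[6], hdr[7], size);
        if (size < 8) break; // size 0 (to end of file) and 1 (64-bit size) not handled in this sketch
        fseek(f, (long)size - 8, SEEK_CUR); // skip payload; boxes over 2 GB would need _fseeki64
    }
    fclose(f);
    return 0;
}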
