Media Foundation Audio/Video capturing to MPEG4FileSink produces incorrect duration

Question

I am working on media streaming application using Media Foundation framework. I've used some samples from internet and from Anton Polinger book. Unfortunately after saving streams into mp4 file metadata of file is corrupted. It has incorrect duration (according to time of work of my PC, 30 hours for instance), wrong bitrate. After long struggling I've fixed it for single stream (video or audio) but when i try to record both audio and video this problem returns again. Something is wrong with my topology but i can't understand what and probably there are some experts here?

I get audio and video source, wrap it into IMFCollection, create aggregate source by MFCreateAggregateSource. I create source nodes for each source in aggregate source:

Com::IMFTopologyNodePtr 
TopologyBuilder::CreateSourceNode(Com::IMFStreamDescriptorPtr 
streamDescriptor)
{
    HRESULT hr = S_OK;
    Com::IMFTopologyNodePtr pNode;
    // Create the topology node, indicating that it must be a source node.
    hr = MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pNode);
    THROW_ON_FAIL(hr, "Unable to create topology node for source");

    // Associate the node with the source by passing in a pointer to the media source,
    // and indicating that it is the source
    hr = pNode->SetUnknown(MF_TOPONODE_SOURCE, _sourceDefinition->GetMediaSource());
    THROW_ON_FAIL(hr, "Unable to set source as object for topology node");

    // Set the node presentation descriptor attribute of the node by passing
    // in a pointer to the presentation descriptor
    hr = pNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, _sourceDefinition->GetPresentationDescriptor());
    THROW_ON_FAIL(hr, "Unable to set MF_TOPONODE_PRESENTATION_DESCRIPTOR to node");

    // Set the node stream descriptor attribute by passing in a pointer to the stream
    // descriptor
    hr = pNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, streamDescriptor);
    THROW_ON_FAIL(hr, "Unable to set MF_TOPONODE_STREAM_DESCRIPTOR to node");

    return pNode;
}

After that i connect each source to transform(H264 encoder and AAC encoder) and to MPEG4FileSink:

void TopologyBuilder::CreateFileSinkOutputNode(PCWSTR filePath)
{
    HRESULT hr = S_OK;
    DWORD sink_count;

    Com::IMFByteStreamPtr byte_stream;
    Com::IMFTransformPtr transform;

    LPCWSTR lpcwstrFilePath = filePath;
    hr = MFCreateFile(
    MF_ACCESSMODE_WRITE, MF_OPENMODE_FAIL_IF_NOT_EXIST, MF_FILEFLAGS_NONE,
    lpcwstrFilePath, &byte_stream);
    THROW_ON_FAIL(hr, L"Unable to create and open file");

// Video stream
    Com::IMFMediaTypePtr in_mf_video_media_type = _sourceDefinition->GetCurrentVideoMediaType();

    Com::IMFMediaTypePtr out_mf_media_type = CreateMediaType(MFMediaType_Video, MFVideoFormat_H264);
    hr = CopyType(in_mf_video_media_type, out_mf_media_type);
    THROW_ON_FAIL(hr, L"Unable to copy type parameters");

    if (GetSubtype(in_mf_video_media_type) != MEDIASUBTYPE_H264)
    {
        transform.Attach(CreateAndInitCoderMft(MFT_CATEGORY_VIDEO_ENCODER, out_mf_media_type));
        THROW_ON_NULL(transform);
    }

    if (transform)
    {
        Com::IMFMediaTypePtr transformMediaType;
        hr = transform->GetOutputCurrentType(0, &transformMediaType);
        THROW_ON_FAIL(hr, L"Unable to get current output type");

        UINT32 pcbBlobSize = 0;
        hr = transformMediaType->GetBlobSize(MF_MT_MPEG_SEQUENCE_HEADER, &pcbBlobSize);
        THROW_ON_FAIL(hr, L"Unable to get blob size of MF_MT_MPEG_SEQUENCE_HEADER");

        std::vector<UINT8> blob(pcbBlobSize);
        hr = transformMediaType->GetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), blob.size(), NULL);
        THROW_ON_FAIL(hr, L"Unable to get blob MF_MT_MPEG_SEQUENCE_HEADER");

        hr = out_mf_media_type->SetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), blob.size());
        THROW_ON_FAIL(hr, L"Unable to set blob of MF_MT_MPEG_SEQUENCE_HEADER");
    }

    // Audio stream
    Com::IMFMediaTypePtr out_mf_audio_media_type;
    Com::IMFTransformPtr transformAudio;
    Com::IMFMediaTypePtr mediaTypeTmp = _sourceDefinition->GetCurrentAudioMediaType();
    Com::IMFMediaTypePtr in_mf_audio_media_type;
    if (mediaTypeTmp != NULL)
    {
        std::unique_ptr<MediaTypesFactory> factory(new MediaTypesFactory());
        if (!IsMediaTypeSupportedByAacEncoder(mediaTypeTmp))
        {
            UINT32 channels;
            hr = mediaTypeTmp->GetUINT32(MF_MT_AUDIO_NUM_CHANNELS, &channels);
            THROW_ON_FAIL(hr, L"Unable to get MF_MT_AUDIO_NUM_CHANNELS fron source media type");
            in_mf_audio_media_type = factory->CreatePCM(factory->DEFAULT_SAMPLE_RATE, channels);
        }
        else
        {
            in_mf_audio_media_type.Attach(mediaTypeTmp.Detach());
        }

        out_mf_audio_media_type = factory->CreateAAC(in_mf_audio_media_type, factory->HIGH_ENCODED_BITRATE);
        GUID subType = GetSubtype(in_mf_audio_media_type);
        if (GetSubtype(in_mf_audio_media_type) != MFAudioFormat_AAC)
        {
            // add encoder to Aac
        transformAudio.Attach(CreateAndInitCoderMft(MFT_CATEGORY_AUDIO_ENCODER, out_mf_audio_media_type));
        }
    }

    Com::IMFMediaSinkPtr pFileSink;
    hr = MFCreateMPEG4MediaSink(byte_stream, out_mf_media_type,     out_mf_audio_media_type, &pFileSink);
    THROW_ON_FAIL(hr, L"Unable to create mpeg4 media sink");

    Com::IMFTopologyNodePtr pOutputNodeVideo;
    hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to create output node");

    hr = pFileSink->GetStreamSinkCount(&sink_count);
    THROW_ON_FAIL(hr, L"Unable to get stream sink count from mediasink");

    if (sink_count == 0)
    {
        THROW_ON_FAIL(E_UNEXPECTED, L"Sink count should be greater than 0");
    }

    Com::IMFStreamSinkPtr stream_sink_video;
    hr = pFileSink->GetStreamSinkByIndex(0, &stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to get stream sink by index");

    hr = pOutputNodeVideo->SetObject(stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");

    hr = _pTopology->AddNode(pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to add file sink output node");

    pOutputNodeVideo = AddEncoderIfNeed(_pTopology, transform, in_mf_video_media_type, pOutputNodeVideo);

    _outVideoNodes.push_back(pOutputNodeVideo);

    Com::IMFTopologyNodePtr pOutputNodeAudio;

    if (in_mf_audio_media_type != NULL)
    {
        hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to create output node");

        Com::IMFStreamSinkPtr stream_sink_audio;
        hr = pFileSink->GetStreamSinkByIndex(1, &stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to get stream sink by index");

        hr = pOutputNodeAudio->SetObject(stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");

        hr = _pTopology->AddNode(pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to add file sink output node");

        if (transformAudio)
        {
            Com::IMFTopologyNodePtr outputTransformNodeAudio;
            AddTransformNode(_pTopology, transformAudio, pOutputNodeAudio, &outputTransformNodeAudio);

            _outAudioNode = outputTransformNodeAudio;
        }
        else
    {
            _outAudioNode = pOutputNodeAudio;
        }
    }
}

When output type is applied on to audio transform, it has 15 attributes instead of 8, including MF_MT_AVG_BITRATE which should be applied to video as i understand. In my case it is 192000 and it is different of MF_MT_AVG_BITRATE on video stream. My AAC media type is creating by this method:

HRESULT MediaTypesFactory::CopyAudioTypeBasicAttributes(IMFMediaType * in_media_type, IMFMediaType * out_mf_media_type) {
    HRESULT hr = S_OK;
    static const GUID AUDIO_MAJORTYPE = MFMediaType_Audio;
    static const GUID AUDIO_SUBTYPE = MFAudioFormat_PCM;

    out_mf_media_type->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, AUDIO_BITS_PER_SAMPLE);

    WAVEFORMATEX *in_wfx;
    UINT32 wfx_size;

    MFCreateWaveFormatExFromMFMediaType(in_media_type, &in_wfx, &wfx_size);

    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, in_wfx->nSamplesPerSec);
    DEBUG_ON_FAIL(hr);

    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, in_wfx->nChannels);
    DEBUG_ON_FAIL(hr);

    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, in_wfx->nAvgBytesPerSec);
    DEBUG_ON_FAIL(hr);

    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, in_wfx->nBlockAlign);
    DEBUG_ON_FAIL(hr);

    return hr;
}

It would be awesome if somebody can help me or explain where i am wrong. Thanks.

Answer 1

In my project CaptureManager I faced with similar problem - while I have wrote code for recording live video from many web cams into the one file. After long time research of Media Foundation I found two important facts: 1. live sources - web cams and microphones do not start from 0 - according of specification samples from them should start from 0 time stamp - Live Sources - "The first sample should have a time stamp of zero." - but live sources set current system time. 2. I see from you code that you use Media Session - it is an object with IMFMediaSession interface. I think you create it from MFCreateMediaSession function. This function creates default version of session which is optimized for playing of media from file, which samples starts from 0 by default. In my view,the main problem is that default Media Session does not check time stamp of media samples from source, because from media file they start from zero or from StartPosition. However, live sources do not start from 0 - they should, or must, but do not. So, my advise - write class with IMFTransform which will be "Proxy" transform between source and encoder - this "Proxy" transform must fix time stamp of media samples from live source: 1. while it receive first media sample from live source, it save actual time stamp of the first media sample like reference time, and set time stamp of the first media sample to zero, all time stamps the next media samples from this live source must be subtracted by this reference time and set to time stamps of media samples. Also, check code for calling of IMFFinalizableMediaSink .

Regards.

Answer 2

MP4 metadata might under some conditions be initialized incorrectly (eg like this ), however in the scenario you described the problem is like to be the payload data and not the way you set up the pipeline in first place.

The decoders and converters are typically passing time stamps of samples through copying them from input to output, so they are not indicating a failure if something is wrong - you still have output that makes sense written into file. The sink might be having issues processing your data if you have sample time issues, very long recordings, overflow bug esp. in case of rates expressed with large numerators/denominators. Important is what sample times the sources produce.

You might want to try to record shorter recordings, also video only and audio only recording that might possibly help in identification of the stream which supplies the data leading to the problem.

Additionally, you might want to inspect the resulting MP4 file atoms/boxes to identify whether the header boxes have incorrect data or data itself is stamped incorrectly, on which track and how exactly (esp. starts okay and then does a weird gaps in the middle).

Media Foundation Audio/Video capturing to MPEG4FileSink produces incorrect duration

Question

2 answers

solution1
2 2017-08-31 01:02:24

solution2
0 2017-08-30 17:29:19

Media Foundation Audio/Video capturing to MPEG4FileSink produces incorrect duration

Question

2 answers

solution1 2 2017-08-31 01:02:24

solution2 0 2017-08-30 17:29:19

solution1
2 2017-08-31 01:02:24

solution2
0 2017-08-30 17:29:19