
Windows MFT (Media Foundation Transform) decoder not returning proper sample time or duration

To decode an H.264 stream with a Windows Media Foundation Transform (MFT), my workflow is currently something like this:

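// Sample times and durations are expressed in 100-nanosecond units.
IMFSample* sample = nullptr;
MFCreateSample(&sample);
sample->SetSampleTime(time_in_hns);
sample->SetSampleDuration(duration_in_hns);
sample->AddBuffer(buffer);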

// Feed IMFSample to decoder
mDecoder->ProcessInput(0, sample, 0);

// Get output from decoder.
/* create outputsample that will receive content */ { ... }
MFT_OUTPUT_DATA_BUFFER output = {0};
output.pSample = outputsample;
DWORD status = 0;
HRESULT hr = mDecoder->ProcessOutput(0, 1, &output, &status);
if (output.pEvents) {
  // We must release this, as per the IMFTransform::ProcessOutput()
  // MSDN documentation.
  output.pEvents->Release();
  output.pEvents = nullptr;
}

if (hr == MF_E_TRANSFORM_STREAM_CHANGE) {
  // Type change, probably geometric aperture change.
  // Reconfigure decoder output type, so that GetOutputMediaType()
} else if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT) {
  // Not enough input to produce output.
} else if (!output.pSample) {
  return S_OK;
} else {
  // Process output
}

When we have fed all data to the MFT decoder, we must drain it:

mDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, 0);
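
After the drain message has been sent, ProcessOutput has to be called repeatedly until it returns MF_E_TRANSFORM_NEED_MORE_INPUT, which signals that the decoder has nothing left to emit. The loop can be sketched roughly as follows. To keep the control flow compilable without the Windows SDK, `Status`, `Frame`, and `DrainAll` are hypothetical stand-ins and not part of the Media Foundation API; the callback plays the role of the ProcessOutput call.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the HRESULT values that matter while draining.
enum class Status { Ok, NeedMoreInput, Error };

// Hypothetical decoded frame carrying only its timing, in 100-ns units.
struct Frame {
  long long time;
  long long duration;
};

// Calls pullOutput (standing in for IMFTransform::ProcessOutput after
// MFT_MESSAGE_COMMAND_DRAIN) until it reports that nothing is left.
// Returns false if the decoder reported an error.
inline bool DrainAll(const std::function<Status(Frame&)>& pullOutput,
                     std::vector<Frame>& out) {
  assert(pullOutput && "a pull callback is required");
  for (;;) {
    Frame frame{};
    const Status s = pullOutput(frame);
    if (s == Status::NeedMoreInput) return true;  // fully drained
    if (s != Status::Ok) return false;            // decoder error
    out.push_back(frame);
  }
}
```

In real code the callback body would allocate the output sample, call ProcessOutput, release `pEvents`, and map the returned HRESULT onto the three cases above.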

Now, one quirk of the WMF H.264 decoder is that it typically outputs nothing until it has been fed more than 30 compressed H.264 frames, regardless of the size of the H.264 sliding window. Latency is very high.

I'm running into a very troublesome issue with a video that consists only of keyframes. It has just 15 frames, each 2 s long, and the first frame has a non-zero presentation time (the stream comes from live content, so the first frame's time is typically in epoch time). Without draining the decoder, nothing comes out of it, because it hasn't received enough frames.

However, once the decoder is drained, the decoded frames do come out. The problem is that the MFT decoder has set every duration to only 33.6 ms, and the presentation time of the first output sample is always 0. The original durations and presentation times have been lost.
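
To put numbers on the mismatch, here is a small compile-time calculation using the figures above (15 frames, 2 s each, reported as 33.6 ms each). Media Foundation expresses sample times and durations in 100-ns units, so the sketch does the same. The constant and function names are made up for this illustration.

```cpp
#include <cstdint>

// Media Foundation sample times and durations use 100-ns units.
constexpr int64_t kHnsPerSecond = 10000000;
constexpr int64_t kHnsPerMs = 10000;

// Total length, in 100-ns units, of frameCount frames sharing one duration.
constexpr int64_t TotalDurationHns(int frameCount, int64_t frameDurationHns) {
  return frameCount * frameDurationHns;
}

// 15 frames of 2 s each: 30 s in total.
constexpr int64_t kExpectedHns = TotalDurationHns(15, 2 * kHnsPerSecond);
// 15 frames of 33.6 ms each: 0.504 s in total.
constexpr int64_t kReportedHns = TotalDurationHns(15, 336 * kHnsPerMs / 10);

static_assert(kExpectedHns == 30 * kHnsPerSecond, "expected 30 s of video");
static_assert(kReportedHns == 504 * kHnsPerMs, "reported 504 ms of video");
```

With the reported durations the whole clip spans about half a second instead of 30 s, roughly 60 times too short.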

If you feed more than 30 frames to the H.264 decoder, both the duration and the PTS are valid.

I haven't yet found a way to get the WMF decoder to output samples with the proper values. It appears that if you have to drain the decoder before it has output any samples on its own, the timing information is totally broken.

Has anyone experienced such problems? How did you get around it?

Thank you in advance

Edit: a sample of the video is available at http://people.mozilla.org/~jyavenard/mediatest/fragmented/1301869.mp4. Playing this video in Firefox causes it to play extremely quickly because of the problems described above.

I'm not sure that your workflow is correct. I think you should do something like this:

do
{
    ...
    hr = mDecoder->ProcessInput(0, sample, 0);
    if(FAILED(hr))
      break;
    ...
    hr = mDecoder->ProcessOutput(0, 1, &output, &status);
    if(FAILED(hr) && hr != MF_E_TRANSFORM_NEED_MORE_INPUT)
      break;
}
while(hr == MF_E_TRANSFORM_NEED_MORE_INPUT);

if(SUCCEEDED(hr))
{
    // You have a valid decoded frame here
}

The idea is to keep calling ProcessInput/ProcessOutput for as long as ProcessOutput returns MF_E_TRANSFORM_NEED_MORE_INPUT, which means that the decoder needs more input. I think that with this loop you won't need to drain the decoder.
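
If an early drain turns out to be unavoidable, one possible workaround is to remember the time and duration of each input sample and reapply them to the outputs in order. This is only a sketch and has not been confirmed as a fix for this decoder. It assumes frames come out in the order they were submitted, which should hold for a keyframe-only stream like the one in the question but not for streams with reordered B-frames. `Timing` and `TimingQueue` are hypothetical names, not Media Foundation API.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Timing of one sample, in 100-ns units as used by IMFSample.
struct Timing {
  int64_t time;
  int64_t duration;
};

// Hypothetical helper: records input timing and hands it back per output.
class TimingQueue {
 public:
  // Call just before passing a sample to ProcessInput.
  void OnInput(int64_t time, int64_t duration) {
    assert(duration >= 0 && "sample durations are never negative");
    pending_.push_back(Timing{time, duration});
  }

  // Call for each decoded sample. Returns false if no input timing is
  // left, in which case the decoder-provided values should be kept.
  bool OnOutput(Timing& restored) {
    if (pending_.empty()) return false;
    restored = pending_.front();
    pending_.pop_front();
    return true;
  }

 private:
  std::deque<Timing> pending_;
};
```

The restored values would then be written back onto the output sample with IMFSample::SetSampleTime and IMFSample::SetSampleDuration.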
