How to properly use a hardware accelerated Media Foundation Source Reader to decode a video?

I'm writing a hardware accelerated H.264 decoder using Media Foundation's Source Reader, but I have run into a problem. I followed this tutorial and relied on the Windows SDK Media Foundation samples.


My app seems to work fine when hardware acceleration is turned off, but it doesn't provide the performance I need. When I turn acceleration on by passing an IMFDXGIDeviceManager to the IMFAttributes used to create the reader, things get complicated.

If I create the ID3D11Device using the D3D_DRIVER_TYPE_NULL driver, the app works fine and the frames are processed faster than in software mode, but judging by the CPU and GPU usage it still does the majority of the processing on the CPU.

On the other hand, when I create the ID3D11Device using the D3D_DRIVER_TYPE_HARDWARE driver and run the app, one of these four things can happen.

  1. I only get an unpredictable number of frames (usually 1-3) before IMFMediaBuffer::Lock returns 0x887a0005, which is described as "The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action". When I call ID3D11Device::GetDeviceRemovedReason, I get 0x887a0020, which is described as "The driver encountered a problem and was put into the device removed state". That isn't as helpful as I would like.

  2. The app crashes in an external dll on the IMFMediaBuffer::Lock call. Which dll it is seems to depend on the GPU used. For the Intel integrated GPU it's igd10iumd32.dll and for the Nvidia mobile GPU it's mfplat.dll. The message for this particular crash is as follows: "Exception thrown at 0x53C6DB8C (mfplat.dll) in decoder_tester.exe: 0xC0000005: Access violation reading location 0x00000024". The addresses differ between executions, and the violation is sometimes a read and sometimes a write.

  3. The graphics driver stops responding, the system hangs for a short time, and then the application either crashes as in point 2 or finishes as in point 1.

  4. The app works fine and processes all the frames with hardware acceleration.

Most of the time it's 1 or 2, seldom 3 or 4.


Here's what the CPU/GPU usage looks like when processing without throttling in the different modes on my machine (Intel Core i5-6500 with HD Graphics 530, Windows 10 Pro).

  • NULL - CPU: ~90%, GPU: ~15%
  • HARDWARE - CPU: ~15%, GPU: ~60%
  • SOFTWARE - CPU: ~40%, GPU: ~7%

I tested the app on three machines. All of them had Intel integrated GPUs (HD 4400, HD 4600, HD 530). One of them also had a switchable Nvidia dedicated GPU (GF 840M). It behaves identically on all of them; the only difference is that it crashes in a different dll when Nvidia's GPU is used.


I have no previous experience with COM or DirectX, but all of this is inconsistent and unpredictable, so it looks like memory corruption to me. Still, I don't know where I'm making the mistake. Could you please help me find what I'm doing wrong?

The minimal code example I could come up with is below. I'm using Visual Studio Professional 2015 to compile it as a C++ project. I prepared definitions to enable hardware acceleration and select the hardware driver. Comment them out to change the behavior. Also, the code expects this video file to be present in the project directory.

#include <iostream>
#include <string>
#include <atlbase.h>
#include <d3d11.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <windows.h>

#pragma comment(lib, "d3d11.lib")
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

#define ENABLE_HW_ACCELERATION
#define ENABLE_HW_DRIVER

void handle_result(HRESULT hr)
{
    if (SUCCEEDED(hr))
        return;

    WCHAR message[512];

    FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS, nullptr, hr,
        MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), message, ARRAYSIZE(message), nullptr);

    printf("%ls", message);
    abort();
}

int main(int argc, char** argv)
{
    handle_result(CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE));
    handle_result(MFStartup(MF_VERSION));

    {
        CComPtr<IMFAttributes> attributes;

        handle_result(MFCreateAttributes(&attributes, 3));

#if defined(ENABLE_HW_ACCELERATION)
        CComPtr<ID3D11Device> device;
        D3D_FEATURE_LEVEL levels[] = { D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0 };

#if defined(ENABLE_HW_DRIVER)
        handle_result(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, D3D11_CREATE_DEVICE_SINGLETHREADED | D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
            levels, ARRAYSIZE(levels), D3D11_SDK_VERSION, &device, nullptr, nullptr));
#else
        handle_result(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_NULL, nullptr, D3D11_CREATE_DEVICE_SINGLETHREADED,
            levels, ARRAYSIZE(levels), D3D11_SDK_VERSION, &device, nullptr, nullptr));
#endif

        UINT token;
        CComPtr<IMFDXGIDeviceManager> manager;

        handle_result(MFCreateDXGIDeviceManager(&token, &manager));
        handle_result(manager->ResetDevice(device, token));

        handle_result(attributes->SetUnknown(MF_SOURCE_READER_D3D_MANAGER, manager));
        handle_result(attributes->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE));
        handle_result(attributes->SetUINT32(MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING, TRUE));
#else
        handle_result(attributes->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, TRUE));
#endif

        CComPtr<IMFSourceReader> reader;

        handle_result(MFCreateSourceReaderFromURL(L"Rogue One - A Star Wars Story - Trailer.mp4", attributes, &reader));

        CComPtr<IMFMediaType> output_type;

        handle_result(MFCreateMediaType(&output_type));
        handle_result(output_type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
        handle_result(output_type->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32));
        handle_result(reader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, nullptr, output_type));

        unsigned int frame_count{};

        std::cout << "Started processing frames" << std::endl;

        while (true)
        {
            CComPtr<IMFSample> sample;
            DWORD flags;

            handle_result(reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                0, nullptr, &flags, nullptr, &sample));

            if (flags & MF_SOURCE_READERF_ENDOFSTREAM || sample == nullptr)
                break;

            std::cout << "Frame " << frame_count++ << std::endl;

            CComPtr<IMFMediaBuffer> buffer;
            BYTE* data;

            handle_result(sample->ConvertToContiguousBuffer(&buffer));
            handle_result(buffer->Lock(&data, nullptr, nullptr));

            // Use the frame here.

            buffer->Unlock();
        }

        std::cout << "Finished processing frames" << std::endl;
    }

    MFShutdown();
    CoUninitialize();

    return 0;
}

Your code is conceptually correct. The one remark, and it is not quite obvious, is that the Media Foundation decoder is multithreaded, while you are feeding it a single-threaded Direct3D device. You have to work around that, or you get what you are currently getting: access violations and freezes, that is, undefined behavior.

    // NOTE: No single threading (D3D11_CREATE_DEVICE_SINGLETHREADED is dropped)
    handle_result(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
        D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
        levels, ARRAYSIZE(levels), D3D11_SDK_VERSION, &device, nullptr, nullptr));

    // NOTE: Getting ready for multi-threaded operation
    // (ID3D11Multithread is declared in <d3d11_4.h>)
    const CComQIPtr<ID3D11Multithread> pMultithread = device;
    if (!pMultithread)
        handle_result(E_NOINTERFACE); // interface not available on this device
    pMultithread->SetMultithreadProtected(TRUE);

Also note that this straightforward code sample has a performance bottleneck around the lines you added to get a contiguous buffer. Presumably you added them to access the data. However, by design the decoded data is already in video memory, and transferring it to system memory is an expensive operation. That is, you added a severe performance hit to the loop. Reading the data back this way is useful for checking that it is valid, but for performance benchmarking you should comment it out.

The output types of the H.264 video decoder are listed here: https://msdn.microsoft.com/en-us/library/windows/desktop/dd797815(v=vs.85).aspx. RGB32 is not one of them. In this case your app relies on the Video Processor MFT to convert from one of MFVideoFormat_I420, MFVideoFormat_IYUV, MFVideoFormat_NV12, MFVideoFormat_YUY2 or MFVideoFormat_YV12 to RGB32. I suspect it is the Video Processor MFT that acts strangely and causes your program to misbehave. If you set NV12 as the output subtype for the decoder, you get rid of the Video Processor MFT, and the following lines of code become unnecessary as well:

handle_result(attributes->SetUINT32(MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING, TRUE));

and

handle_result(attributes->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, TRUE));

Moreover, as you noticed, NV12 is the only format that works properly. I think the reason is that it is the only one used in accelerated scenarios by D3D and the DXGI device manager.
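
If you do switch to NV12, the frame you read back is no longer one packed RGB plane, so the indexing changes. Below is a minimal sketch of the NV12 layout in plain C++: a full-resolution Y plane followed by a half-height plane of interleaved U and V bytes. The names Nv12Layout, nv12_layout, nv12_luma_index and nv12_chroma_index are made up for this illustration, and the sketch assumes you already know the row stride in bytes. With a real Media Foundation buffer the stride can be larger than the visible width, so take it from the buffer (for example from IMF2DBuffer::Lock2D) rather than assuming it equals the width.

```cpp
#include <cstddef>

// Byte layout of one NV12 frame stored in a single contiguous buffer.
struct Nv12Layout {
    std::size_t y_size;     // bytes in the luma (Y) plane
    std::size_t uv_offset;  // where the interleaved UV plane starts
    std::size_t total_size; // bytes in the whole frame
};

// Hypothetical helper: compute plane sizes from the row stride (in bytes)
// and the frame height (in rows). Chroma has half the rows of luma.
constexpr Nv12Layout nv12_layout(std::size_t stride, std::size_t height) {
    return { stride * height,
             stride * height,
             stride * height + stride * ((height + 1) / 2) };
}

// Index of the Y byte for pixel (x, y).
constexpr std::size_t nv12_luma_index(std::size_t stride,
                                      std::size_t x, std::size_t y) {
    return y * stride + x;
}

// Index of the U byte shared by the 2x2 block containing pixel (x, y).
// The matching V byte is the next byte (index + 1).
constexpr std::size_t nv12_chroma_index(std::size_t stride, std::size_t height,
                                        std::size_t x, std::size_t y) {
    return stride * height + (y / 2) * stride + (x / 2) * 2;
}
```

For a 1920x1080 frame with a stride of 1920 this gives 3,110,400 bytes per frame, compared with 8,294,400 bytes for RGB32, so a readback to system memory also moves less data.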
