
Live streaming using MediaCapture

For a project I am trying to create a video calling app. I'm doing this for the Universal Windows Platform, so I figured I could use the MediaCapture class. This class does have the StartRecordToCustomSinkAsync() method, but in order to use it, I need to create a custom sink. I started creating one, but now I am stuck on creating the stream sinks. This link explains Media Sinks, but I can't find any documentation on stream sinks.

I've also looked into the Simple Communication, WavSink, and another custom sink example, but the code either lacks comments or solves another problem.

Does anyone know how I can implement a UWP video calling application, or point me in the right direction?

PS: I know this kind of question has been asked before, but there are no usable answers available :(

Your specific implementation

The most important first step to implementing your own IMFMediaSink and IMFStreamSink classes is figuring out what you actually want to do with the IMFSample instances you'll be receiving. Once you know what kind of sink you are trying to make, consider where you're getting the samples from, whether it's the UWP MediaCapture class, an IMFSinkWriter, an IMFTopology, or even just direct calls to your sink from client code.

If your goal is to have a two-way audio/video chat application, each instance of your app will need to capture audio and video, encode it in a compressed format, and send those samples to the other instance. MediaCapture will take care of almost all of that, except the network transmission part. Consequently, the other instance must be able to decode those audio and video samples and present them to the user. Thus, you'll need a custom IMFMediaSink implementation that receives IMFSample instances from MediaCapture and then transmits them over the network. You'll also need an IMFMediaSource implementation that receives samples from a network source and sends them to a presenter for rendering to the user. The presenter in this case would be an instance of MediaCapture. Each app instance will have both the custom source and the custom sink operating at the same time - one is always capturing audio and video and sending it across the network, and the other is receiving that data from the other app instance and rendering it.

One way to do this is to provide a socket interface to your transmitting media sinks and your receiving media sources. You'll need to figure out a way to handle network interruptions and cleanly resume data transmission and display on both sides. One of the easiest ways to handle this is to have a lightweight messaging layer on top of the actual video and audio sample transmission, so that the sender can send a start-of-stream message to the receiver, which allows the receiver to dump any queued samples, restart its presentation clock with the first new sample that arrives, and then maintain coherent display after that. Again, the MediaCapture class will probably handle a lot of that logic for you, but you have to be careful that your sources and sinks behave as MediaCapture expects them to.
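
If it helps to make that messaging layer concrete, here is a minimal sketch of a wire-format header you could prepend to each message sent over the socket. Everything here (SampleMessageType, SampleMessageHeader, the field layout) is a hypothetical illustration of one possible protocol, not part of any Media Foundation or UWP API.

```cpp
#include <cstdint>

// Hypothetical message types for the lightweight framing layer.
enum class SampleMessageType : uint32_t
{
    StartOfStream = 1,   // receiver flushes queued samples and restarts its presentation clock
    MediaSample   = 2,   // payload is one encoded audio or video sample
    EndOfStream   = 3
};

#pragma pack(push, 1)
struct SampleMessageHeader
{
    SampleMessageType type;
    uint32_t streamId;        // matches the stream sink identifier on the sending side
    int64_t  sampleTime;      // from IMFSample::GetSampleTime, in 100-ns units
    int64_t  sampleDuration;  // from IMFSample::GetSampleDuration, in 100-ns units
    uint32_t payloadBytes;    // size of the encoded sample data that follows the header
};
#pragma pack(pop)
```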

You'll also need to establish a scheme to buffer data on each side so that small network latency doesn't cause stuttering in the audio and video. If you're playing video at 30 fps you need to display a new frame every 33 milliseconds, so each app instance needs to have enough data buffered to guarantee the presenter can show a frame at that interval. Audio is much the same - audio samples have durations, so you need to make sure you have enough sample data for continuous playback. However, you also don't want too much buffering, because you'll get a "satellite TV" effect: the psychology of human dialog causes large gaps between each person speaking, because they are waiting for the audio transmitted by your app to stop before they start talking, and the buffering latency will magnify those gaps. Microsoft has a general discussion of buffering here, and it's useful to consider even though it specifically pertains to ASF streaming.
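
To make that arithmetic concrete, here is a tiny sketch of deriving a buffering target from the frame rate; the 150 ms latency budget is an illustrative assumption, not a value mandated by Media Foundation.

```cpp
// Rough, illustrative receive-side buffer sizing.
const double frameRate       = 30.0;                 // video frames per second
const double frameIntervalMs = 1000.0 / frameRate;   // ~33.3 ms between frames
const double targetLatencyMs = 150.0;                // assumed network-jitter budget

// Number of video frames to queue before starting presentation (about 5 here).
const int videoFramesToBuffer =
    static_cast<int>(targetLatencyMs / frameIntervalMs + 0.5);

// For audio, buffer sample data covering the same window (e.g. 150 ms of audio)
// before the presentation clock starts, so playback stays continuous.
```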

The last implementation-related suggestion I have is to look at existing network streaming formats Media Foundation can use. Although you'll still have to implement your own sources, media sinks, and stream sinks, by writing WinRT-enabled wrappers around existing Media Foundation implementations you can avoid worrying about the vast majority of low-level details. The most straightforward path I can see here is writing a wrapper class that implements IMFMediaSink and holds an internal instance of a media sink created with MFCreateASFStreamingMediaSink. You may also need to write a WinRT-enabled IMFByteStream implementation that you can provide to your media sink, which would then be passed to the ASF streaming sink. On the source side, you'd be writing an IMFMediaSource that reads from a network byte stream and wraps an ASF file media source.
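
As a rough illustration of that wrapper idea, here is a partial sketch of a class that owns an inner sink created with MFCreateASFStreamingMediaSink and forwards IMFMediaSink calls to it. The class name, the Initialize method, and the network-backed byte stream are assumptions made for illustration; the remaining IMFMediaSink and IUnknown methods would have to be implemented (mostly as further forwarding calls) before this compiles as a usable class.

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <wmcontainer.h>
#include <wrl/client.h>

// Hypothetical wrapper: an IMFMediaSink that delegates to the built-in ASF streaming sink.
class CNetworkAsfSink : public IMFMediaSink
{
public:
    // networkByteStream is an IMFByteStream you implement over a socket; the ASF
    // streaming sink writes its packetized output into it.
    HRESULT Initialize(IMFByteStream* networkByteStream)
    {
        return MFCreateASFStreamingMediaSink(networkByteStream, &m_innerSink);
    }

    // Stream management is forwarded straight to the inner ASF sink.
    STDMETHODIMP AddStreamSink(DWORD id, IMFMediaType* type, IMFStreamSink** sink) override
    {
        return m_innerSink->AddStreamSink(id, type, sink);
    }

    STDMETHODIMP GetCharacteristics(DWORD* flags) override
    {
        return m_innerSink->GetCharacteristics(flags);
    }

    // ... RemoveStreamSink, GetStreamSinkCount, GetStreamSinkByIndex, GetStreamSinkById,
    // SetPresentationClock, GetPresentationClock, Shutdown, and the IUnknown methods are
    // omitted here; each would forward to m_innerSink in the same fashion.

private:
    Microsoft::WRL::ComPtr<IMFMediaSink> m_innerSink;
};
```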


General overview of IMFMediaSink and IMFStreamSink

I can't give you code that will do what you want, primarily because implementing a custom media sink and set of stream sinks takes a sizable amount of work, and your needs are very specific in this case. Instead, I hope I can help you better understand the roles of media sinks and stream sinks within Media Foundation. I am far from a UWP expert, so I'll contribute as much as I can on the Media Foundation side.

From what I understand about UWP's MediaCapture class, custom sinks are responsible for interfacing with both the WinRT side and the COM / Media Foundation side, so you have to implement interfaces on both sides. This is because MediaCapture is more or less a UWP wrapper over a lot of Media Foundation technology that is abstracted away. The Simple Communication sample is actually extremely useful for this and has a lot of great starter code, particularly in the common/MediaExtensions section, which has a C++ implementation of a custom network-enabled media sink. That implementation also shows you how to interface your custom sink with WinRT so that it's usable in a UWP environment.

Below is a general discussion of the function of media and stream sinks, the general ways they are used by client code, and the general ways they can be designed and implemented.


Using an existing IMFMediaSink

In this section I'll go over how media sinks and stream sinks are used in practice. This will hopefully give some insight into the design decisions Microsoft made when architecting this part of Media Foundation, and help you understand how client code (even your own client code) will use your custom implementation of IMFMediaSink and IMFStreamSink.


Media sinks that save data

Let's look at how you'd use a specific media sink implementation within Media Foundation. We'll start with the media sink you receive when you call MFCreateMPEG4MediaSink. Keep in mind that all media sinks implement the IMFMediaSink interface, no matter what type of media they represent. That function creates an IMFMediaSink that is built into Media Foundation and is responsible for creating output with an MPEG4 container structure. When you create it, you have to provide an IMFByteStream into which the output MP4 data should be written. The media sink is responsible for maintaining a collection of IMFStreamSink objects, one for each input stream to the media sink. In the context of the MPEG4 media sink, there are only two possible output streams, so there is a pre-defined number of stream sinks. This is because the MPEG4 format only allows one video stream and/or one audio stream. In this way, the MPEG4 media sink is enforcing a contract of sorts - it restricts what client code can do with it to respect the specifications of the media container it's writing. To access these sinks, you'd use IMFMediaSink::GetStreamSinkCount and IMFMediaSink::GetStreamSinkByIndex. If you have audio and video, there will be two sinks total, with index 0 for the video major type and index 1 for the audio major type.
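
As a small sketch of what that enumeration looks like in code (assuming MFStartup has already been called and mediaSink came from MFCreateMPEG4MediaSink or a similar factory function):

```cpp
#include <mfidl.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

HRESULT EnumerateStreamSinks(IMFMediaSink* mediaSink)
{
    DWORD count = 0;
    HRESULT hr = mediaSink->GetStreamSinkCount(&count);
    if (FAILED(hr)) return hr;

    for (DWORD i = 0; i < count; ++i)
    {
        ComPtr<IMFStreamSink> streamSink;
        hr = mediaSink->GetStreamSinkByIndex(i, &streamSink);
        if (FAILED(hr)) return hr;

        DWORD identifier = 0;
        hr = streamSink->GetIdentifier(&identifier);  // the stream's ID within the media sink
        if (FAILED(hr)) return hr;

        // For the MPEG4 sink with both video and audio configured, index 0 is the
        // video stream and index 1 is the audio stream, as described above.
    }
    return S_OK;
}
```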

Other types of media sinks, like the one you get when calling MFCreateASFMediaSink, allow multiple audio and video streams and require that you call IMFMediaSink::AddStreamSink for each stream you intend to write into the media sink. Looking at the documentation, you can see that you must provide a stream sink identifier (a unique DWORD or int that you want to use to reference that stream within the media sink) and an IMFMediaType that informs the media sink what kind of data you'll be sending to that stream sink. In return, you receive an IMFStreamSink, and you use that stream sink to write the data for that stream.

As an aside, you can determine whether an arbitrary IMFMediaSink supports a fixed or variable number of streams via IMFMediaSink::GetCharacteristics, checking the output flags for MEDIASINK_FIXED_STREAMS. In addition, most media sinks that store data to a byte stream will also have a flag called MEDIASINK_RATELESS, which means the sink will consume samples as fast as you can send them, because they are all being written to a byte stream and there's no reason to wait or sync that process to a presentation clock. There will be more discussion of this in the next section.
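
A short sketch of that check, assuming mediaSink is any valid IMFMediaSink pointer:

```cpp
#include <mfidl.h>

HRESULT InspectSinkCharacteristics(IMFMediaSink* mediaSink)
{
    DWORD characteristics = 0;
    HRESULT hr = mediaSink->GetCharacteristics(&characteristics);
    if (FAILED(hr)) return hr;

    // Fixed set of stream sinks: AddStreamSink/RemoveStreamSink are not supported.
    const bool fixedStreams = (characteristics & MEDIASINK_FIXED_STREAMS) != 0;

    // Rateless: the sink consumes samples as fast as they arrive, with no
    // presentation-clock pacing (typical of sinks that just write to a byte stream).
    const bool rateless = (characteristics & MEDIASINK_RATELESS) != 0;

    (void)fixedStreams;
    (void)rateless;
    return S_OK;
}
```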

Here's a simple worked example, step by step. If you have an H.264 video stream and an AAC audio stream, each of them being a series of IMFSample instances, perhaps you want to save them to a file on the hard drive. Let's say you want to use the ASF container format for this. You'd create an IMFByteStream that uses a file as its storage method, and then create an ASF media sink with MFCreateASFMediaSink. Then, you'd call IMFMediaSink::AddStreamSink to add the H.264 video stream. You'd pass in a unique identifier for that stream, like 0, and the IMFMediaType that specifies things like the media's major type (video), the media's subtype (H.264), the frame size and frame rate, and so forth. You would receive an IMFStreamSink back, and you can use that specific stream sink and its IMFStreamSink::ProcessSample method to send all your H.264 encoded video samples. Then, you'd call AddStreamSink again for the AAC audio stream, and again you'd receive back an IMFStreamSink you can use to send all the AAC encoded audio samples. The stream sinks also know who their parent IMFMediaSink is, and each stream sink works with the media sink to package the data into a single media stream, the output of which goes into the IMFByteStream you created earlier. Once you have all your stream sinks set up, you'd repeatedly call IMFStreamSink::ProcessSample on each one, providing each IMFSample of the appropriate type (i.e. video or audio) from the source streams.
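
Here is a condensed sketch of those steps in code. Error handling is abbreviated, the file name and media-type attributes (frame size, frame rate, and the AAC details such as sample rate and channel count, which are left out) are illustrative placeholders, and MFStartup is assumed to have been called already.

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <wmcontainer.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

HRESULT CreateAsfFileSink(ComPtr<IMFMediaSink>& mediaSink,
                          ComPtr<IMFStreamSink>& videoSink,
                          ComPtr<IMFStreamSink>& audioSink)
{
    // 1. A byte stream backed by a file on disk.
    ComPtr<IMFByteStream> byteStream;
    HRESULT hr = MFCreateFile(MF_ACCESSMODE_WRITE, MF_OPENMODE_DELETE_IF_EXIST,
                              MF_FILE_FLAG_NONE, L"output.asf", &byteStream);
    if (FAILED(hr)) return hr;

    // 2. The ASF media sink writes its container output into that byte stream.
    hr = MFCreateASFMediaSink(byteStream.Get(), &mediaSink);
    if (FAILED(hr)) return hr;

    // 3. Add the H.264 video stream (identifier 0).
    ComPtr<IMFMediaType> videoType;
    MFCreateMediaType(&videoType);
    videoType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    videoType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    MFSetAttributeSize(videoType.Get(), MF_MT_FRAME_SIZE, 1280, 720);   // example size
    MFSetAttributeRatio(videoType.Get(), MF_MT_FRAME_RATE, 30, 1);      // example rate
    hr = mediaSink->AddStreamSink(0, videoType.Get(), &videoSink);
    if (FAILED(hr)) return hr;

    // 4. Add the AAC audio stream (identifier 1); a real type needs more attributes.
    ComPtr<IMFMediaType> audioType;
    MFCreateMediaType(&audioType);
    audioType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    audioType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_AAC);
    hr = mediaSink->AddStreamSink(1, audioType.Get(), &audioSink);
    if (FAILED(hr)) return hr;

    // Later, call videoSink->ProcessSample(sample) for each encoded video IMFSample
    // and audioSink->ProcessSample(sample) for each encoded audio IMFSample.
    return S_OK;
}
```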


Media sinks that present data

Present, in this case, means "render the data as it arrives." An example of this would be the Enhanced Video Renderer (EVR), which implements the IMFMediaSink interface. The EVR is responsible for receiving IMFSample instances via its stream sinks, which you access via IMFMediaSink::AddStreamSink or IMFMediaSink::GetStreamSinkByIndex just as discussed above. If you query the EVR media sink using IMFMediaSink::GetCharacteristics, you'll see it does not provide the MEDIASINK_RATELESS flag. This is because the EVR, being a video presenter, is supposed to display the video samples it receives on the screen, and it should do so in a way that respects each IMFSample's start time and duration. This means the EVR requires a presentation clock, via the IMFPresentationClock interface, so that it can display the video samples at the correct frame rate, synchronized to the passage of time. Remember that media sinks whose job is simply to store data don't have to worry about this, because there's no need to store data at some steady or consistent rate.
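
As a sketch, this is roughly how a presentation clock is created and handed to a rendering sink such as the EVR (evrSink here is assumed to be an IMFMediaSink pointer for an EVR instance you have already created and configured):

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

HRESULT AttachPresentationClock(IMFMediaSink* evrSink)
{
    ComPtr<IMFPresentationTimeSource> timeSource;
    ComPtr<IMFPresentationClock> presentationClock;

    // Use the system clock as the time source that drives presentation.
    HRESULT hr = MFCreateSystemTimeSource(&timeSource);
    if (SUCCEEDED(hr)) hr = MFCreatePresentationClock(&presentationClock);
    if (SUCCEEDED(hr)) hr = presentationClock->SetTimeSource(timeSource.Get());
    if (SUCCEEDED(hr)) hr = evrSink->SetPresentationClock(presentationClock.Get());
    if (SUCCEEDED(hr)) hr = presentationClock->Start(0);  // begin presenting at time zero
    return hr;
}
```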

There is an audio renderer similar to the EVR called the Streaming Audio Renderer (SAR), and it works in a similar way. As a thought experiment, consider the basic architecture if you wrote your own media sink that wrapped both the EVR and the SAR, so you could connect video and audio streams to your custom sink, and the custom stream sink implementations would feed the samples to the appropriate renderer based on the media type of each stream sink. That way you could just create one media sink instead of worrying about the EVR and SAR as separate media sinks.


Implementing your own IMFMediaSink and IMFStreamSink

At this point, I hope you can see that the implementation of IMFStreamSink is entirely dependent on the implementation of IMFMediaSink - in other words, an MPEG4 media sink is going to need its own specific implementation of IMFStreamSink, because those stream sinks need to package the raw media data inside MPEG4 containers, build an index of samples and timestamps to embed in the MPEG4 file, and so forth.


Writing your own version of the MPEG4 sink

Given what has been presented above, consider how you would structure the actual classes that hide behind those interfaces if you were writing your own version of an MPEG4 sink. You'd need an implementation of IMFMediaSink, a way to create an instance of that implementation, and then you'd need one or more implementations of IMFStreamSink that are directly dependent on your implementation of IMFMediaSink.

Here's one way to structure your implementation. You'd start with a class called CMPEG4MediaSinkImpl which implements IMFMediaSink, and perhaps you'd have a static C-style function to create instances of that class, just like MFCreateMPEG4MediaSink. Say you only support one video and one audio stream, and the media types for each are provided to the non-public constructor of your CMPEG4MediaSinkImpl class. Let's say you also want different stream sinks based on the media's major type, i.e. audio or video, perhaps because you want separate counts of how many audio and video samples you received. The constructor would check each of the media types; for the video stream it would create an instance of CMPEG4StreamSinkVideoImpl, and for audio it would create an instance of CMPEG4StreamSinkAudioImpl. Each of those classes would implement IMFStreamSink and would be responsible for receiving the individual IMFSample instances in their IMFStreamSink::ProcessSample implementations. Perhaps in those implementations, each stream sink would call CMPEG4MediaSinkImpl::WriteSampleBlock, which would create the appropriate MPEG4 data structure to wrap the sample data, record the sample time and duration in an MPEG4 structure, and then embed that in the output IMFByteStream. A declaration-level sketch of that layout follows.
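
The sketch below uses the hypothetical class names from the discussion above; all COM/IUnknown plumbing and most IMFMediaSink/IMFStreamSink methods are omitted. It is meant to show the relationships between the classes, not to be a complete implementation.

```cpp
#include <mfidl.h>
#include <wrl/client.h>

class CMPEG4MediaSinkImpl;  // forward declaration

// Base for the two stream sink flavors; holds a back-pointer to the parent sink.
class CMPEG4StreamSinkBase : public IMFStreamSink
{
protected:
    CMPEG4MediaSinkImpl* m_parent = nullptr;            // not AddRef'd: the parent owns the streams
    Microsoft::WRL::ComPtr<IMFMediaType> m_mediaType;   // type negotiated for this stream
};

class CMPEG4StreamSinkVideoImpl : public CMPEG4StreamSinkBase
{
public:
    // Wraps each video sample in the container structure via the parent sink.
    STDMETHODIMP ProcessSample(IMFSample* sample) override;
private:
    UINT64 m_videoSampleCount = 0;
};

class CMPEG4StreamSinkAudioImpl : public CMPEG4StreamSinkBase
{
public:
    STDMETHODIMP ProcessSample(IMFSample* sample) override;
private:
    UINT64 m_audioSampleCount = 0;
};

class CMPEG4MediaSinkImpl : public IMFMediaSink
{
public:
    // Factory function in the style of MFCreateMPEG4MediaSink.
    static HRESULT CreateInstance(IMFByteStream* output,
                                  IMFMediaType* videoType,
                                  IMFMediaType* audioType,
                                  IMFMediaSink** sink);

    // Called by the stream sinks: packages one sample into the container format and
    // writes it (plus timing/index information) into m_outputStream.
    HRESULT WriteSampleBlock(DWORD streamId, IMFSample* sample);

private:
    CMPEG4MediaSinkImpl(IMFByteStream* output, IMFMediaType* videoType, IMFMediaType* audioType);

    Microsoft::WRL::ComPtr<IMFByteStream>             m_outputStream;
    Microsoft::WRL::ComPtr<CMPEG4StreamSinkVideoImpl> m_videoStream;
    Microsoft::WRL::ComPtr<CMPEG4StreamSinkAudioImpl> m_audioStream;
};
```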
