
What's the most efficient protocol for reliable multicast?

Original post: 2009-04-19 00:07:12. Tags: protocols, file-transfer, multicast, ethernet, reliable-multicast

When a sender needs to multicast a relatively large volume of data (say several megabytes per second) in a reliable way over Ethernet to a modest number of receivers (say fewer than a dozen) on the same subnet, what is the most efficient protocol? By reliable I mean that if a packet is lost, the protocol ensures that it gets resent, so that no receiver loses any data. The term efficient is a lot harder to define, but let's say we want to maximize throughput and minimize network bandwidth with modest CPU usage on both ends. That's still not a clear-cut definition, but it's the best I can come up with. Either a stream-oriented or a message-oriented protocol would be acceptable.

I'd appreciate real-world examples, and I'll gladly accept subjective answers (i.e. what's your favorite multicast protocol) if you can explain the pros and cons.

7 answers

Real-world example: TIBCO Rendezvous.

Data is sent out via multicast with a sequence number. A client that detects a missing sequence number sends out a message on the multicast group: "hey, I missed packet 12345". The server re-multicasts that data. The server buffers a configurable amount of data in case a client requests it.
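The gap detection described above can be sketched roughly as follows. This is a minimal illustration and not Rendezvous code; the class name `GapDetector` and its methods are made up for the example, and it assumes sequence numbers are plain integers that never wrap.

```python
class GapDetector:
    """Tracks the sequence numbers seen by one receiver and reports gaps.

    Hypothetical helper for illustration; not part of any real library.
    """

    def __init__(self, first_seq=0):
        self.next_expected = first_seq  # lowest sequence number not yet seen in order
        self.missing = set()            # sequence numbers skipped and not yet repaired

    def on_packet(self, seq):
        """Return the sequence numbers to request after receiving `seq`."""
        if seq < self.next_expected:
            # Late arrival or retransmission: it repairs a gap if one was open.
            self.missing.discard(seq)
            return []
        new_gaps = list(range(self.next_expected, seq))
        self.missing.update(new_gaps)
        self.next_expected = seq + 1
        return new_gaps
```

A receiver would call `on_packet` for every datagram and send one retransmission request per returned sequence number (or a single request listing all of them).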

The problem:

Imagine having a single client that drops half of its packets, and 100 healthy clients. This client sends a retransmission request for every other packet. The resulting retransmissions put enough load on one of the healthy clients that it starts dropping packets and requesting retransmissions too. The extra load from that causes another healthy client to begin requesting retransmissions. And so on. A congestion collapse results.

Tibco provides a workaround: cutting off a subscriber that sends too many retransmission requests. This makes it harder for a single subscriber to cause a congestion collapse.

The other workaround to limit the risk of congestion collapse is to limit the amount of data that a server is willing to retransmit.
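Both workarounds can be combined on the sender side, as in the sketch below. It is only an illustration under simplifying assumptions: `RetransmitBuffer`, `max_packets` and `max_naks_per_client` are hypothetical names, and requests are counted over the whole session, whereas a real system would more likely count them per time window.

```python
from collections import OrderedDict, defaultdict


class RetransmitBuffer:
    """Bounded history of sent packets plus a per-client request limit.

    Hypothetical sketch; not the Rendezvous implementation.
    """

    def __init__(self, max_packets, max_naks_per_client):
        self.max_packets = max_packets
        self.max_naks_per_client = max_naks_per_client
        self.packets = OrderedDict()        # seq -> payload, oldest first
        self.nak_counts = defaultdict(int)  # client -> requests seen so far
        self.cut_off = set()                # clients that are no longer served

    def record_sent(self, seq, payload):
        self.packets[seq] = payload
        while len(self.packets) > self.max_packets:
            self.packets.popitem(last=False)  # evict the oldest packet

    def on_nak(self, client, seq):
        """Return the payload to resend, or None if the request is refused."""
        if client in self.cut_off:
            return None
        self.nak_counts[client] += 1
        if self.nak_counts[client] > self.max_naks_per_client:
            self.cut_off.add(client)
            return None
        # None here means the packet has already aged out of the buffer.
        return self.packets.get(seq)
```

A smaller `max_packets` bounds how much the server can ever be asked to resend, at the cost of some receivers being unable to recover old data.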

Tibco should also provide heuristics in the client and server for deciding whether to multicast or unicast the retransmission request, and the retransmission itself. They don't. (The server could unicast the retransmission if only one client requested it within a certain time window. The client could unicast its retransmission request if the server has told it, in the retransmitted packet, that it is the only one requesting retransmissions and has asked it to unicast future requests.)
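The server-side half of that heuristic might look something like this. It is a sketch of the idea only: the function name, the log format and the 50 ms default window are all assumptions made for the example.

```python
def choose_retransmit_mode(nak_log, seq, now, window=0.05):
    """Decide how to resend packet `seq`.

    nak_log is a list of (timestamp, client, seq) tuples (hypothetical format).
    Returns ("unicast", client), ("multicast", None), or None if nobody asked
    for `seq` within the last `window` seconds.
    """
    requesters = {
        client
        for (timestamp, client, requested) in nak_log
        if requested == seq and now - window <= timestamp <= now
    }
    if not requesters:
        return None
    if len(requesters) == 1:
        # A single requester: spare every other receiver the extra traffic.
        return ("unicast", next(iter(requesters)))
    return ("multicast", None)
```

The window length is the interesting tuning knob: too short and shared losses get answered with several unicasts, too long and every retransmission is delayed.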

Fundamentally, you will have to trade off how strongly you want to guarantee that clients receive data against the risk of congestion collapse. You will have to make guesses as to where a packet was dropped and whether the retransmission is most efficiently sent unicast or multicast. If the server understands the data and can decide not to send a retransmission when there is updated data to be sent anyway (which makes the retransmission irrelevant), you are in a much better position than a framework such as Tibco RV.
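As a toy illustration of a server that "understands the data", the check below skips a retransmission once a newer packet for the same key has gone out. The names `key_by_seq` and `latest_seq_by_key` are invented for the example, and it assumes every packet carries the complete current value for exactly one key.

```python
def should_retransmit(lost_seq, key_by_seq, latest_seq_by_key):
    """Return False when a newer packet for the same key has already been sent.

    key_by_seq maps sequence number -> key; latest_seq_by_key maps key -> the
    newest sequence number sent for it (both hypothetical bookkeeping tables).
    """
    key = key_by_seq[lost_seq]
    return latest_seq_by_key[key] == lost_seq
```

This only helps subscribers that care about the latest value and nothing else.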

Sometimes understanding the data can lead to wrong assumptions. Take market data, for example: at first it may seem fine not to retransmit a quote when there is an updated quote. But later, you may find that a subscriber was keeping a quote history, not just trying to keep track of the current quote. You may have different requirements depending on the subscriber, and some clients will prefer unicast TCP over multicast.

At some point you will need to make arbitrary decisions on the server about how much data to buffer for retransmissions or slow clients.

Following on from TIBCO, the PGM protocol is an open-standard reliable multicast protocol with many optimisations that let it work efficiently at very large scales with network element acceleration. PGM was developed by TIBCO and Cisco and is an optional protocol underneath TIBCO Rendezvous; the default protocol is TRDP, which is very similar in design.

You can calculate theoretical efficiencies such as those listed here for PGM:

http://code.google.com/p/openpgm/wiki/PgmPerformance

Unfortunately, real-world network elements, NICs and general computer architectures perform well below the theoretical maximums.

http://www.jgroups.org/

Might I suggest UFTP. It uses a NAK-based mechanism to determine which packets to retransmit, and has an option for either a fixed transmission rate or congestion control using TFMCC.

Each file is sent in passes: the first pass transmits the entire file, while subsequent passes only send retransmissions. Each client keeps track of which packets it received and which ones it missed. At particular checkpoints (and at the end of a pass), if the receiver missed any packets since the last checkpoint, it sends a NAK listing the packets that were missed. This has the advantage that low-loss receivers will finish before high-loss receivers. UFTP can also be configured to drop receivers whose percentage of NAKs exceeds a certain threshold.

Because only receivers that have actually exhibited loss send NAKs, this reduces the risk of congestion collapse, in which the sender gets overwhelmed by receiver feedback.
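The pass structure can be illustrated with a small simulation. This is not UFTP code and makes several simplifying assumptions: NAKs are collected only at the end of each pass (no intermediate checkpoints), loss is modelled by a caller-supplied function, and the names `multipass_transfer` and `drop` are made up for the example.

```python
def multipass_transfer(num_packets, receivers, max_passes=10):
    """Simulate a multi-pass, NAK-driven transfer.

    receivers maps a receiver name to drop(pass_no, seq) -> bool, where True
    means that receiver loses that packet on that pass (hypothetical model).
    Returns a dict mapping each receiver that completed to the pass number on
    which it finished.
    """
    have = {name: set() for name in receivers}
    finished = {}
    to_send = set(range(num_packets))  # pass 1 sends the whole file
    for pass_no in range(1, max_passes + 1):
        for seq in sorted(to_send):
            for name, drop in receivers.items():
                if name not in finished and not drop(pass_no, seq):
                    have[name].add(seq)
        # End of pass: every unfinished receiver NAKs what it is still missing.
        naks = set()
        for name in receivers:
            if name in finished:
                continue
            missing = set(range(num_packets)) - have[name]
            if missing:
                naks |= missing
            else:
                finished[name] = pass_no
        if not naks:
            break
        to_send = naks  # later passes carry only retransmissions
    return finished
```

For example, with one loss-free receiver and one that drops every odd-numbered packet on the first pass only, the loss-free receiver finishes on pass 1 and the lossy one on pass 2, and a receiver with no loss never sends any feedback at all.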

Disclosure: I am the author of UFTP.

BitTorrent!

No, seriously. You might want to read up on it.

UDP is useful for multicast, but it doesn't provide the guarantees you're looking for. BitTorrent will require you to transmit more than one full copy from the original source, but it's still fairly efficient and provides useful guarantees, especially considering how much checksumming is done on each "chunk" of data passed along.

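I think you should take a look at the Stream Control Transmission Protocol (SCTP) as an alternative to UDP/multicast, if you really want reliable simultaneous transmission to multiple clients.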

This is an open research question; there are commercial solutions available, but they are prohibitively expensive. Good luck.