How to debug packet loss?

I wrote a C++ application (running on Linux) that serves an RTP stream of about 400 kbps. To most destinations this works fine, but some destinations experience packet loss. The problematic destinations seem to have a slower connection in common, but it should be plenty fast enough for the stream I'm sending.

Since these destinations are able to receive similar RTP streams for other applications without packet loss, my application might be at fault.

I already verified a few things:

  • in a tcpdump, I see all RTP packets going out on the sending machine
  • there is a UDP send buffer in place (I tried sizes between 64 KB and 300 KB)
  • the RTP packets mostly stay below 1400 bytes to avoid fragmentation

What can a sending application do to minimize the possibility of packet loss, and what would be the best way to debug such a situation?

Don't send out packets in big bursty chunks.

Packet loss is usually caused by slow routers with limited packet buffer sizes. A slow router might handle 1 Mbps just fine if it has time to send out, say, 10 packets before receiving another 10, but if the 100 Mbps sender side sends it a big chunk of 50 packets, it has no choice but to drop 40 of them.

Try spreading out the sending so that you write only what is necessary in each time period. If you have to write one packet every fifth of a second, do it that way instead of writing 5 packets in one go every second.
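To illustrate the pacing idea, here is a minimal sketch, not the asker's actual code. The names `Packet`, `pacing_interval`, `send_paced` and the `send_one` callback are made up for the example; `send_one` stands in for whatever wrapper the application has around `sendto()`. It assumes the target rate is known (about 400 kbps in the question) and spaces the packets so the average rate matches it.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical packet type: the bytes of one RTP packet, already built by the application.
using Packet = std::vector<unsigned char>;

// Time one packet "occupies" at the target rate.
// Example: 1400 bytes at 400 kbps -> 1400 * 8 / 400000 s = 28 ms.
std::chrono::microseconds pacing_interval(std::size_t packet_bytes, double bitrate_bps) {
    const double seconds = static_cast<double>(packet_bytes) * 8.0 / bitrate_bps;
    return std::chrono::microseconds(std::llround(seconds * 1e6));
}

// Sends the packets one at a time on a fixed schedule instead of in one burst.
// send_one is a placeholder for the application's own send call.
void send_paced(const std::vector<Packet>& packets, double bitrate_bps,
                const std::function<void(const Packet&)>& send_one) {
    auto next = std::chrono::steady_clock::now();
    for (const Packet& p : packets) {
        std::this_thread::sleep_until(next);               // wait for this packet's slot
        send_one(p);
        next += pacing_interval(p.size(), bitrate_bps);    // reserve this packet's share of time
    }
}
```

Scheduling against an absolute deadline (`sleep_until`) rather than sleeping a fixed amount after each send keeps the long-term rate from drifting when a send or a wake-up is late.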

netstat has several useful options for debugging this situation.

The first one is netstat -su (dump UDP statistics):

dima@linux-z8mw:/media> netstat -su                                                      
IcmpMsg:                                                                                 
    InType3: 679
    InType4: 20
    InType11: 548
    OutType3: 100
Udp:
    12945 packets received
    88 packets to unknown port received.
    0 packet receive errors
    13139 packets sent
    RcvbufErrors: 0
    SndbufErrors: 0
UdpLite:
    InDatagrams: 0
    NoPorts: 0
    InErrors: 0
    OutDatagrams: 0
    RcvbufErrors: 0
    SndbufErrors: 0
IpExt:
    InNoRoutes: 0
    InTruncatedPkts: 0
    InMcastPkts: 3877
    OutMcastPkts: 3881
    InBcastPkts: 0
    OutBcastPkts: 0
    InOctets: 7172779304
    OutOctets: 785498393
    InMcastOctets: 525749
    OutMcastOctets: 525909
    InBcastOctets: 0
    OutBcastOctets: 0

Notice "RcvbufErrors" and "SndbufErrors". If either counter keeps growing, datagrams are being dropped locally because a socket buffer is full.
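If you would rather watch these counters from inside the application, the sketch below may help. It assumes a Linux system where, as far as I know, the same UDP counters are exposed in /proc/net/snmp as a "Udp:" line of names followed by a "Udp:" line of values. The function names `parse_udp_counters` and `read_udp_counters` are made up for the example.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parses the two "Udp:" lines (column names, then values) from text laid out
// like /proc/net/snmp and returns a name -> counter map.
std::map<std::string, unsigned long long> parse_udp_counters(std::istream& in) {
    std::map<std::string, unsigned long long> counters;
    std::vector<std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string tag;
        if (!(fields >> tag) || tag != "Udp:") continue;   // skips "Ip:", "UdpLite:", ...
        if (names.empty()) {
            // First "Udp:" line: column names.
            for (std::string name; fields >> name;) names.push_back(name);
        } else {
            // Second "Udp:" line: values in the same column order.
            unsigned long long value = 0;
            for (std::size_t i = 0; i < names.size() && (fields >> value); ++i)
                counters[names[i]] = value;
            break;
        }
    }
    return counters;
}

// Convenience wrapper for a live system; returns an empty map if the file is missing.
std::map<std::string, unsigned long long> read_udp_counters() {
    std::ifstream snmp("/proc/net/snmp");
    return parse_udp_counters(snmp);
}
```

Sampling `read_udp_counters()` before and after a streaming run and comparing the "SndbufErrors" values shows whether the sender itself dropped anything during that run.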

Another option is to monitor the receive and send UDP buffers of the process:

dima@linux-z8mw:/media> netstat -ua
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 *:bootpc                *:*
udp        0      0 *:40134                 *:*
udp        0      0 *:737                   *:*
udp        0      0 *:mdns                  *:*

Here you need to look at the Recv-Q and Send-Q columns of the connection you're interested in. If the values are high and don't drop to zero, then the process cannot handle the load.

You can use these commands on both the sending and the receiving machine.

You can also use mtr, which combines traceroute and ping: it pings each hop in the route. This may detect a slow hop in your route. Run it on both machines to check connectivity to the other one.

RTP typically uses UDP, which is inherently lossy. Packets could be lost anywhere between sender and receiver, so local debugging will show you nothing useful.
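Since the loss can happen anywhere on the path, one way to measure it is on the receiving side, using the 16-bit sequence number that RFC 3550 places in bytes 2 and 3 of every RTP header. The sketch below is only an illustration: `rtp_sequence` and `LossCounter` are made-up names, and it deliberately keeps things simple by treating late (reordered) or duplicate packets as already counted.

```cpp
#include <cstddef>
#include <cstdint>

// Reads the sequence number from bytes 2-3 of an RTP header (network byte order).
// The caller must pass at least 4 bytes.
std::uint16_t rtp_sequence(const unsigned char* packet) {
    return static_cast<std::uint16_t>((packet[2] << 8) | packet[3]);
}

// Counts gaps in the sequence numbers seen by a receiver.
class LossCounter {
public:
    void on_packet(std::uint16_t seq) {
        ++received_;
        if (!started_) {
            started_ = true;
            expected_ = static_cast<std::uint16_t>(seq + 1);
            return;
        }
        // Unsigned 16-bit subtraction handles the wrap from 65535 to 0.
        const std::uint16_t gap = static_cast<std::uint16_t>(seq - expected_);
        if (gap < 0x8000) {
            lost_ += gap;    // 'gap' packets were skipped before this one
            expected_ = static_cast<std::uint16_t>(seq + 1);
        }
        // Otherwise the packet is late or a duplicate; this sketch leaves the
        // earlier loss count as it is.
    }
    std::size_t received() const { return received_; }
    std::size_t lost() const { return lost_; }

private:
    bool started_ = false;
    std::uint16_t expected_ = 0;
    std::size_t received_ = 0;
    std::size_t lost_ = 0;
};
```

Comparing `lost()` on a problematic destination with the tcpdump on the sender tells you how much disappears in transit rather than in the application.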

Obvious things to do:

  • a: Reduce the overall data rate.
  • b: Reduce the 'peak' data rate by sending small packets more often rather than one huge chunk every few seconds, i.e. REDUCE your UDP send buffer, maybe even to just 1400 bytes.
  • c: See if you can switch to a TCP variant of RTP.

If all else fails, Wireshark is your friend. It will give you a true picture of how much data is being sent by your app, and when.
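Once you have per-packet timestamps and sizes from a capture, a quick way to see burstiness is to compare the average rate with the peak rate over a short window. The following is a rough sketch with made-up names (`Sample`, `peak_bitrate`); it assumes the packets are sorted by time and that you have already exported them from the capture yourself.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One captured packet: capture time in seconds and size in bytes.
struct Sample {
    double time_s;
    std::size_t bytes;
};

// Highest bitrate seen in any window of window_s seconds (window_s must be > 0).
// Packets must be sorted by time_s.
double peak_bitrate(const std::vector<Sample>& packets, double window_s) {
    double peak_bytes = 0.0;
    double in_window = 0.0;
    std::size_t start = 0;
    for (std::size_t end = 0; end < packets.size(); ++end) {
        in_window += static_cast<double>(packets[end].bytes);
        // Drop packets that fall outside the window ending at packets[end].
        while (packets[end].time_s - packets[start].time_s >= window_s) {
            in_window -= static_cast<double>(packets[start].bytes);
            ++start;
        }
        peak_bytes = std::max(peak_bytes, in_window);
    }
    return peak_bytes * 8.0 / window_s;
}
```

If a 400 kbps stream shows a peak of several Mbps over a 10 ms window, the sender is bursting, and a slow hop with a small buffer is likely to drop part of each burst.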

You should try reducing the rate at which you send packets. A slow connection can mean all sorts of things, and trying to send it packets (small or large) at a high rate won't help.

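This may not be the answer you want, but if I had packet loss problems, I would try switching my application to TCP, which would take most of the worries about packet loss off my mind.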


 