
UDP packet drops by Linux kernel

I have a server which sends UDP packets via multicast and a number of clients which are listing to those multicast packets. 我有一个服务器通过多播发送UDP数据包和许多客户端列出这些多播数据包。 Each packet has a fixed size of 1040 Bytes, the whole data size which is sent by the server is 3GByte. 每个数据包的固定大小为1040字节,服务器发送的整个数据大小为3GByte。

My environment is as follows:

1 Gbit Ethernet network

40 nodes: 1 sender node and 39 receiver nodes. All nodes have the same hardware configuration: 2 AMD CPUs, each CPU with 2 cores @ 2.6 GHz.

On the client side, one thread reads the socket and puts the data into a queue. An additional thread pops the data from the queue and does some lightweight processing.

During the multicast transmission I observe a packet drop rate of 30% on the node side. By looking at the netstat -su statistics I can say that the number of packets missed by the client application equals the RcvbufErrors value from the netstat output.

That means all missing packets are dropped by the OS because the socket buffer was full, but I do not understand why the capturing thread is not able to read the buffer in time. During the transmission, 2 of the 4 cores are utilized at 75%, the rest are idle. I'm the only one using these nodes, and I would assume that machines of this kind have no problem handling 1 Gbit of bandwidth. I have already done some optimization by adding g++ compiler flags for AMD CPUs; this decreased the packet drop rate to 10%, but in my opinion it is still too high.
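(For scale: 1 Gbit/s is roughly 125 MB/s, which with 1040-byte packets works out to about 120,000 packets per second, i.e. on the order of 8 µs of budget per packet for the reading thread, and roughly 25 seconds for the whole 3 GByte transfer.)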

Of course I know that UDP is not reliable; I have my own correction protocol.

I do not have any administration permissions, so it's not possible for me to change the system parameters.

Any hints on how I can increase the performance?

EDIT: I solved this issue by using 2 threads which read the socket. The recv socket buffer still becomes full sometimes, but the average drop is under 1%, so it isn't a problem to handle it.
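A rough sketch of what this two-reader approach can look like, assuming both threads simply block on recv() on the same UDP socket so a datagram can be pulled out of the kernel buffer even while the other thread is busy handing off the previous one. The socket setup, multicast join and the hand-off queue are omitted and would depend on the rest of the application.

```cpp
#include <sys/socket.h>
#include <thread>

void reader_loop(int fd) {
    char buf[1040];                       // fixed packet size from the question
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n <= 0) break;
        // hand off buf[0..n) to the processing thread here
    }
}

int main() {
    // socket(), bind() and the IP_ADD_MEMBERSHIP join are omitted;
    // assume fd is a valid, bound multicast socket.
    int fd = -1;
    std::thread r1(reader_loop, fd);
    std::thread r2(reader_loop, fd);      // both threads share one socket fd
    r1.join();
    r2.join();
}
```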

Tracking down network drops on Linux can be a bit difficult as there are many components where packet drops can happen. They can occur at the hardware level, in the network device subsystem, or in the protocol layers.

I wrote a very detailed blog post explaining how to monitor and tune each component. It's a bit hard to summarize as a succinct answer here since there are so many different components that need to be monitored and tuned.

Aside from the obvious removal of everything non-essential from the socket read loop:

  • Increase the socket receive buffer with setsockopt(2),
  • Use recvmmsg(2), if your kernel supports it, to reduce the number of system calls and kernel-to-userland copies (see the sketch after this list),
  • Consider a non-blocking approach with edge-triggered epoll(7),
  • See if you really need threads here; locking/synchronization is very expensive.
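A minimal sketch of the first two suggestions combined: enlarge the receive buffer and drain the socket in batches with recvmmsg(2) (Linux-specific, build with g++). The port, batch size and buffer size are placeholders, and error handling plus the multicast group join are omitted.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <cstdio>
#include <unistd.h>

int main() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    // Ask for a large receive buffer. Without admin rights the kernel caps
    // the value at net.core.rmem_max, so read back the effective size.
    int want = 8 * 1024 * 1024;
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want));
    int got = 0; socklen_t len = sizeof(got);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);
    std::printf("effective SO_RCVBUF: %d bytes\n", got);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(12345);                 // placeholder port
    bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    // (multicast join via IP_ADD_MEMBERSHIP omitted for brevity)

    constexpr int kBatch = 32;
    constexpr int kPacket = 1040;                 // packet size from the question
    char bufs[kBatch][kPacket];
    iovec iovs[kBatch];
    mmsghdr msgs[kBatch]{};
    for (int i = 0; i < kBatch; ++i) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len  = kPacket;
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    for (;;) {
        // One system call can return up to kBatch datagrams.
        int n = recvmmsg(fd, msgs, kBatch, 0, nullptr);
        if (n < 0) break;
        for (int i = 0; i < n; ++i) {
            // msgs[i].msg_len bytes of payload are in bufs[i]; hand off here.
        }
    }
    close(fd);
}
```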

"On the client side, one thread reads the socket and put the data into a queue. " I guess the problem is in this thread. “在客户端,一个线程读取套接字并将数据放入队列中。”我猜问题出在这个线程中。 It is not receiving messages fast enough. 它没有足够快地接收消息。 Too much time is spent on something else, for example acquiring mutex when putting data into the queue. 在其他方面花费了太多时间,例如在将数据放入队列时获取互斥锁。 Try to optimize operations on the queue, such as use a lock-free queue. 尝试优化队列上的操作,例如使用无锁队列。

