
epoll order of events from epoll_wait

I have ported a program over to epoll from select to increase the number of sockets we can handle. I have added the sockets to the epoll FD and can read and write happily.

However, I am concerned about potential starvation of sockets even though I am using level-triggered events. The scenario I am worried about is when there are more sockets ready than epoll_event structures. I know that the next time I call epoll_wait it will give me the rest of them, but I wonder what order I get them in, with regard to which sockets didn't make the cut last time versus this time.

An example: say I have 10 sockets connected and added to the epoll FD. I only have enough memory for 5 epoll_event structures. Let's assume that in the time between each epoll_wait, all 10 sockets receive data. The first epoll_wait will return 5 epoll_event structures for processing; let's say it's sockets 1-5. I process those 5 sockets, and while I am doing so, more data comes in and all 10 sockets have more data to be read. I enter epoll_wait again and get 5 more epoll_event structures.
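For reference, a minimal sketch of the kind of loop being described, assuming level-triggered (the default) mode and a hypothetical handle_socket() helper; epfd is an epoll FD that the 10 sockets have already been added to:

#include <sys/epoll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_EVENTS 5   /* deliberately smaller than the number of sockets */

/* hypothetical per-socket handler: drain a little data and return */
static void handle_socket(int fd)
{
    char buf[4096];
    (void)read(fd, buf, sizeof(buf));
}

static void event_loop(int epfd)
{
    struct epoll_event events[MAX_EVENTS];

    for (;;) {
        /* At most MAX_EVENTS ready descriptors are reported per call;
           any further ready descriptors stay pending for the next call. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        if (n == -1) {
            perror("epoll_wait");
            exit(EXIT_FAILURE);
        }
        for (int i = 0; i < n; i++)
            handle_socket(events[i].data.fd);
    }
}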

My question is: which 5 sockets will I get on the second call to epoll_wait? Will it be sockets 1-5, because they were added to the epoll FD first? Or will I get sockets 6-10, because those events were raised before more data came in on sockets 1-5?

Essentially, is epoll_wait like a FIFO queue, or does it simply scan an internal list of sockets (thereby favoring the first sockets in the list)?

EDIT: This is Linux kernel v4.9.62

The observation by @jxh about the behavior is correct, and the behavior is long established (and was originally intended, if I correctly recall my email conversations with the implementer, Davide Libenzi, many years ago). It's unfortunate that it has not been documented so far. But I've fixed that for the upcoming manual pages release, where epoll_wait(2) will carry the text:

If more than maxevents file descriptors are ready when epoll_wait() is called, then successive epoll_wait() calls will round robin through the set of ready file descriptors. This behavior helps avoid starvation scenarios, where a process fails to notice that additional file descriptors are ready because it focuses on a set of file descriptors that are already known to be ready.
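A small test program (my own, not from the answer) that can be used to observe this on a given kernel: it registers ten pipe read ends with epoll, makes them all readable, then calls epoll_wait() twice with maxevents = 5 without draining anything. Under the round-robin behavior described above, the two calls report disjoint sets of five pipes.

#include <sys/epoll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NPIPES    10
#define MAXEVENTS 5

int main(void)
{
    int epfd = epoll_create1(0);
    if (epfd == -1) { perror("epoll_create1"); exit(EXIT_FAILURE); }

    for (int i = 0; i < NPIPES; i++) {
        int p[2];
        if (pipe(p) == -1) { perror("pipe"); exit(EXIT_FAILURE); }

        struct epoll_event ev = { .events = EPOLLIN, .data.u32 = i };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1) {
            perror("epoll_ctl"); exit(EXIT_FAILURE);
        }
        if (write(p[1], "x", 1) != 1) {  /* make every read end ready */
            perror("write"); exit(EXIT_FAILURE);
        }
    }

    /* Call twice without reading: with level-triggered events, fds that
       are still ready are re-queued behind the others, so the second
       call should report the five pipes the first call left out. */
    for (int call = 1; call <= 2; call++) {
        struct epoll_event events[MAXEVENTS];
        int n = epoll_wait(epfd, events, MAXEVENTS, 0);
        if (n == -1) { perror("epoll_wait"); exit(EXIT_FAILURE); }

        printf("call %d:", call);
        for (int j = 0; j < n; j++)
            printf(" pipe %u", events[j].data.u32);
        printf("\n");
    }
    return 0;
}

On a kernel with this behavior the output should look like "call 1: pipe 0 ... pipe 4" followed by "call 2: pipe 5 ... pipe 9".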

Perusing the source file for epoll, one sees that the ready events are maintained in a linked list. Events are removed from the head of the list and added to the end of the list.
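A simplified model of that list discipline (the real code is in fs/eventpoll.c; this is only an illustration, with a hypothetical still_ready() check standing in for the kernel's poll of the file):

/* Illustrative model only, not the kernel implementation. */
struct item { int fd; struct item *next; };

int still_ready(int fd);  /* hypothetical: does fd still have data? */

/* Take up to maxevents fds from the head of the ready list; a
   level-triggered fd that is still ready goes back on the tail,
   which is what yields the round-robin ordering across calls. */
int model_epoll_wait(struct item **head, struct item **tail,
                     int *out, int maxevents)
{
    int n = 0;
    while (n < maxevents && *head != NULL) {
        struct item *it = *head;          /* pop from the head */
        *head = it->next;
        if (*head == NULL)
            *tail = NULL;
        out[n++] = it->fd;

        if (still_ready(it->fd)) {        /* re-append at the tail */
            it->next = NULL;
            if (*tail != NULL)
                (*tail)->next = it;
            else
                *head = it;
            *tail = it;
        }
    }
    return n;
}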

Based on that, the answer is that the descriptor order is based on the order in which they became ready.
