
epoll IO with worker threads in C

I am writing a small server that will receive data from multiple sources and process this data. The number of sources and the volume of data received are significant, but no more than epoll should be able to handle quite well. However, all received data must be parsed and run through a large number of tests, which is time consuming and will block a single thread despite epoll multiplexing. Basically, the pattern should be something like the following: the IO loop receives data and bundles it into a job, sends it to the first available thread in the pool, the bundle is processed by the worker, and the result is passed back to the IO loop for writing to file.

I have decided to go for a single IO thread and N worker threads. The IO thread for accepting TCP connections and reading data is easy to implement using the example provided at: http://linux.die.net/man/7/epoll

Threads are also usually easy enough to deal with, but I am struggling to combine the epoll IO loop with a thread pool in an elegant manner. I have not been able to find any "best practice" for using epoll with a worker pool online either, but there are quite a few questions regarding the same topic.

I therefore have some questions I hope someone can help me answer:

  1. Could (and should) eventfd be used as a mechanism for two-way synchronization between the IO thread and all the workers? For instance, is it a good idea for each worker thread to have its own epoll routine waiting on a shared eventfd (with a struct pointer containing data/info about the job), i.e. using the eventfd as a job queue somehow? And perhaps another eventfd to pass results back into the IO thread from multiple worker threads?
  2. After the IO thread is signaled about more data on a socket, should the actual recv take place on the IO thread, or should the workers recv the data on their own in order not to block the IO thread while parsing data frames etc.? In that case, how can I ensure safety, e.g. in case recv reads 1.5 frames of data in one worker thread and another worker thread receives the last 0.5 frame of data from the same connection?
  3. If the worker thread pool is implemented through mutexes and such, will waiting for locks block the IO thread if N+1 threads are trying to use the same lock?
  4. Are there any good practice patterns for how to build a worker thread pool around epoll with two-way communication (i.e. both from IO to workers and back)?

EDIT: Could one possible solution be to update a ring buffer from the IO loop, and after each update send the ring buffer index to the workers through a pipe shared by all workers (thus giving away control of that index to the first worker that reads it off the pipe), let that worker own the index until the end of processing, and then send the index number back into the IO thread through a pipe again, thus giving back control?

My application is Linux-only, so I can use Linux-only functionality in order to achieve this in the most elegant way possible. Cross-platform support is not needed, but performance and thread safety are.

In my tests, one epoll instance per thread outperformed complicated threading models by far. If the listener socket is added to all epoll instances, the workers would simply accept(2), and the winner would be awarded the connection and process it for its lifetime.

Your workers could look something like this:

struct epoll_event evs[1024];
int nfds, i;

for (;;) {
    nfds = epoll_wait(worker->efd, evs, 1024, -1);

    for (i = 0; i < nfds; i++)
        ((struct socket_context*)evs[i].data.ptr)->handler(
            evs[i].data.ptr,
            evs[i].events);
}

And every file descriptor added to an epoll instance could have a struct socket_context associated with it:

void listener_handler(struct socket_context* ctx, int ev)
{
    /* allocate a context for the new connection (error handling elided) */
    struct socket_context* conn = malloc(sizeof(*conn));

    conn->fd = accept(ctx->fd, NULL, NULL);
    conn->handler = conn_handler;

    /* add to calling worker's epoll instance or implement some form
     * of load balancing */
}

void conn_handler(struct socket_context* ctx, int ev)
{
    /* read all available data and process. if incomplete, stash
     * data in ctx and continue next time handler is called */
}

void dummy_handler(struct socket_context* ctx, int ev)
{
    /* handle exit condition async by adding a pipe with its
     * own handler */
}
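For completeness, here is a minimal sketch of how these pieces could be wired up, assuming a struct socket_context of the shape the handlers above imply; struct worker, worker_main, and start_workers are illustrative names, and error handling is elided:

#include <pthread.h>
#include <stdlib.h>
#include <sys/epoll.h>

/* assumed shape of the per-fd context used by the handlers above */
struct socket_context {
    int fd;
    void (*handler)(struct socket_context* ctx, int ev);
    /* parser state, stashed partial frames, ... */
};

struct worker {
    int efd;        /* this worker's private epoll instance */
    pthread_t tid;
};

/* worker_main() is just the epoll_wait() dispatch loop shown earlier,
 * wrapped in a pthread entry point */
void* worker_main(void* arg);

/* one worker per core; every worker's epoll instance watches the same
 * listener socket, so the kernel arbitrates accept(2) among threads */
void start_workers(struct worker* workers, int n, struct socket_context* listener)
{
    struct epoll_event ev;
    int i;

    for (i = 0; i < n; i++) {
        workers[i].efd = epoll_create1(0);

        ev.events = EPOLLIN;      /* listener level-triggered for simplicity */
        ev.data.ptr = listener;   /* listener->handler == listener_handler */
        epoll_ctl(workers[i].efd, EPOLL_CTL_ADD, listener->fd, &ev);

        pthread_create(&workers[i].tid, NULL, worker_main, &workers[i]);
    }
}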

I like this strategy because:

  • very simple design;
  • all threads are identical;
  • workers and connections are isolated--no stepping on each other's toes or calling read(2) in the wrong worker;
  • no locks are required (the kernel gets to worry about synchronization on accept(2));
  • somewhat naturally load balanced, since no busy worker will actively contend on accept(2).

And some notes on epoll:

  • use edge-triggered mode, non-blocking sockets, and always read until EAGAIN (see the sketch after this list);
  • avoid the dup(2) family of calls to spare yourself from some surprises (epoll registers file descriptors, but actually watches file descriptions);
  • you can epoll_ctl(2) other threads' epoll instances safely;
  • use a large struct epoll_event buffer for epoll_wait(2) to avoid starvation.
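For instance, a connection handler under edge-triggered mode has to drain the socket before returning, because epoll will not report the descriptor again until new data arrives. A minimal sketch of the conn_handler placeholder above, with parsing and cleanup details elided:

#include <errno.h>
#include <sys/socket.h>

void conn_handler(struct socket_context* ctx, int ev)
{
    char buf[4096];
    ssize_t n;

    for (;;) {
        n = recv(ctx->fd, buf, sizeof(buf), 0);
        if (n > 0) {
            /* feed bytes to the frame parser; stash any partial
             * frame in ctx so the next call can continue it */
        } else if (n == 0) {
            /* peer closed: unregister the fd and free ctx */
            break;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* drained: safe to return and wait for the next edge */
            break;
        } else if (errno != EINTR) {
            /* real error: unregister the fd and free ctx */
            break;
        }
    }
}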

Some other notes:

  • use accept4(2) to save a system call (see the sketch after this list);
  • use one thread per core (1 for each physical core if CPU-bound, or 1 for each logical core if I/O-bound);
  • poll(2)/select(2) is likely faster if the connection count is low.
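To illustrate the first note, here is the earlier listener_handler revisited with accept4(2); the function name is illustrative, and _GNU_SOURCE is needed since accept4 is Linux-specific:

#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/socket.h>

/* same as listener_handler above, but the accepted socket comes back
 * already non-blocking, saving the extra fcntl(2) call */
void listener_handler_accept4(struct socket_context* ctx, int ev)
{
    struct socket_context* conn = malloc(sizeof(*conn));

    conn->fd = accept4(ctx->fd, NULL, NULL, SOCK_NONBLOCK);
    conn->handler = conn_handler;

    /* register conn->fd with a worker's epoll instance as before */
}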

I hope this helps.

When implementing this model, because we only know the packet size once we have fully received the packet, unfortunately we cannot offload the receive itself to a worker thread. Instead, the best we can do is dedicate a thread to receiving the data, which will have to pass off pointers to fully received packets.

The data itself is probably best held in a circular buffer; however, we will want a separate buffer for each input source (if we get a partial packet, we can continue receiving from other sources without splitting up the data). The remaining question is how to inform the workers when a new packet is ready, and to give them a pointer to the data in said packet. Because there is little data here, just some pointers, the most elegant way of doing this would be with POSIX message queues. These provide the ability for multiple senders and multiple receivers to write and read messages, always ensuring every message is received, and by exactly one thread.

You will want a struct resembling the one below for each data source; I shall go through the fields' purposes now.

struct DataSource
{
    int SourceFD;
    char DataBuffer[MAX_PACKET_SIZE * (THREAD_COUNT + 1)];
    char *LatestPacket;
    char *CurrentLocation;
    int SizeLeft;
};

The SourceFD is obviously the file descriptor of the data stream in question. The DataBuffer is where packet contents are held while being processed; it is a circular buffer. The LatestPacket pointer is used to temporarily hold a pointer to the most recent packet, in case we receive a partial packet and move on to another source before passing the packet off. The CurrentLocation stores where the latest packet ends, so that we know where to place the next one, or where to carry on in case of a partial receive. SizeLeft is the room left in the buffer; this will be used to tell if we can fit the packet or need to circle back around to the beginning.

The receiving function will thus effectively do the following:

  • Copy the contents of the packet into the buffer
  • Move CurrentLocation to point to the end of the packet
  • Update SizeLeft to account for the now decreased buffer
  • If we cannot fit the packet at the end of the buffer, we cycle around
  • If there is no room there either, we try again a bit later, moving on to another source meanwhile
  • If we had a partial receive, store the LatestPacket pointer to point to the start of the packet and go to another stream until we get the rest
  • Send a message using a POSIX message queue to a worker thread so it can process the data. The message will contain a pointer to the DataSource structure so the worker can work on it; it also needs a pointer to the packet being worked on, and its size, both of which can be calculated when we receive the packet (a sketch of this hand-off follows below)
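A sketch of that final hand-off; the struct Job layout, the dispatch_job name, and the queue setup are assumptions for illustration (POSIX message queues need -lrt at link time):

#include <mqueue.h>

/* illustrative job descriptor: everything a worker needs in order to
 * process one fully received packet in place */
struct Job {
    struct DataSource *Source;
    char *Packet;          /* points into Source->DataBuffer */
    int PacketSize;
};

/* called by the IO thread once a complete packet sits in the buffer;
 * the queue is assumed to have been created with
 * mq_msgsize == sizeof(struct Job) */
void dispatch_job(mqd_t jobq, struct DataSource *src, char *pkt, int size)
{
    struct Job job = { src, pkt, size };

    /* the message is just a few words, so this is effectively a
     * pointer hand-off; it blocks only if the queue is full */
    mq_send(jobq, (const char *)&job, sizeof(job), 0);
}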

The worker thread will do its processing using the received pointers and then increase SizeLeft, so the receiver thread will know it can carry on filling the buffer. Atomic functions will be needed to work on the size value in the struct, so we don't get race conditions on the size property (as it may be written by a worker and the IO thread simultaneously, causing lost writes; see my comment below). They are listed here: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html and are simple and extremely useful.
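On the worker side, this might pair up as follows; process_packet is a hypothetical stand-in for the actual parsing and tests, and the atomic increment uses the GCC builtins linked above:

#include <mqueue.h>
#include <pthread.h>

/* worker loop: blocks in mq_receive() until the IO thread hands off a
 * job; assumes the queue's mq_msgsize equals sizeof(struct Job) */
void* worker_thread(void* arg)
{
    mqd_t jobq = *(mqd_t *)arg;
    struct Job job;

    for (;;) {
        if (mq_receive(jobq, (char *)&job, sizeof(job), NULL) < 0)
            continue;

        /* hypothetical stand-in for the parsing/tests */
        process_packet(job.Packet, job.PacketSize);

        /* hand the space back to the ring buffer; atomic because the
         * IO thread reads and decrements SizeLeft concurrently */
        __sync_fetch_and_add(&job.Source->SizeLeft, job.PacketSize);
    }
    return NULL;
}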

Now, I have given some general background, but I will address the points given specifically:

  1. Using eventfd as a synchronization mechanism is largely a bad idea: you will find yourself using a fair amount of unneeded CPU time, and it is very hard to perform any synchronization. Particularly if you have multiple threads picking up the same file descriptor, you could have major problems. This is in effect a nasty hack that will work sometimes, but is no real substitute for proper synchronization.
  2. It is also a bad idea to try to offload the receive, as explained above. You can get around the issue with complex IPC, but frankly it is unlikely that receiving IO will take enough time to stall your application; your IO is also likely much slower than the CPU, so receiving with multiple threads will gain little. (This assumes you do not, say, have several 10-gigabit network cards.)
  3. Using mutexes or locks is a silly idea here; the problem fits much better into lockless coding, given the low amount of (simultaneously) shared data. You are really just handing off work and data. This will also boost the performance of the receive thread and make your app far more scalable. Using the functions mentioned here http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html you can do this nicely and easily. If you did do it this way, what you would need is a semaphore: it can be unlocked every time a packet is received and locked by each thread which starts a job, to allow dynamically more threads in if more packets are ready. That would have far less overhead than a homebrew solution with mutexes.
  4. There is not really much difference here from any thread pool: you spawn a lot of threads, then have them all block in mq_receive on the data message queue to wait for messages. When they are done, they send their result back to the main thread, which adds the results message queue to its epoll list (a sketch of this registration follows after this list). It can then receive results this way, which is simple and very efficient for small data payloads like pointers. This will also use little CPU and not force the main thread to waste time managing workers.
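The registration in point 4 works because, on Linux, a message queue descriptor is implemented as a file descriptor and can be monitored by epoll (see mq_overview(7)). A minimal sketch; watch_results is an illustrative name:

#include <mqueue.h>
#include <sys/epoll.h>

/* add the results queue to the IO thread's epoll set; a readable event
 * then means at least one worker has posted a result via mq_send() */
void watch_results(int epfd, mqd_t resultq)
{
    struct epoll_event ev;

    ev.events = EPOLLIN;
    ev.data.fd = resultq;   /* on Linux, mqd_t is an int file descriptor */
    epoll_ctl(epfd, EPOLL_CTL_ADD, resultq, &ev);
}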

Finally, your edit is fairly sensible, except for the fact that, as I have suggested, message queues are far better than pipes here, as they very efficiently signal events, guarantee a full message read, and provide automatic framing.

I hope this helps; however, it is late, so if I missed anything or you have questions, feel free to comment for clarification or more explanation.

I posted the same answer in another post: I want to wait on both a file descriptor and a mutex, what's the recommended way to do this?

==========================================================

This is a very commonly seen problem, especially when you are developing network server-side programs. Most Linux server-side programs' main loop will look like this:

epoll_add(serv_sock);
while(1){
    ret = epoll_wait();
    foreach(ret as fd){
        req = fd.read();
        resp = proc(req);
        fd.send(resp);
    }
}

It is a single-threaded (the main thread), epoll-based server framework. The problem is that it is single threaded, not multi-threaded. It requires that proc() should never block or run for a significant time (say, 10 ms for common cases).

If proc() will ever run for a long time, WE NEED MULTIPLE THREADS, and must execute proc() in a separate thread (the worker thread).

We can submit a task to the worker thread without blocking the main thread, using a mutex-based message queue; it is fast enough.

Then we need a way to obtain the task result from a worker thread. How? If we just check the message queue directly, before or after epoll_wait(), the checking action will only execute after epoll_wait() ends, and epoll_wait() usually blocks for its full timeout, say 10 ms in common cases, if none of the file descriptors it waits on are active.

For a server, 10 ms is quite a long time! Can we signal epoll_wait() to end immediately when a task result is generated?

Yes! I will describe how this is done in one of my open source projects.

Create a pipe shared by all worker threads, and have epoll wait on that pipe as well. Once a task result is generated, the worker thread writes one byte into the pipe, and then epoll_wait() will end at nearly the same time! - a Linux pipe has 5 us to 20 us of latency.

In my project SSDB (a Redis-protocol-compatible on-disk NoSQL database), I created a SelectableQueue for passing messages between the main thread and worker threads. Just like its name suggests, SelectableQueue has a file descriptor, which can be waited on by epoll.
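A minimal C rendition of the idea (the actual SelectableQueue, linked below, is C++ and templated; the names and the intrusive result list here are illustrative): a mutex-protected queue paired with a pipe whose read end is handed to epoll.

#include <pthread.h>
#include <unistd.h>

struct result {
    struct result *next;
    /* fd, response buffer, ... */
};

struct selectable_queue {
    int pipe_fd[2];              /* from pipe(2); epoll watches pipe_fd[0] */
    pthread_mutex_t lock;
    struct result *head, *tail;
};

/* worker side: enqueue a result, then write one byte so the main
 * thread's epoll_wait() wakes up almost immediately */
void sq_push(struct selectable_queue *q, struct result *r)
{
    pthread_mutex_lock(&q->lock);
    r->next = NULL;
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
    pthread_mutex_unlock(&q->lock);

    write(q->pipe_fd[1], "", 1);
}

/* main thread side: called when epoll reports pipe_fd[0] readable;
 * consume one wakeup byte, then pop one result */
struct result *sq_pop(struct selectable_queue *q)
{
    struct result *r;
    char c;

    read(q->pipe_fd[0], &c, 1);

    pthread_mutex_lock(&q->lock);
    r = q->head;
    if (r) {
        q->head = r->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return r;
}

Writing exactly one byte per queued result keeps the wakeup count in step with the queue length.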

SelectableQueue: https://github.com/ideawu/ssdb/blob/master/src/util/thread.h#L94

Usage in main thread:

epoll_add(serv_sock);
epoll_add(queue->fd());
while(1){
    ret = epoll_wait();
    foreach(ret as fd){
        if(fd is worker_thread){
            sock, resp = worker->pop_result();
            sock.send(resp);
        }
        if(fd is client_socket){
            req = fd.read();
            worker->add_task(fd, req);
        }
    }
}

Usage in worker thread:

fd, req = queue->pop_task();
resp = proc(req);
queue->add_result(fd, resp);
