

pthread_kill() vs pthread_cancel() to terminate a thread blocked for I/O

Original: 2018-09-18 17:21:39 · 3 · 3 · c++ / c / linux / sockets / pthreads

In our server code we use the poll() system call to monitor client sockets. poll() is called with a large timeout value, so the thread calling poll() stays blocked waiting for I/O.

As part of our flow, there is a scenario where we need to terminate the thread blocked in poll() from a different thread. I have come across the pthread_kill() and pthread_cancel() functions, which can terminate a target thread that is blocked for I/O.

From reading the man pages, both of these functions seem to work fine. However, a few links on the internet suggest that both are dangerous to use.

Is there any alternative way to terminate a thread blocked for I/O? If not, which of these functions is recommended?

3 answers

Depending on the exact implementation of your thread library, it is quite likely that the thread will not even return from poll() when it is killed, so you may not achieve what you want at all.

You need to be very careful not to create memory leaks, and you are still very likely to leak file descriptors by killing the thread that owns them (note that, unlike a process's resources, a thread's resources are not "cleaned up" by the system).

It is generally safer to use shorter timeouts and check a termination flag between calls, or to use a signal to interrupt the system call, and then let the thread terminate under its own control, freeing all the resources it allocated.
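A minimal sketch of the first suggestion (a short timeout plus a termination flag), assuming C11 atomics and a POSIX system. The flag name stop_requested, the 100 ms timeout and the helper stop_flag_demo are illustrative choices, not part of the original answer; an idle pipe stands in for the client sockets.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stop flag: written by the controlling thread, read by the poller. */
static atomic_bool stop_requested;

static void *poller(void *arg)
{
    struct pollfd pfd = { .fd = *(int *)arg, .events = POLLIN };

    while (!atomic_load(&stop_requested)) {
        /* A short timeout (100 ms, an assumed value) bounds how long a stop request waits. */
        int n = poll(&pfd, 1, 100);
        if (n < 0 && errno != EINTR)
            break;                          /* real error: give up */
        if (n > 0) {
            /* ... read from pfd.fd and service the client here ... */
        }
    }
    /* Close descriptors and free buffers here, under the thread's own control. */
    return NULL;
}

/* Starts the poller on an idle pipe, asks it to stop, and returns 0 once it has exited. */
static int stop_flag_demo(void)
{
    int fds[2];
    pthread_t tid;
    struct timespec idle = { 0, 250000000L };   /* 250 ms */

    if (pipe(fds) != 0)
        return -1;
    atomic_store(&stop_requested, false);
    if (pthread_create(&tid, NULL, poller, &fds[0]) != 0)
        return -1;

    nanosleep(&idle, NULL);                 /* let the poller time out a few times */
    atomic_store(&stop_requested, true);    /* ask it to finish */
    int rc = pthread_join(tid, NULL);       /* returns within about one timeout */

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

The cost of this design is latency and idle wake-ups: a stop request can take up to one timeout to be noticed, and the thread wakes periodically even when there is no traffic.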

An easy and clean option is to create a "signal" pipe. That is, call pipe(), take the file descriptor for the read end, and add it to your list of poll file descriptors (with POLLIN). Then, whenever you want to unblock the thread waiting in poll(), just write a byte to the write end of the pipe. Having received data, the pipe will be reported as readable in the blocked thread. You can even encode different "commands" by varying the value of the byte written.

(Of course, you need to read the byte back out of the pipe before it can be reused.)
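The self-pipe technique described above could look roughly like the following sketch, assuming POSIX pipe() and poll(). The struct worker layout, the CMD_WAKE/CMD_QUIT byte values and the send_command and self_pipe_demo helpers are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical command bytes; any values agreed between the two threads work. */
enum { CMD_WAKE = 'w', CMD_QUIT = 'q' };

struct worker {
    int wake_rd;     /* read end of the "signal" pipe, polled by the worker */
    int wake_wr;     /* write end, used by other threads to interrupt poll() */
    int client_fd;   /* stand-in for the client sockets in the real server */
    int quit_seen;   /* set by the worker when it receives CMD_QUIT */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    struct pollfd pfds[2] = {
        { .fd = w->wake_rd,   .events = POLLIN },
        { .fd = w->client_fd, .events = POLLIN },
    };

    for (;;) {
        if (poll(pfds, 2, -1) < 0) {        /* no timeout: only events wake us */
            if (errno == EINTR)
                continue;
            break;
        }
        if (pfds[0].revents & POLLIN) {
            char cmd;
            /* Drain the byte so the pipe is not readable again on the next poll(). */
            if (read(w->wake_rd, &cmd, 1) == 1 && cmd == CMD_QUIT) {
                w->quit_seen = 1;
                break;
            }
        }
        if (pfds[1].revents & POLLIN) {
            /* ... service client data here ... */
        }
    }
    return NULL;
}

/* Any thread can call this to interrupt the worker's poll(). */
static int send_command(struct worker *w, char cmd)
{
    return write(w->wake_wr, &cmd, 1) == 1 ? 0 : -1;
}

static int self_pipe_demo(void)
{
    int wake[2], idle[2];
    struct worker w = { 0 };
    pthread_t tid;

    if (pipe(wake) != 0 || pipe(idle) != 0)
        return -1;
    w.wake_rd = wake[0];
    w.wake_wr = wake[1];
    w.client_fd = idle[0];                  /* never written, so never readable */

    if (pthread_create(&tid, NULL, worker_main, &w) != 0)
        return -1;
    send_command(&w, CMD_WAKE);             /* worker wakes, sees a non-quit command, polls again */
    send_command(&w, CMD_QUIT);             /* worker wakes and returns */
    pthread_join(tid, NULL);

    close(wake[0]); close(wake[1]);
    close(idle[0]); close(idle[1]);
    return w.quit_seen ? 0 : -1;
}
```

Because the worker leaves poll() on its own, it can close its sockets and free its buffers before returning. In a real server you would probably also make the write end non-blocking, so that a full pipe can never block the thread sending the command.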

There is no such thing as killing a thread.

The poorly named pthread_kill function is the threads analogue of the equally poorly named kill function, which sends a signal to a process. The name kill made sense historically because the default action of many signals is to kill the process. But that default action does not depend on whether the signal was sent to the process or to a particular thread: either way, the whole process terminates.
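To illustrate that the default action applies to the whole process, here is a small experiment, assuming Linux/POSIX and that it is called from a single-threaded program (it uses fork()). A child process creates one thread and sends only that thread SIGTERM with pthread_kill; since no handler is installed, the parent sees the entire child process die from SIGTERM. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static void *idle_thread(void *arg)
{
    (void)arg;
    for (;;)
        pause();                  /* just wait; SIGTERM's default action ends the process */
    return NULL;
}

/* Runs the experiment in a child process and returns the signal that killed it,
 * 0 if the child exited normally, or -1 on error. */
static int pthread_kill_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, idle_thread, NULL) != 0)
            _exit(1);
        /* Target one thread only. No handler is installed for SIGTERM, so the
         * default action (terminate) applies to the whole process. */
        pthread_kill(tid, SIGTERM);
        pthread_join(tid, NULL);  /* never returns: the process is gone */
        _exit(0);                 /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```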

The only time pthread_kill is useful is when you want to invoke a signal handler on another thread. Unless you are certain that the signal handler cannot have interrupted a function that is not async-signal-safe, the handler is limited to calling async-signal-safe functions, and so it cannot even end the thread's lifetime (pthread_exit is not async-signal-safe).

If you are okay with the thread eventually terminating as a result of the call, pthread_cancel is the right way to end a thread stuck in a blocking operation. To use it safely, though, you need to make heavy use of pthread_cleanup_push and pthread_cleanup_pop.
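A sketch of pthread_cancel with a cleanup handler, assuming the default cancellation settings (enabled, deferred). poll() is one of the cancellation points required by POSIX, so a pending cancel request is acted on there, and the handler registered with pthread_cleanup_push frees the thread's buffer on the way out. The function names and the 4096-byte buffer are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Set by the cleanup handler so the demo can check that it ran. */
static int cleanup_ran;

static void release_buffer(void *arg)
{
    free(arg);                  /* release what the thread owned */
    cleanup_ran = 1;
}

static void *blocked_poller(void *arg)
{
    struct pollfd pfd = { .fd = *(int *)arg, .events = POLLIN };
    char *buf = malloc(4096);   /* a resource that must not leak */
    if (buf == NULL)
        return NULL;

    /* Register the handler before entering the cancellation point. */
    pthread_cleanup_push(release_buffer, buf);
    poll(&pfd, 1, -1);          /* poll() is a cancellation point */
    pthread_cleanup_pop(1);     /* normal path: run the handler as well */
    return NULL;
}

/* Cancels a thread blocked in poll(); returns 0 if it was cancelled and cleaned up. */
static int cancel_demo(void)
{
    int fds[2];
    pthread_t tid;
    void *result;
    struct timespec ts = { 0, 100000000L };    /* 100 ms */

    if (pipe(fds) != 0)
        return -1;
    cleanup_ran = 0;
    if (pthread_create(&tid, NULL, blocked_poller, &fds[0]) != 0)
        return -1;
    nanosleep(&ts, NULL);       /* let it block (a pending cancel would also work) */
    pthread_cancel(tid);
    pthread_join(tid, &result);

    close(fds[0]);
    close(fds[1]);
    return (result == PTHREAD_CANCELED && cleanup_ran) ? 0 : -1;
}
```

Every resource acquired before a cancellation point needs its own push/pop pair, which is why the answer warns about heavy use of these functions. POSIX also requires pthread_cleanup_push and pthread_cleanup_pop to appear as a pair in the same lexical scope, because they may be implemented as macros that open and close a block.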

If you don't want the thread to terminate, signals are your only option. You have two choices:

1. Install a signal handler (it can be a no-op) using sigaction without SA_RESTART, so that the signal makes the blocking call fail with EINTR. This approach has an inherent race condition: if you send the signal just before the thread enters the blocking syscall, rather than once it is blocked, the signal does nothing. So you need to send the signal repeatedly, with exponential back-off so as not to starve the target of execution time, until the target confirms through some other synchronization mechanism (a POSIX semaphore works well) that it got the message.
2. Install a signal handler that calls longjmp. To do this safely you need to control the context in which the jump can happen; the easiest way is to keep the signal blocked in the signal mask normally, and unmask it only while the jmp_buf is valid around a blocking call. The blocking function you call needs to be async-signal-safe, and it must not be one that allocates or frees resources (like open or close), since when you handle the signal you will not know whether the call completed. Of course, the jmp_buf, or a pointer to it, needs to be a thread-local object (_Thread_local / __thread) for this to work at all.
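A sketch of the first choice, assuming Linux/POSIX: a handler for SIGUSR1 (an arbitrary choice of signal) installed without SA_RESTART, repeated pthread_kill calls with exponential back-off, and a POSIX semaphore for the acknowledgement. The 1 ms starting delay, the 20-attempt limit and all function names are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static sem_t ack;                           /* posted by the worker once it has the message */
static volatile sig_atomic_t wake_pending;  /* set by the handler, read by the worker */

/* Apart from setting a flag this is a no-op: only async-signal-safe work here. */
static void on_wake(int sig)
{
    (void)sig;
    wake_pending = 1;
}

static void *worker(void *arg)
{
    struct pollfd pfd = { .fd = *(int *)arg, .events = POLLIN };

    /* Race: the signal may land after this check but before poll() blocks. */
    while (!wake_pending) {
        int n = poll(&pfd, 1, -1);
        if (n < 0 && errno != EINTR)
            break;                          /* real error */
        /* n > 0: service clients here; EINTR: the loop re-checks the flag */
    }
    sem_post(&ack);                         /* confirm receipt via a second mechanism */
    return NULL;                            /* a real server would go back to polling */
}

static void deadline_after(struct timespec *ts, long ns)
{
    clock_gettime(CLOCK_REALTIME, ts);      /* sem_timedwait uses CLOCK_REALTIME */
    ts->tv_nsec += ns;
    while (ts->tv_nsec >= 1000000000L) {
        ts->tv_nsec -= 1000000000L;
        ts->tv_sec++;
    }
}

/* Re-send SIGUSR1 with exponential back-off until the worker acknowledges. */
static int interrupt_until_acked(pthread_t tid)
{
    long delay_ns = 1000000L;               /* first wait: 1 ms (assumed) */
    for (int attempt = 0; attempt < 20; attempt++) {
        struct timespec deadline;
        (void)pthread_kill(tid, SIGUSR1);   /* harmless if the worker already exited */
        deadline_after(&deadline, delay_ns);
        if (sem_timedwait(&ack, &deadline) == 0)
            return 0;
        if (delay_ns < 500000000L)
            delay_ns *= 2;                  /* back off, capped near 0.5 s */
    }
    return -1;
}

static int signal_demo(void)
{
    int fds[2];
    pthread_t tid;
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_wake;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                        /* no SA_RESTART: blocked calls fail with EINTR */
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || sem_init(&ack, 0, 0) != 0 || pipe(fds) != 0)
        return -1;
    wake_pending = 0;
    if (pthread_create(&tid, NULL, worker, &fds[0]) != 0)
        return -1;

    int rc = interrupt_until_acked(tid);
    if (rc != 0)
        pthread_cancel(tid);                /* fallback so the join below cannot hang */
    pthread_join(tid, NULL);
    close(fds[0]);
    close(fds[1]);
    sem_destroy(&ack);
    return rc;
}
```

Checking wake_pending before poll() narrows the race described in the first choice but cannot close it, which is why the sender keeps re-sending until the semaphore is posted. On Linux, poll() is never restarted after a signal handler runs, even with SA_RESTART, but other blocking calls (such as read() on a socket) are restarted when SA_RESTART is set, so leaving the flag out matters in general.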