
How to properly terminate a thread in a signal handler?

Original post: 2014-09-20 15:59:34. Tags: c, linux, multithreading, signals, posix

I want to set up a signal handler for SIGSEGV, SIGILL and possibly a few other signals that, rather than terminating the whole process, just terminates the offending thread and perhaps sets a flag somewhere so that a monitoring thread can complain and start another thread.

I'm not sure there is a safe way to do this. Pthreads seems to provide functions for exiting the current thread, as well as canceling another thread, but these potentially call a bunch of at-exit handlers. Even if they don't, it seems as though there are many situations in which they are not async-signal-safe, although it is possible that those situations are avoidable. Is there a lower-level function I can call that just destroys the thread? Assuming I modify my own data structures in an async-signal-safe way, and acquire no mutexes, are there pthread or other global data structures that could be left in an inconsistent state simply by a thread terminating at a SIGSEGV? malloc comes to mind, but malloc itself shouldn't SIGSEGV/SIGILL unless the libc is buggy.

I realize that POSIX is very conservative here, and makes no guarantees. As long as there's a way to do this in practice I'm happy. Forking is not an option, btw.

1 answer

If the SIGSEGV/SIGILL/etc. happens in your own code, the signal handler will not run in an async-signal (AS) context (it's fundamentally a synchronous signal, but would still be an AS context if it happened inside a standard library function), so you can legally call pthread_exit from the signal handler. However, there are still issues that make this practice dubious:

- SIGSEGV/SIGILL/etc. never occur in a program whose behavior is defined unless you generate them via raise, kill, pthread_kill, sigqueue, etc. (and in some of these special cases, they would be asynchronous signals). Otherwise, they're indicative of a program having undefined behavior. If the program has invoked undefined behavior, all bets are off. UB is not isolated to a particular thread or a particular sequence in time. If the program has UB, its entire output/behavior is meaningless.
- If the program's state is corrupted (e.g. due to use-after-free, use of invalid pointers, buffer overflows, ...) it's very possible that the first faulting access will happen inside part of the standard library (e.g. inside malloc) rather than in your code. In this case, the signal handler runs in an async-signal context, where only async-signal-safe functions may be called, so it cannot call pthread_exit. Of course the program already has UB anyway (see the above point), but even if you wanted to pretend that's not an issue, you'd still be in trouble.

If your program is experiencing these kinds of crashes, you need to find the cause and fix it, not try to patch around it with signal handlers. Valgrind is your friend. If that's not possible, your best bet is to isolate the crashing code into separate processes where you can reason about what happens if they crash asynchronously, rather than having the crashing code in the same process (where any further reasoning about the code's behavior is invalid once you know it crashes).
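
As a rough illustration of the process-isolation approach, the sketch below runs a task in a child process and reports how it ended. The names task_fn, run_isolated, ok_task and crashing_task are hypothetical, and the crash is again simulated with raise(SIGSEGV). It assumes the parent is single-threaded at the time of fork; in a multithreaded parent the child should restrict itself to async-signal-safe calls or exec a separate worker binary.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical task type: returns an exit status in 0..255. */
typedef int (*task_fn)(void);

/* Runs task in a child process.
 * Returns the exit status (>= 0) if the child exited normally,
 * -signo if it was killed by signal signo, or -1000 on failure. */
int run_isolated(task_fn task)
{
    assert(task != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1000;

    if (pid == 0) {
        /* Child: make sure a fault kills only this process. */
        signal(SIGSEGV, SIG_DFL);
        _exit(task());
    }

    /* Parent: wait for the child, retrying if interrupted. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1000;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}

/* Example tasks, for illustration only. */
int ok_task(void) { return 0; }
int crashing_task(void) { raise(SIGSEGV); return 1; }
```

With this shape, run_isolated(crashing_task) yields -SIGSEGV in the parent, which can then log the failure and start a replacement worker, while the parent's own state is untouched by whatever the child corrupted.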