Log the reason for process termination with C++ on Linux

Is there any way to log, handle or otherwise leave some clue about why a process terminates, covering as many termination-causing events as possible?

I have a logging tool for my application and log many messages every minute. I am running almost the whole program in a super try-catch block so I can log any unhandled exceptions. I have also recently tried registering handlers for various process signals that may terminate the process. However, the application is still crashing a few times per day and I have no idea why.

How many other fatal events might I be failing to log or handle? I expect there is a proper way of doing this, rather than consistently being left in the dark when the process dies for some new type of event I am not yet aware of.

Thanks very much.

Having a super try/catch block means catchable exceptions do not go unhandled. Note that you'll need such a block in every thread you start, not only in main.
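The per-thread guard could be sketched roughly as follows, assuming C++11 std::thread. The names run_guarded and start_guarded and the log wording are made up for this illustration; a real program would call its own logger instead of writing to std::cerr.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: runs `work` and logs any exception that escapes it.
// Returns false instead of letting the exception leave the thread function,
// which would otherwise end in std::terminate().
template <typename Work>
bool run_guarded(const std::string& name, Work&& work) {
    try {
        std::forward<Work>(work)();
        return true;
    } catch (const std::exception& e) {
        std::cerr << "thread '" << name << "' died: " << e.what() << '\n';
    } catch (...) {
        std::cerr << "thread '" << name << "' died: unknown exception\n";
    }
    return false;
}

// Starts a thread whose entire body runs inside the guard.
template <typename Work>
std::thread start_guarded(std::string name, Work work) {
    return std::thread([name, work]() mutable { run_guarded(name, work); });
}
```

A thread started with start_guarded("worker", some_callable) then logs the failure and finishes quietly rather than taking the whole process down without a trace.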

That aside, you can use signal() to catch termination signals. These are:

  • SIGABRT (Signal Abort) Abnormal termination, such as is initiated by the abort function.
  • SIGFPE (Signal Floating-Point Exception) Erroneous arithmetic operation, such as zero divide or an operation resulting in overflow (not necessarily with a floating-point operation).
  • SIGILL (Signal Illegal Instruction) Invalid function image, such as an illegal instruction. This is generally due to a corruption in the code or to an attempt to execute data.
  • SIGINT (Signal Interrupt) Interactive attention signal. Generally generated by the application user.
  • SIGSEGV (Signal Segmentation Violation) Invalid access to storage: when a program tries to read or write outside the memory allocated to it.
  • SIGTERM (Signal Terminate) Termination request sent to program.
  • Other signals defined by the implementation, but most crash causes should be covered by the ones above.
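As a rough sketch of catching the signals listed above, the following uses sigaction() rather than signal() and restricts itself to write(), which is async-signal-safe. The names format_decimal, log_fatal_signal and install_fatal_handlers are invented for this example, and the message format is an assumption.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cstddef>

// Formats a non-negative int as decimal into buf and returns the digit count.
// Used instead of printf, which is not async-signal-safe.
std::size_t format_decimal(int value, char* buf, std::size_t cap) {
    char tmp[12];
    std::size_t n = 0;
    do {
        tmp[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value > 0 && n < sizeof(tmp));
    std::size_t len = n < cap ? n : cap;
    for (std::size_t i = 0; i < len; ++i) buf[i] = tmp[n - 1 - i];
    return len;
}

// Handler: write one short line to stderr, then re-raise the signal.
// SA_RESETHAND (set below) has already restored the default action, so the
// process still terminates with the original signal.
extern "C" void log_fatal_signal(int sig) {
    char line[48] = "fatal signal ";
    std::size_t len = 13;  // length of the prefix above
    len += format_decimal(sig, line + len, 12);
    line[len++] = '\n';
    ssize_t ignored = write(STDERR_FILENO, line, len);
    (void)ignored;
    raise(sig);
}

// Installs the handler for the signals named in the list above.
bool install_fatal_handlers() {
    const int fatal[] = {SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM};
    struct sigaction sa = {};
    sa.sa_handler = log_fatal_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    for (int sig : fatal) {
        if (sigaction(sig, &sa, nullptr) != 0) return false;
    }
    return true;
}
```

Calling install_fatal_handlers() early in main is enough. Note that SIGKILL and SIGSTOP cannot be caught this way, so a process killed with SIGKILL leaves no trace from inside.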

Also, it may be that the program is not crashing at all, but terminating either by returning from main (I guess you already have that covered) or via a call to exit. In that case you can check the return value of the program and log that.
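Checking the return value from outside the process might look like the sketch below, which assumes a small supervising parent built on fork() and waitpid(). The names describe_wait_status and run_and_report are hypothetical, and a real supervisor would most likely exec the actual application in the child instead of calling a function.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string>

// Turns a wait status from waitpid() into a human-readable reason.
std::string describe_wait_status(int status) {
    if (WIFEXITED(status))
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    if (WIFSIGNALED(status))
        return "killed by signal " + std::to_string(WTERMSIG(status));
    return "stopped or continued (not terminated)";
}

// Hypothetical supervisor: runs `child_main` in a forked child, waits for
// it, and reports how it ended.
template <typename F>
std::string run_and_report(F child_main) {
    pid_t pid = fork();
    if (pid < 0) return "fork failed";
    if (pid == 0) _exit(child_main());  // child: leave without running atexit handlers
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return "waitpid failed";
    return describe_wait_status(status);
}
```

Because the parent sees the status even when the child is killed by a signal it cannot catch, this covers cases that no in-process handler can log.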

You can register a function to handle unexpected exceptions:

set_unexpected()

If the exception is not dealt with there, the application calls terminate(). (Note that set_unexpected() was deprecated in C++11 and removed in C++17.)

You can register a function to log things on termination:

set_terminate()
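A minimal sketch of such a terminate handler, assuming C++11 or later, is shown here. The function names and the log wording are illustrative only, and stderr stands in for whatever logging tool the application uses.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>

// Describes the exception currently being handled, if there is one.
std::string describe_current_exception() {
    std::exception_ptr ep = std::current_exception();
    if (!ep) return "no active exception";
    try {
        std::rethrow_exception(ep);
    } catch (const std::exception& e) {
        return std::string("uncaught std::exception: ") + e.what();
    } catch (...) {
        return "uncaught exception of unknown type";
    }
}

// Terminate handler: log the reason, flush, then abort. A terminate handler
// must not return.
[[noreturn]] void log_and_abort() {
    std::fprintf(stderr, "terminate called: %s\n",
                 describe_current_exception().c_str());
    std::fflush(stderr);
    std::abort();
}

// Registers the handler; call this early in main.
void install_terminate_logger() { std::set_terminate(log_and_abort); }
```

With this installed, an exception that escapes every catch block (in any thread) leaves one log line naming it before the process aborts.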

You can add your own atexit() logging function. Use a flag so that it only logs when the exit is abnormal: set the flag just before leaving main, and have the function log only if the flag is not set.
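The flag idea could be sketched like this. All names here (g_clean_shutdown, report_exit, install_exit_logger, mark_clean_shutdown) are made up for the example, and the message is a placeholder.

```cpp
#include <cstdio>
#include <cstdlib>

namespace {
// Becomes true just before main() returns normally.
bool g_clean_shutdown = false;
}

// True while the process has not announced a normal shutdown.
bool exit_is_unexpected() { return !g_clean_shutdown; }

// Runs at exit(); logs only if nobody marked the shutdown as clean.
void report_exit() {
    if (exit_is_unexpected())
        std::fprintf(stderr, "process is leaving through an unexpected exit()\n");
}

// Registers the logger; atexit() returns 0 on success.
bool install_exit_logger() { return std::atexit(report_exit) == 0; }

// Call as the last statement before `return` in main().
void mark_clean_shutdown() { g_clean_shutdown = true; }
```

Keep in mind that atexit() functions run on exit() and on return from main, but not on abort(), _exit() or death by signal, so this catches only stray exit() calls.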

Signal handlers can be tricky (especially if you want them to be portable). You are limited in what you can do safely inside one, so I usually limit myself to setting a global flag that normal code then handles (of course, if you are terminating, that is very limiting).
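The "only set a global flag" pattern might look like the following sketch. The names g_stop_requested, request_stop and install_stop_handler are assumptions for illustration.

```cpp
#include <signal.h>

// The only state the handler touches: a flag of a type that is safe to
// write from a signal handler.
volatile sig_atomic_t g_stop_requested = 0;

// Handler: record which signal arrived and return immediately.
extern "C" void request_stop(int sig) { g_stop_requested = sig; }

// Installs the flag-setting handler for one signal.
bool install_stop_handler(int sig) {
    struct sigaction sa = {};
    sa.sa_handler = request_stop;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, nullptr) == 0;
}
```

Normal code then polls the flag, for example `while (g_stop_requested == 0) { /* one unit of work */ }`, and does the actual logging and cleanup once the loop ends, outside the handler.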

Here's what I use in my programs; it works for me. Whenever my program crashes, it prints a stack trace of the crash site to stdout (which is presumably redirected to a file or similar where you can read it later).

Note that you may need to pass -rdynamic as a flag in CXXFLAGS and/or LFLAGS in your Makefile to make sure the stack trace contains human-readable function names.

#include <stdio.h>
#include <stdlib.h>   // free(), abort()
#include <signal.h>
#include <execinfo.h>

void PrintStackTrace()
{
   void *array[256];
   int size = backtrace(array, 256);   // number of frames actually captured
   char ** strings = backtrace_symbols(array, size);
   if (strings)
   {
      printf("--Stack trace follows (%i frames):\n", size);
      for (int i = 0; i < size; i++) printf("  %s\n", strings[i]);
      printf("--End Stack trace\n");
      free(strings);
   }
   else printf("PrintStackTrace:  Error, could not generate stack trace!\n");
}

static void CrashSignalHandler(int sig)
{
   // Uninstall this handler, to avoid the possibility of an infinite regress
   signal(SIGSEGV, SIG_DFL);
   signal(SIGBUS,  SIG_DFL);
   signal(SIGILL,  SIG_DFL);
   signal(SIGABRT, SIG_DFL);
   signal(SIGFPE,  SIG_DFL);

   printf("CrashSignalHandler called with signal %i... I'm going to print a stack trace, then kill the process.\n", sig);
   PrintStackTrace();
   printf("Crashed process aborting now.... bye!\n");
   fflush(stdout);
   abort();
}

int main(int argc, char ** argv)
{
   signal(SIGSEGV, CrashSignalHandler);
   signal(SIGBUS,  CrashSignalHandler);
   signal(SIGILL,  CrashSignalHandler);
   signal(SIGABRT, CrashSignalHandler);
   signal(SIGFPE,  CrashSignalHandler);

   // [...remainder of your program goes here...]
}
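One possible variation on the stack-trace printer above, offered as a sketch rather than as the answerer's method: glibc also provides backtrace_symbols_fd(), which writes the symbol lines directly to a file descriptor without calling malloc(), and that can matter inside a signal handler after heap corruption. The function name dump_stack_to_fd is hypothetical.

```cpp
#include <execinfo.h>
#include <unistd.h>

// Writes the current call stack to `fd` and returns the number of frames.
// Unlike backtrace_symbols(), backtrace_symbols_fd() does not allocate the
// result with malloc(); it writes one line per frame straight to `fd`.
int dump_stack_to_fd(int fd) {
    void* frames[64];
    int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, fd);
    return count;
}
```

In a crash handler this could be called as dump_stack_to_fd(STDERR_FILENO). As with the original, -rdynamic may be needed to get readable function names.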

A piece of code is worth many words:

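#include <iostream>
#include <cstdlib>    // exit()
#include <signal.h>

// Handler shared by SIGINT and SIGTERM: report the signal, then exit.
void sigint_handler(int s) {
    std::cout<<"signal caught: "<<s<<std::endl;
    ::exit(-1);
}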

void setup_signal() {
    struct sigaction sigIntHandler;
    sigIntHandler.sa_handler = sigint_handler;
    sigemptyset(&sigIntHandler.sa_mask);
    sigIntHandler.sa_flags = 0;
    sigaction(SIGINT, &sigIntHandler, NULL);
    sigaction(SIGTERM, &sigIntHandler, NULL);
}

int main() {
    setup_signal();
    /* do stuff */
    return 0;
}

Of course, this only takes care of the SIGINT/SIGTERM signals. You'll also have to extend this code with all the atexit(), set_terminate, super try/catch etc. you can find. And in case you run into segfaults/bus errors/whatever... well, you're doomed :)

Check out this question: How to find the reason for a dead process without log file on unix?

There you will see that using bash to get the exit code of a process is much easier than using signal handlers or any kind of exit callbacks.
