Why doesn't SIGSEGV crash the process?

I'm trying to integrate Breakpad to get crash reports and stack traces for our cross-platform Qt application. I think I have implemented all the necessary code, but I can't get the application to crash reliably on Windows.

I use the MinGW gcc compiler and Qt.

I created a button in the UI and connected it to a slot that is meant to crash:

void crash() {
    printf(NULL);
    int* x = 0;
    *x = 1;
    int a = 1/0;
}
/* .... */
connect(ui->btnCrash, SIGNAL(clicked()),this,SLOT(crash()));

When I click the button, nothing really happens. However, when running in debug mode, the debugger (gdb) detects a SIGSEGV on the first function call and then abandons the rest of the method. I notice the same behavior when deliberately doing illegal things in other places in the code. This leads to unexpected/undefined behavior.

This behavior differs from Linux, where calling crash() properly crashes the process and creates a dump.

So what's the difference? How can I get the same behavior across platforms?

Your code has undefined behaviour in

*x = 1;

because you shall not dereference a null pointer. Actually, I am not so certain about dividing by zero, but once you are off the rails all bets are off anyhow.

If you want to raise a SIGSEGV, then do that, but don't use undefined behaviour that may cause your code to do anything. You should not expect your code to produce any particular output, but rather fix it ;).

Here is the source for a minimal console program that attempts to dereference a null pointer:

main.c

#include <stdio.h>

int shoot_my_foot() {
    int* x = 0;
    return *x;
}

int main()
{
    int i = shoot_my_foot();
    printf("%d\n",i);
    return 0;
}

I'll compile and run it on Linux (Ubuntu 18.04):

$ gcc -Wall -Wextra -o prog main.c
$ ./prog
Segmentation fault (core dumped)

What was the system return code?

$ echo $?
139

When a program is killed by a fatal signal on Linux, the shell reports an exit status of 128 + the signal number. So that was 128 + 11, i.e. 128 + SIGSEGV.

That is what happens, on Linux, when a program tries to dereference a null pointer. This is what Linux did to the misbehaving program: it killed it and returned us 128 + SIGSEGV. It is not what the program did: it does not handle any signals.

Now I'll hop into a Windows 10 VM and compile and run the same program with the Microsoft C compiler:

>cl /Feprog /W4 main.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.11.25547 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main.c
Microsoft (R) Incremental Linker Version 14.11.25547.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:prog.exe
main.obj

>prog

>

Nothing. So the program crashed, and:

>echo %errorlevel%
-1073741819

The system return code was -1073741819, which is the signed 32-bit integer value of 0xc0000005, the famous Windows error code that means Access Violation.

Still in Windows, I'll now compile and run the program with GCC:

>gcc --version
gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 7.2.0

>gcc -Wall -Wextra -o prog.exe main.c

>prog

>echo %errorlevel%
-1073741819

As before, the program crashed, system code 0xc0000005.

One more time from the top:

>gcc -Wall -Wextra -o prog.exe main.c

>prog

>echo %errorlevel%
-1073741819

No change.

That is what happens, on Windows, when a program tries to dereference a null pointer. That is what Windows does to the misbehaving program: it kills it and returns us 0xc0000005.

There is nothing about the misbehaving C program we can thank for the fact that Windows does the same thing with it whether we compile it with MinGW-W64 gcc or MS cl. And there is nothing about it we can blame for the fact that Windows does not do the same thing with it as Linux.

Indeed, there is nothing about it we can even thank for the fact that the same thing happened to the misbehaving program, compiled with GCC, both times when we just ran it. Because the C (or C++) Standard does not promise that dereferencing a null pointer will cause SIGSEGV to be raised (or that division by 0 will cause SIGFPE, and so on). It just promises that this operation results in undefined behaviour, including possibly causing SIGSEGV when the program is run under gdb, on Tuesdays, and otherwise not.

As a matter of fact, the program does cause a SIGSEGV in all three of our compilation scenarios, as we can observe by giving the program a handler for that signal:

main_1.c

#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <assert.h>

static void handler(int sig)
{
    assert(sig == SIGSEGV);
    fputs("Caught SIGSEGV\n", stderr);
    exit(128 + SIGSEGV);
}

int shoot_my_foot(void) {
    int* x = 0;
    return *x;
}

int main(void)
{
    int i;
    signal(SIGSEGV, handler);
    i = shoot_my_foot();
    printf("%d\n",i);
    return 0;
}

On Linux:

$ gcc -Wall -Wextra -o prog main_1.c
$ ./prog
Caught SIGSEGV
$ echo $?
139

On Windows, with MinGW-W64 gcc:

>gcc -Wall -Wextra -o prog.exe main_1.c

>prog
Caught SIGSEGV

>echo %errorlevel%
139

On Windows, with MS cl:

>cl /Feprog /W4 main_1.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.11.25547 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main_1.c
Microsoft (R) Incremental Linker Version 14.11.25547.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:prog.exe
main_1.obj

>prog
Caught SIGSEGV

>echo %errorlevel%
139

That consistent behaviour is different from what we'd observe with the original program under gdb:

>gcc -Wall -Wextra -g -o prog.exe main.c

>gdb -ex run prog.exe
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-w64-mingw32".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from prog.exe...done.
Starting program: C:\develop\so\scrap\prog.exe
[New Thread 6084.0x1e98]
[New Thread 6084.0x27b8]

Thread 1 received signal SIGSEGV, Segmentation fault.
0x0000000000401584 in shoot_my_foot () at main.c:5
5               return *x;
(gdb)

The reason is that gdb by default intercepts all fatal signals raised in the program it is debugging, and its response to a SIGSEGV is to output the like of:

Thread 1 received signal SIGSEGV, Segmentation fault.
0x0000000000401584 in shoot_my_foot () at main.c:5
5               return *x;

and drop to the gdb prompt, unlike the behaviour of the SIGSEGV handler we installed in main_1.c.

So there you have an answer to the question:

How can I have the same behavior across platforms ?

that, in practice, is as good as it gets:

You can handle signals in your program, and confine your signal handlers to code whose behaviour is the same across platforms, within your preferred meaning of the same.

And this answer is only as good as it gets, in practice, because in principle, per the language Standard, you cannot depend upon an operation that causes undefined behaviour to raise any specific signal, or have any specific or even consistent outcome. If it is in fact your objective to implement consistent cross-platform handling of fatal signals, then the appropriate function call to provoke signal sig for your testing purposes is provided by the standard header <signal.h> (in C++, <csignal>):

int raise( int sig )

Sends signal sig to the program.
