
Why doesn't SIGSEGV crash the process?

I'm trying to implement Breakpad to get crash reports and stack traces for our cross-platform Qt application. I think I implemented all the necessary code, but I can't get the application to crash reliably on Windows.

I use the MinGW gcc compiler and Qt.

I created a button in the UI.

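void crash() {
    printf(NULL);     // deliberately invalid: null format string
    int* x = 0;
    *x = 1;           // deliberately invalid: null pointer dereference
    int a = 1/0;      // deliberately invalid: integer division by zero
}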
/* .... */
connect(ui->btnCrash, SIGNAL(clicked()),this,SLOT(crash()));

When I click the button, nothing visible happens. However, when running in debug mode, the debugger (gdb) detects a SIGSEGV on the first function call and then abandons the rest of the method. I notice the same behavior when deliberately doing illegal things elsewhere in the code. This leads to unexpected/undefined behavior.

This behavior is different from Linux, where calling crash() properly crashes the process and a dump is created.

So what's the difference? How can I have the same behavior across platforms?

Your code has undefined behaviour in

*x = 1;

because you must not dereference a null pointer. Integer division by zero is undefined behaviour as well, but once you are off the rails, all bets are off anyhow.

If you want to raise a SIGSEGV, then do that explicitly, but don't use undefined behaviour that may cause your code to do anything. You should not expect any particular outcome from such code; rather, fix it ;).

Here is the source of a minimal console program that attempts to dereference a null pointer:

main.c

#include <stdio.h>

int shoot_my_foot() {
    int* x = 0;
    return *x;
}

int main()
{
    int i = shoot_my_foot();
    printf("%d\n",i);
    return 0;
}

I'll compile and run it on Linux (Ubuntu 18.04):

$ gcc -Wall -Wextra -o prog main.c
$ ./prog
Segmentation fault (core dumped)

What was the system return code?

$ echo $?
139

When a program is killed by a fatal signal, the shell reports its exit status as 128 + the signal number. So that was 128 + 11, i.e. 128 + SIGSEGV.

That is what happens on Linux when a program tries to dereference a null pointer. This is what Linux did to the misbehaving program: it killed it, and we got back 128 + SIGSEGV. It is not something the program did: it does not handle any signals.

Now I'll hop into a Windows 10 VM and compile and run the same program with the Microsoft C compiler:

>cl /Feprog /W4 main.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.11.25547 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main.c
Microsoft (R) Incremental Linker Version 14.11.25547.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:prog.exe
main.obj

>prog

>

Nothing. So the program crashed, and:

>echo %errorlevel%
-1073741819

The system return code was -1073741819, which is the signed 32-bit interpretation of 0xc0000005, the famous Windows error code that means Access Violation.

Still in Windows, I'll now compile and run the program with GCC:

>gcc --version
gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 7.2.0

>gcc -Wall -Wextra -o prog.exe main.c

>prog

>echo %errorlevel%
-1073741819

As before, the program crashed with system code 0xc0000005.

One more time from the top:

>gcc -Wall -Wextra -o prog.exe main.c

>prog

>echo %errorlevel%
-1073741819

No change.

That is what happens on Windows when a program tries to dereference a null pointer. That is what Windows does to the misbehaving program: it kills it and returns us 0xc0000005.

There is nothing about the misbehaving C program we can thank for the fact that Windows does the same thing with it whether we compile it with MinGW-W64 gcc or MS cl. And there is nothing about it we can blame for the fact that Windows does not do the same thing with it as Linux.

Indeed, there is nothing about it we can even thank for the fact that the same thing happened to the misbehaving program, compiled with GCC, both times we ran it. That is because the C (or C++) Standard does not promise that dereferencing a null pointer will raise SIGSEGV (or that division by 0 will raise SIGFPE, and so on). It just promises that this operation results in undefined behaviour, which could include raising SIGSEGV when the program is run under gdb, on Tuesdays, and not otherwise.

As a matter of fact, the program does cause a SIGSEGV in all three of our compilation scenarios, as we can observe by giving the program a handler for that signal:

main_1.c

#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <assert.h>

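static void handler(int sig)
{
    assert(sig == SIGSEGV);
    /* Demo only: for a genuine fault, ISO C does not promise that fputs is
       safe to call here. _Exit, unlike exit, is explicitly permitted. */
    fputs("Caught SIGSEGV\n", stderr);
    _Exit(128 + SIGSEGV);
}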

int shoot_my_foot(void) {
    int* x = 0;
    return *x;
}

int main(void)
{
    int i;
    signal(SIGSEGV, handler);
    i = shoot_my_foot();
    printf("%d\n",i);
    return 0;
}

On Linux:

$ gcc -Wall -Wextra -o prog main_1.c
$ ./prog
Caught SIGSEGV
$ echo $?
139

On Windows, with MinGW-W64 gcc:

>gcc -Wall -Wextra -o prog.exe main_1.c

>prog
Caught SIGSEGV

>echo %errorlevel%
139

On Windows, with MS cl:

>cl /Feprog /W4 main_1.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.11.25547 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main_1.c
Microsoft (R) Incremental Linker Version 14.11.25547.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:prog.exe
main_1.obj

>prog
Caught SIGSEGV

>echo %errorlevel%
139

That consistent behaviour is different from what we'd observe with the original program under gdb:

>gcc -Wall -Wextra -g -o prog.exe main.c

>gdb -ex run prog.exe
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-w64-mingw32".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from prog.exe...done.
Starting program: C:\develop\so\scrap\prog.exe
[New Thread 6084.0x1e98]
[New Thread 6084.0x27b8]

Thread 1 received signal SIGSEGV, Segmentation fault.
0x0000000000401584 in shoot_my_foot () at main.c:5
5               return *x;
(gdb)

The reason is that gdb, as the debugger, is notified of the signal before the program itself is, and its default response to SIGSEGV is to output the like of:

Thread 1 received signal SIGSEGV, Segmentation fault.
0x0000000000401584 in shoot_my_foot () at main.c:5
5               return *x;

and drop to the gdb prompt, unlike the behaviour of the SIGSEGV handler we installed in main_1.c.

So there you have an answer to the question:

How can I have the same behavior across platforms?

that in practice is as good as it gets:

You can handle signals in your program, and confine your signal handlers to code whose behaviour is the same across platforms, within your preferred meaning of "the same".

And this answer is only as good as it gets in practice because, in principle, per the language Standard, you cannot depend upon an operation that causes undefined behaviour to raise any specific signal, or to have any specific or even consistent outcome. If your objective is in fact to implement consistent cross-platform handling of fatal signals, then the appropriate function call to provoke signal sig for your testing purposes is provided by the standard header <signal.h> (in C++, <csignal>):

int raise(int sig);

Sends signal sig to the program.
