
What is the simplest standard-conforming way to produce a segfault in C?

I think the question says it all. An example covering most standards from C89 to C11 would be helpful. I thought of this one, but I guess it is just undefined behaviour:

#include <stdio.h>

int main( int argc, char* argv[] )
{
  const char *s = NULL;
  printf( "%c\n", s[0] );
  return 0;
}

EDIT:

As some votes requested clarification: I wanted a program with a common programming error (the simplest I could think of was a segfault) that is guaranteed (by the standard) to abort. This is a bit different from the minimal segfault question, which doesn't care about this guarantee.

raise() can be used to raise a segfault:

raise(SIGSEGV);

A segmentation fault is an implementation defined behavior. The standard does not define how the implementation should deal with undefined behavior, and in fact the implementation could optimize out undefined behavior and still be compliant. To be clear, implementation defined behavior is behavior which is not specified by the standard but which the implementation should document. Undefined behavior arises from code that is non-portable or erroneous; its behavior is unpredictable and therefore cannot be relied on.

If we look at the C99 draft standard §3.4.3 undefined behavior, which comes under the Terms, definitions and symbols section, paragraph 1 says (emphasis mine going forward):

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

and paragraph 2 says:

NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

If, on the other hand, you simply want a method defined in the standard that will cause a segmentation fault on most Unix-like systems, then raise(SIGSEGV) should accomplish that goal. Although, strictly speaking, SIGSEGV is defined as follows:

SIGSEGV an invalid access to storage

and §7.14 Signal handling <signal.h> says:

An implementation need not generate any of these signals, except as a result of explicit calls to the raise function. Additional signals and pointers to undeclarable functions, with macro definitions beginning, respectively, with the letters SIG and an uppercase letter or with SIG_ and an uppercase letter, may also be specified by the implementation. The complete set of signals, their semantics, and their default handling is implementation-defined; all signal numbers shall be positive.

The standard only mentions undefined behavior. It knows nothing about memory segmentation. Also note that the code that produces the error is not standard-conformant. Your code cannot invoke undefined behavior and be standard conformant at the same time.

Nonetheless, the shortest way to produce a segmentation fault on architectures that do generate such faults would be:

int main()
{
    *(int*)0 = 0;
}

Why is this sure to produce a segfault on such systems? Because there, access to memory address 0 is always trapped by the system; it can never be a valid access (at least not by userspace code).

Note of course that not all architectures work the same way. On some of them, the above might not crash at all, but rather produce other kinds of errors. Or the statement could even be perfectly fine, with memory location 0 being accessible. Which is one of the reasons why the standard doesn't actually define what happens.

A correct program doesn't produce a segfault. And you cannot describe deterministic behaviour of an incorrect program.

A "segmentation fault" is a thing that an x86 CPU does. You get it by attempting to reference memory in an incorrect way. It can also refer to a situation where memory access causes a page fault (i.e. trying to access memory that's not loaded into the page tables) and the OS decides that you had no right to request that memory. To trigger those conditions, you need to program directly for your OS and your hardware. It is nothing that is specified by the C language.

If we assume we are not raising a signal by calling raise, a segmentation fault is likely to come from undefined behavior. Undefined behavior is undefined, and a compiler is free to refuse to translate, so no answer relying on undefined behavior is guaranteed to fail on all implementations. Moreover, a program which invokes undefined behavior is an erroneous program.

But this one is the shortest I can get to segfault on my system:

main(){main();}

(I compile with gcc and -std=c89 -O0.)

And by the way, does this program really invoke undefined behavior?

 main;

That's it.

Really.

Essentially, what this does is define main as a variable. To the linker, variables and functions are both just symbols (addresses in memory), so nothing distinguishes them at link time, and this code builds without an error.

However, the problem rests in how the system runs executables. In a nutshell, a hosted C implementation builds an environment-preparing entry point into every executable, which basically boils down to "call main".

In this particular case, however, main is a variable, so it is placed in a non-executable section of memory called .bss, intended for variables (as opposed to .text for the code). Trying to execute code in .bss violates its specific segmentation, so the system throws a segmentation fault.

To illustrate, here's (part of) an objdump of the resulting file:

# (unimportant)

Disassembly of section .text:

0000000000001020 <_start>:
    1020:   f3 0f 1e fa             endbr64 
    1024:   31 ed                   xor    %ebp,%ebp
    1026:   49 89 d1                mov    %rdx,%r9
    1029:   5e                      pop    %rsi
    102a:   48 89 e2                mov    %rsp,%rdx
    102d:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
    1031:   50                      push   %rax
    1032:   54                      push   %rsp
    1033:   4c 8d 05 56 01 00 00    lea    0x156(%rip),%r8        # 1190 <__libc_csu_fini>
    103a:   48 8d 0d df 00 00 00    lea    0xdf(%rip),%rcx        # 1120 <__libc_csu_init>

    # This is where the program should call main
    1041:   48 8d 3d e4 2f 00 00    lea    0x2fe4(%rip),%rdi      # 402c <main> 
    1048:   ff 15 92 2f 00 00       callq  *0x2f92(%rip)          # 3fe0 <__libc_start_main@GLIBC_2.2.5>
    104e:   f4                      hlt    
    104f:   90                      nop

# (nice things we still don't care about)

Disassembly of section .data:

0000000000004018 <__data_start>:
    ...

0000000000004020 <__dso_handle>:
    4020:   20 40 00                and    %al,0x0(%rax)
    4023:   00 00                   add    %al,(%rax)
    4025:   00 00                   add    %al,(%rax)
    ...

Disassembly of section .bss:

0000000000004028 <__bss_start>:
    4028:   00 00                   add    %al,(%rax)
    ...

# main is in .bss (variables) instead of .text (code)

000000000000402c <main>:
    402c:   00 00                   add    %al,(%rax)
    ...

# aaand that's it! 

PS: This won't work if you compile to a flat executable. Instead, you will cause undefined behaviour.

On some platforms, a standard-conforming C program can fail with a segmentation fault if it requests too many resources from the system. For instance, allocating a large object with malloc can appear to succeed, but later, when the object is accessed, it will crash.

Note that such a program is not strictly conforming; programs which meet that definition have to stay within each of the minimum implementation limits.

A standard-conforming C program cannot produce a segmentation fault otherwise, because the only other ways are via undefined behavior.

The SIGSEGV signal can be raised explicitly with raise(SIGSEGV), but what happens when it is delivered is implementation-defined, so that does not count here.

(In this answer, "standard-conforming" means: "Uses only the features described in some version of the ISO C standard, avoiding unspecified, implementation-defined or undefined behavior, but not necessarily confined to the minimum implementation limits.")

The simplest form, going by minimum character count, is:

++*(int*)0;

Most of the answers to this question are talking around the key point, which is: The C standard does not include the concept of a segmentation fault. (It does include the signal number SIGSEGV, but it does not define any circumstance where that signal is delivered, other than raise(SIGSEGV), which as discussed in other answers doesn't count.)

Therefore, there is no "strictly conforming" program (i.e. a program that uses only constructs whose behavior is fully defined by the C standard, alone) that is guaranteed to cause a segmentation fault.

Segmentation faults are defined by a different standard, POSIX. This program is guaranteed to provoke either a segmentation fault, or the functionally equivalent "bus error" (SIGBUS), on any system that is fully conforming with POSIX.1-2008 including the Memory Protection and Advanced Realtime options, provided that the calls to sysconf, posix_memalign and mprotect succeed. My reading of C99 is that this program has implementation-defined (not undefined!) behavior considering only that standard, and therefore it is conforming but not strictly conforming.

#define _XOPEN_SOURCE 700
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void)
{
    size_t pagesize = sysconf(_SC_PAGESIZE);
    if (pagesize == (size_t)-1) {
        fprintf(stderr, "sysconf: %s\n", strerror(errno));
        return 1;
    }
    void *page;
    int err = posix_memalign(&page, pagesize, pagesize);
    if (err || !page) {
        fprintf(stderr, "posix_memalign: %s\n", strerror(err));
        return 1;
    }
    if (mprotect(page, pagesize, PROT_NONE)) {
        fprintf(stderr, "mprotect: %s\n", strerror(errno));
        return 1;
    }
    *(long *)page = 0xDEADBEEF;
    return 0;
}

It's hard to define a method to make a program segfault on undefined platforms. A segmentation fault is a loose term that is not defined for all platforms (e.g. simple small computers).

Considering only the operating systems that support processes, processes can receive notification that a segmentation fault occurred.

Further, limiting operating systems to 'unix like' OSes, a reliable method for a process to receive a SIGSEGV signal is kill(getpid(),SIGSEGV)

As is the case in most cross platform problems, each platform may (and usually does) have a different definition of seg-faulting.

But to be practical, current Mac, Linux and Windows OSes will segfault on

*(int*)0 = 0;

Further, it's not bad behaviour to cause a segfault. Some implementations of assert() cause a SIGSEGV signal which might produce a core file. Very useful when you need to autopsy.

What's worse than causing a segfault is hiding it:

try
{
     anyfunc();
}
catch (...) 
{
     printf("?\n");
}

which hides the origin of an error and all you've got to go on is:

?


Here's another way I haven't seen mentioned here:

int main() {
    void (*f)(void);
    f();
}

In this case f is an uninitialized function pointer, which typically causes a segmentation fault when you try to call it (calling it is undefined behavior, so nothing is guaranteed).
