How to fix the segmentation fault?
(Edit: I have just fixed the getpid cache problem and rerun gdb and valgrind.)
(Edit: I just increased the size of the child's stack from 200 bytes to 2000 bytes.)
I wrote the following program to learn how to use clone with CLONE_VM | CLONE_VFORK | CLONE_PARENT on a Linux x86-64 machine:
// test.c
#define _GNU_SOURCE
#include <stdio.h>
#include <assert.h>
#include <syscall.h> // For syscall to call getpid
#include <signal.h> // For SIGCHLD
#include <sys/types.h>// For getppid
#include <unistd.h> // For getppid and sleep
#include <sched.h> // For clone
#include <stdlib.h> // For calloc and free
#define STACK_SIZE 2000
void Puts(const char *str)
{
    assert(fputs(str, stderr) != EOF);
}
void Sleep(unsigned int sec)
{
    do {
        sec = sleep(sec);
    } while(sec > 0);
}
int child(void *useless)
{
    Puts("The new process is created.\n");
    assert(fprintf(stderr, "pid = %d, ppid = %d\n", (pid_t) syscall(SYS_getpid), getppid()) > 0);
    Puts("sleep for 120 secs\n");
    Sleep(120);
    return 0;
}
int main(int argc, char* argv[])
{
    Puts("Allocate stack for new process\n");
    void *stack = calloc(STACK_SIZE, sizeof(char));
    void *stack_top = (void*) ((char*) stack + STACK_SIZE - 1);
    assert(fprintf(stderr, "stack = %p, stack top = %p\n", stack, stack_top) > 0);
    Puts("clone\n");
    int ret = clone(child, stack_top, CLONE_VM | CLONE_VFORK | CLONE_PARENT | SIGCHLD, NULL);
    Puts("clone returns\n");
    Puts("Free the stack\n");
    free(stack);
    if (ret == -1)
        perror("clone(child, stack, CLONE_VM | CLONE_VFORK, NULL)");
    else {
        ret = 0;
        Puts("Child dies...\n");
    }
    return ret;
}
I compiled the program using clang-7 test.c and ran it with ./a.out in bash. It returned instantly with the following output:
Allocate stack for new process
stack = 0x492260, stack top = 0x492a2f
clone
The new process is created.
Segmentation fault
It exited with status 139, meaning that the signal SIGSEGV was sent to my process.
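The status 139 follows the bash convention of reporting 128 + N when a command is killed by signal N, and SIGSEGV is signal 11 on Linux x86-64. A minimal sketch of that decoding (the helper names signal_from_shell_status and describe_shell_status are hypothetical, not part of the program above):

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Shells such as bash report "128 + N" when a command is killed by signal N.
 * Returns N, or 0 if the status does not encode a signal. */
int signal_from_shell_status(int status)
{
    assert(status >= 0 && status <= 255);   /* shell statuses fit in one byte */
    return (status > 128) ? status - 128 : 0;
}

/* Hypothetical helper: print a one-line description of a shell exit status. */
void describe_shell_status(int status)
{
    int sig = signal_from_shell_status(status);
    if (sig != 0)
        printf("status %d: killed by signal %d\n", status, sig);
    else
        printf("status %d: not a signal status\n", status);
}
```

Calling describe_shell_status(139) reports signal 11, which is SIGSEGV on this platform.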
Then I recompiled it with -g and used valgrind --trace-children=yes ./a.out to debug it:
==14494== Memcheck, a memory error detector
==14494== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==14494== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==14494== Command: ./a.out
==14494==
Allocate stack for new process
stack = 0x51f3040, stack top = 0x51f380f
clone
clone returns
Free the stack
Child dies...
==14495== Invalid write of size 4
==14495== at 0x201322: ??? (in /home/nobodyxu/a.out)
==14495== by 0x4F2FCBE: clone (clone.S:95)
==14495== Address 0xffffffffffffffdc is not stack'd, malloc'd or (recently) free'd
==14495==
==14495==
==14495== Process terminating with default action of signal 11 (SIGSEGV)
==14495== Access not within mapped region at address 0xFFFFFFFFFFFFFFDC
==14495== at 0x201322: ??? (in /home/nobodyxu/a.out)
==14495== by 0x4F2FCBE: clone (clone.S:95)
==14495== If you believe this happened as a result of a stack
==14495== overflow in your program's main thread (unlikely but
==14495== possible), you can try to increase the size of the
==14495== main thread stack using the --main-stacksize= flag.
==14495== The main thread stack size used in this run was 8388608.
==14495==
==14495== HEAP SUMMARY:
==14495== in use at exit: 2,000 bytes in 1 blocks
==14495== total heap usage: 1 allocs, 0 frees, 2,000 bytes allocated
==14495==
==14495== LEAK SUMMARY:
==14495== definitely lost: 0 bytes in 0 blocks
==14495== indirectly lost: 0 bytes in 0 blocks
==14495== possibly lost: 0 bytes in 0 blocks
==14495== still reachable: 2,000 bytes in 1 blocks
==14495== suppressed: 0 bytes in 0 blocks
==14495== Rerun with --leak-check=full to see details of leaked memory
==14495==
==14495== For counts of detected and suppressed errors, rerun with: -v
==14495== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==14494==
==14494== HEAP SUMMARY:
==14494== in use at exit: 0 bytes in 0 blocks
==14494== total heap usage: 1 allocs, 1 frees, 2,000 bytes allocated
==14494==
==14494== All heap blocks were freed -- no leaks are possible
==14494==
==14494== For counts of detected and suppressed errors, rerun with: -v
==14494== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Under valgrind it also returned instantly, printing the output above.
I checked the generated assembly at 0x201322 and found that it belongs to int main(int argc, char* argv[]):
20131d: e8 8e 01 00 00 callq 2014b0 <clone@plt>
201322: 89 45 dc mov %eax,-0x24(%rbp)
201325: 48 bf 54 09 20 00 00 movabs $0x200954,%rdi
20132c: 00 00 00
20132f: e8 dc fd ff ff callq 201110 <Puts>
201334: 48 bf ad 08 20 00 00 movabs $0x2008ad,%rdi
20133b: 00 00 00
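As a quick arithmetic check: the instruction at 0x201322 stores to -0x24(%rbp), and the faulting address 0xffffffffffffffdc in the valgrind report is exactly what that operand evaluates to when %rbp is 0. This is consistent with the child running main's code with a zeroed frame pointer, although the output alone does not prove why that happens. A small sketch of the calculation (the helper names effective_address and print_fault_address are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Effective address of a "disp(%rbp)" operand: rbp + disp, wrapping mod 2^64. */
uint64_t effective_address(uint64_t rbp, int64_t disp)
{
    return rbp + (uint64_t)disp;
}

/* Prints the address that -0x24(%rbp) refers to when %rbp is zero. */
void print_fault_address(void)
{
    uint64_t addr = effective_address(0, -0x24);
    assert(addr == UINT64_C(0xffffffffffffffdc));
    printf("%#llx\n", (unsigned long long)addr);
}
```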
I also tried using set follow-fork-mode child in gdb to debug it, but that didn't work.
How to fix the segmentation fault?
The functions printf and fprintf do not seem to be thread-safe without various guard rails. This is detailed in "segfault with clone() and printf".
I found the problem by brute force: I noted where the last print happened, then commented out the lines after it until the error went away.
This segfault might be specific to glibc. I built this code snippet with musl libc, and it works fine. It also doesn't seem to be related to the thread-safety of fprintf, because clone is called with CLONE_VFORK, which suspends the parent process until the child exits.
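The suspension can be observed directly. The following is a minimal sketch, not the asker's program: it drops CLONE_PARENT so the caller can reap the child, uses an mmap'd stack of an assumed 64 KiB, and the names set_flag, child_ran and flag_seen_after_vfork_clone are hypothetical. Because the child shares the address space (CLONE_VM) and the parent does not resume until the child exits (CLONE_VFORK), the parent should see the child's write as soon as clone returns.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEMO_STACK_SIZE (64 * 1024)   /* assumed size, ample for this child */

static volatile int child_ran;        /* shared with the child via CLONE_VM */

static int set_flag(void *unused)
{
    (void)unused;
    child_ran = 1;                    /* same address space as the parent */
    return 0;                         /* becomes the child's exit status */
}

/* Returns the value of child_ran observed right after clone() returns,
 * or -1 on failure. */
int flag_seen_after_vfork_clone(void)
{
    assert(DEMO_STACK_SIZE % 16 == 0);

    char *stack = mmap(NULL, DEMO_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    child_ran = 0;
    /* The stack grows down on x86-64, so pass the end of the mapping. */
    pid_t pid = clone(set_flag, stack + DEMO_STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, NULL);
    int seen = child_ran;             /* read immediately after clone returns */

    pid_t reaped = (pid == -1) ? -1 : waitpid(pid, NULL, 0);
    munmap(stack, DEMO_STACK_SIZE);
    return (pid != -1 && reaped == pid) ? seen : -1;
}
```

If the parent were not suspended, the value read right after clone could still be 0; with CLONE_VFORK it is expected to be 1.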
I used gdb to debug your program. The stack you allocated for the child may already have been freed by the time the fprintf in the child function actually executes. Adding fflush(stdout); after the assert in the child function may solve your problem. The gdb output is as follows:
Continuing.
Allocate stack for new process
stack = 0x602010, stack top = 0x6027df
clone
The new process is created.
sleep for 20 secs
clone returns
Free the stack
*** Error in `test': double free or corruption (out): 0x0000000000602010 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7ffff7a847e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7ffff7a8d37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7ffff7a9153c]
/***/***/tmp/test[0x400969]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ffff7a2d830]
/***/***/tmp/test[0x400729]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:21 12848672 /***/***/tmp/test
00600000-00601000 r--p 00000000 08:21 12848672 /***/***/tmp/test
00601000-00602000 rw-p 00001000 08:21 12848672 /***/***/tmp/test
00602000-00623000 rw-p 00000000 00:00 0 [heap]
7ffff0000000-7ffff0021000 rw-p 00000000 00:00 0
7ffff0021000-7ffff4000000 ---p 00000000 00:00 0
7ffff77f7000-7ffff780d000 r-xp 00000000 08:01 786957 /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff780d000-7ffff7a0c000 ---p 00016000 08:01 786957 /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff7a0c000-7ffff7a0d000 rw-p 00015000 08:01 786957 /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffff7a0d000-7ffff7bcd000 r-xp 00000000 08:01 791529 /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7bcd000-7ffff7dcd000 ---p 001c0000 08:01 791529 /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7dcd000-7ffff7dd1000 r--p 001c0000 08:01 791529 /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7dd1000-7ffff7dd3000 rw-p 001c4000 08:01 791529 /lib/x86_64-linux-gnu/libc-2.23.so
7ffff7dd3000-7ffff7dd7000 rw-p 00000000 00:00 0
7ffff7dd7000-7ffff7dfd000 r-xp 00000000 08:01 791311 /lib/x86_64-linux-gnu/ld-2.23.so
7ffff7fd3000-7ffff7fd6000 rw-p 00000000 00:00 0
7ffff7ff7000-7ffff7ff8000 rw-p 00000000 00:00 0
7ffff7ff8000-7ffff7ffa000 r--p 00000000 00:00 0 [vvar]
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0 [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00025000 08:01 791311 /lib/x86_64-linux-gnu/ld-2.23.so
7ffff7ffd000-7ffff7ffe000 rw-p 00026000 08:01 791311 /lib/x86_64-linux-gnu/ld-2.23.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Program received signal SIGSEGV, Segmentation fault.
__GI_abort () at abort.c:125
125 abort.c: No such file or directory.
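Since the discussion above centers on the child's stack, here is a hedged sketch of one way to rule the stack out as the cause; it is not confirmed by the thread as the fix. It assumes that a 2000-byte stack is too small for glibc's stdio, which can need several kilobytes, and that stack + STACK_SIZE - 1 is a problem because x86-64 expects a 16-byte-aligned stack top. Unlike the original program it omits CLONE_PARENT so that the caller can wait for the child. The names print_and_exit and run_child_on_large_stack and the 1 MiB size are illustrative choices.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (1024 * 1024)  /* assumed: 1 MiB, generous for stdio */

static int print_and_exit(void *unused)
{
    (void)unused;
    fprintf(stderr, "child running on its own stack\n");
    return 7;                           /* arbitrary status for the parent to check */
}

/* Runs print_and_exit() in a CLONE_VM | CLONE_VFORK child and returns its
 * exit status, or -1 on failure. */
int run_child_on_large_stack(void)
{
    assert(CHILD_STACK_SIZE % 16 == 0); /* keeps the stack top 16-byte aligned */

    char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* mmap returns a page-aligned address, so the end is aligned as well. */
    char *stack_top = stack + CHILD_STACK_SIZE;
    pid_t pid = clone(print_and_exit, stack_top,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, NULL);

    int status = 0;
    int result = -1;
    if (pid != -1 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    /* Release the stack only after the child has been reaped. */
    munmap(stack, CHILD_STACK_SIZE);
    return result;
}
```

The stack is unmapped only after waitpid has returned, so the child can no longer be using it at that point.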