Debugging a C program with gdb

Question

I'm trying to test a scheduler that I wrote. I schedule two processes, both of which are infinite loops (just while(1) statements). When I run the program, it sometimes segfaults after about ten seconds (sometimes 5, sometimes 15 or more). Sometimes it doesn't segfault at all and runs as expected. I have a log file showing that both processes are scheduled as expected before the segfault occurs. I'm trying to debug the error using gdb, but it's not being very helpful. Here's what I got with backtrace:

#0  0x00007ffff7ff1000 in ?? ()
#1  0x000000000000002b in ?? ()
#2  0x00007ffff78b984a in new_do_write () from /lib64/libc.so.6
#3  0x000000000061e3d0 in ?? ()
#4  0x0000000000000000 in ?? ()

I don't really understand frame #2.

I think this may be a stack-overflow-related error. However, I only call malloc twice in the whole process: each time I set up one of the two processes, I malloc a pcb block in the pcb table I wrote. Has anyone run into similar issues before? Could this have something to do with how I'm setting/swapping the contexts in the scheduler? Why does it segfault sometimes and sometimes not?

Answer 1

You didn't say how you obtained the stack trace shown in the question.

It is very likely that the stack trace is bogus not because the stack is corrupt, but because you've invoked GDB incorrectly, e.g. specified the wrong executable when attaching to the process or examining the core dump.

One common mistake is to build the executable with -O2 (let's call this executable E1), then rebuild it with -g (let's call this E2), and then try to analyze a core dump from E1, or a live process running E1, while giving GDB E2 as the symbol file.

Don't do that; it doesn't work and isn't expected to work.

Answer 2

Since your stack seems corrupted, you're probably correct that you have a stack buffer overflow somewhere. Without the code, it's a little difficult to tell.

But this has nothing to do with your malloc calls. Overflowing dynamically allocated buffers would corrupt the heap, not the stack.

What you'll probably need to look at is local variables that aren't big enough for the data you're trying to copy into them, like:

char xyzzy[5];
/* Overflows: writes 19 bytes (18 characters plus the NUL) into a 5-byte array. */
strcpy (xyzzy, "this is a bad idea");

Or passing a buffer (again, most likely on the stack) to a system call that writes more data to it than you provide room for.
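Both cases come down to writing more bytes than the destination can hold. As a minimal sketch of the bounded alternative (copy_bounded is a hypothetical helper name, not something from the question's code), the copy can be limited to the size of the local array and truncation reported back to the caller:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the question): copy src into dst without
 * ever writing more than dst_size bytes, terminating NUL included.
 * Returns 1 if the whole string fit, 0 if it had to be truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf writes at most dst_size bytes (including the NUL) and
     * returns the length the complete output would have had. */
    int needed = snprintf(dst, dst_size, "%s", src);

    return needed >= 0 && (size_t)needed < dst_size;
}
```

Called as copy_bounded(xyzzy, sizeof xyzzy, "this is a bad idea"), this leaves the truncated but terminated string "this" in the 5-byte array and returns 0 instead of overwriting the stack. The same idea applies to the system call case: pass the real size of the buffer (for example sizeof buf) as the byte count, never a larger number.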

They're the most likely causes, though theoretically, of course, any undefined behaviour on your part could cause this. If the solution is not evident from this answer, you'll probably need to post the code that caused it. When you do, try to trim it down as much as possible so that it's the shortest complete program that exhibits the bug.

Often you'll find that, by doing that, the problem becomes evident :-)

2 answers

Answer 1
1 2012-11-23 02:03:02

Answer 2
0 2012-11-22 21:34:57
