Valgrind stack trace completely misses a function

Question

I have two C files:

a.c

void main(){
    ...
    getvtable()->function();
}

The vtable points to a function located in b.c:

void function(){
    malloc(42);
}

Now, if I trace the program under valgrind, I get the following:

==29994== 4,155 bytes in 831 blocks are definitely lost in loss record 26 of 28
==29994==    at 0x402CB7A: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==29994==    by 0x40A24D2: (below main) (libc-start.c:226)

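So the call to function is completely omitted from the stack! How is that possible? If I use GDB, the correct stack, including "function", is shown.

Debug symbols are included, Linux, 32-bit.

UPD:

Following the first answer, I get the output below when debugging through valgrind's GDB server. The breakpoint is never hit, whereas it is hit when I debug directly with GDB.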

stasik@gemini:~$ gdb -q
(gdb) set confirm off
(gdb) target remote | vgdb
Remote debugging using | vgdb
relaying data between gdb and process 11665
[Switching to Thread 11665]
0x040011d0 in ?? ()
(gdb) file /home/stasik/leak.so
Reading symbols from /home/stasik/leak.so...done.
(gdb) break function
Breakpoint 1 at 0x110c: file ../../source/leakclass.c, line 32.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>silent
>end
(gdb) continue
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) source thread-frames.py
Stack level 0, frame at 0x42348a0:
 eip = 0x404efcb; saved eip 0x4f2f544c
 called by frame at 0x42348a4
 Arglist at 0x4234898, args:
 Locals at 0x4234898, Previous frame's sp is 0x42348a0
 Saved registers:
  ebp at 0x4234898, eip at 0x423489c
Stack level 1, frame at 0x42348a4:
 eip = 0x4f2f544c; saved eip 0x6e492056
 called by frame at 0x42348a8, caller of frame at 0x42348a0
 Arglist at 0x423489c, args:
 Locals at 0x423489c, Previous frame's sp is 0x42348a4
 Saved registers:
  eip at 0x42348a0
Stack level 2, frame at 0x42348a8:
 eip = 0x6e492056; saved eip 0x205d6f66
 called by frame at 0x42348ac, caller of frame at 0x42348a4
 Arglist at 0x42348a0, args:
 Locals at 0x42348a0, Previous frame's sp is 0x42348a8
 Saved registers:
  eip at 0x42348a4
Stack level 3, frame at 0x42348ac:
 eip = 0x205d6f66; saved eip 0x61746144
---Type <return> to continue, or q <return> to quit---
 called by frame at 0x42348b0, caller of frame at 0x42348a8
 Arglist at 0x42348a4, args:
 Locals at 0x42348a4, Previous frame's sp is 0x42348ac
 Saved registers:
  eip at 0x42348a8
Stack level 4, frame at 0x42348b0:
 eip = 0x61746144; saved eip 0x65736162
 called by frame at 0x42348b4, caller of frame at 0x42348ac
 Arglist at 0x42348a8, args:
 Locals at 0x42348a8, Previous frame's sp is 0x42348b0
 Saved registers:
  eip at 0x42348ac
Stack level 5, frame at 0x42348b4:
 eip = 0x65736162; saved eip 0x70616d20
 called by frame at 0x42348b8, caller of frame at 0x42348b0
 Arglist at 0x42348ac, args:
 Locals at 0x42348ac, Previous frame's sp is 0x42348b4
 Saved registers:
  eip at 0x42348b0
Stack level 6, frame at 0x42348b8:
 eip = 0x70616d20; saved eip 0x2e646570
 called by frame at 0x42348bc, caller of frame at 0x42348b4
 Arglist at 0x42348b0, args:
---Type <return> to continue, or q <return> to quit---
 Locals at 0x42348b0, Previous frame's sp is 0x42348b8
 Saved registers:
  eip at 0x42348b4
Stack level 7, frame at 0x42348bc:
 eip = 0x2e646570; saved eip 0x0
 called by frame at 0x42348c0, caller of frame at 0x42348b8
 Arglist at 0x42348b4, args:
 Locals at 0x42348b4, Previous frame's sp is 0x42348bc
 Saved registers:
  eip at 0x42348b8
Stack level 8, frame at 0x42348c0:
 eip = 0x0; saved eip 0x0
 caller of frame at 0x42348bc
 Arglist at 0x42348b8, args:
 Locals at 0x42348b8, Previous frame's sp is 0x42348c0
 Saved registers:
  eip at 0x42348bc
(gdb) continue
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) continue
Continuing.

Answer 1

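I see two possible reasons:

- Valgrind is using a different stack unwinding method than GDB.
- The address space layout differs when the program runs under the two environments, and you are hitting stack corruption only under Valgrind.

We can gain more insight by using Valgrind's built-in gdbserver.

Save this Python snippet to thread-frames.py: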

import gdb

f = gdb.newest_frame()
while f is not None:
    f.select()
    gdb.execute('info frame')
    f = f.older()

t.gdb

set confirm off
file MY-PROGRAM
break function
commands
silent
end
run
source thread-frames.py
quit

v.gdb

set confirm off
target remote | vgdb
file MY-PROGRAM
break function
commands
silent
end
continue
source thread-frames.py
quit

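(Change MY-PROGRAM and function in the scripts above and in the commands below as required.)

Get detailed information about the stack frames under GDB: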

$ gdb -q -x t.gdb
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbffff2f0:
 eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
 called by frame at 0xbffff310
 source language c.
 Arglist at 0xbffff2e8, args: 
 Locals at 0xbffff2e8, Previous frame's sp is 0xbffff2f0
 Saved registers:
  ebp at 0xbffff2e8, eip at 0xbffff2ec
Stack level 1, frame at 0xbffff310:
 eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0xb7e33963
 caller of frame at 0xbffff2f0
 source language c.
 Arglist at 0xbffff2f8, args: 
 Locals at 0xbffff2f8, Previous frame's sp is 0xbffff310
 Saved registers:
  ebp at 0xbffff2f8, eip at 0xbffff30c

Get the same data under Valgrind:

$ valgrind --vgdb=full --vgdb-error=0 ./MY-PROGRAM

In another shell:

$ gdb -q -x v.gdb
relaying data between gdb and process 574
0x04001020 in ?? ()
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbe88e2c0:
 eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
 called by frame at 0xbe88e2e0
 source language c.
 Arglist at 0xbe88e2b8, args: 
 Locals at 0xbe88e2b8, Previous frame's sp is 0xbe88e2c0
 Saved registers:
  ebp at 0xbe88e2b8, eip at 0xbe88e2bc
Stack level 1, frame at 0xbe88e2e0:
 eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0x4051963
 caller of frame at 0xbe88e2c0
 source language c.
 Arglist at 0xbe88e2c8, args: 
 Locals at 0xbe88e2c8, Previous frame's sp is 0xbe88e2e0
 Saved registers:
  ebp at 0xbe88e2c8, eip at 0xbe88e2dc

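If GDB can successfully unwind the stack when connected to "valgrind --vgdb", then the problem lies in Valgrind's stack unwinding algorithm. You can inspect the "info frame" output carefully for inlined frames, tail-call frames, or anything else that could throw Valgrind off. Otherwise it is probably stack corruption.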
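One quick sanity check on an "info frame" dump, offered here as a sketch rather than something the answer itself proposes: decode the saved eip values as little-endian bytes and see whether they are printable ASCII. The values below are copied from the vgdb session in the question's update; the helper names and the "all bytes printable" heuristic are assumptions made for illustration.

```python
# Saved eip values copied from the "info frame" dump in the question's update.
SAVED_EIPS = [
    0x4f2f544c, 0x6e492056, 0x205d6f66, 0x61746144,
    0x65736162, 0x70616d20, 0x2e646570,
]


def _raw_bytes(words):
    """Concatenate 32-bit words as little-endian bytes (x86 byte order)."""
    return b"".join(w.to_bytes(4, "little") for w in words)


def as_little_endian_text(words):
    """Render the bytes as text; non-printable bytes are shown as '?'."""
    return "".join(chr(b) if 32 <= b < 127 else "?" for b in _raw_bytes(words))


def looks_like_text(words):
    """Heuristic (an assumption, not a GDB feature): every byte is printable ASCII."""
    return all(32 <= b < 127 for b in _raw_bytes(words))


if __name__ == "__main__":
    print(repr(as_little_endian_text(SAVED_EIPS)))
    print(looks_like_text(SAVED_EIPS))
```

For the values above the decoded text is "LT/OV Info] Database mapped.", which suggests that the "frames" printed in that session are string data rather than real return addresses. A genuine code address such as 0x8048384 from the t.gdb run does not pass the check.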

Answer 2

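OK, compiling all the .so parts and the main program with an explicit -O0 seems to solve the problem. It looks like some optimization in the 'core' program that loads the .so was breaking the stack (the .so itself was always compiled unoptimized).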

Answer 3

This is tail-call optimization in action.

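The function function calls malloc as the last thing it does. The compiler sees this and tears down the stack frame for function before it calls malloc. The advantage is that when malloc returns, it returns directly to whichever function called function. That is, it avoids malloc returning to function only to hit yet another return instruction.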
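The effect on a backtrace can be pictured with a toy model. The sketch below is only an illustration in Python, not how Valgrind or the compiler work internally; backtrace_at_malloc is a hypothetical helper that tracks frame names in a list.

```python
def backtrace_at_malloc(tail_call_optimized):
    """Return the frames a stack walker would see once execution is inside malloc."""
    stack = ["main"]              # main is running
    stack.append("function")      # main calls function through the vtable
    if tail_call_optimized:
        # Tail-call (sibling-call) optimization: function's frame is torn
        # down and control jumps to malloc, which later returns to main.
        stack.pop()
    stack.append("malloc")        # malloc is now the innermost frame
    return list(reversed(stack))  # innermost frame first, like a backtrace


if __name__ == "__main__":
    print(backtrace_at_malloc(False))
    print(backtrace_at_malloc(True))
```

With the optimization modelled, function no longer appears between malloc and its caller, which matches the frame missing from the Valgrind report.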

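In this case the optimization has avoided an unnecessary jump and made stack usage slightly more efficient, which is nice. With a recursive tail call, though, this optimization is a huge win, because it turns the recursion into something more like iteration.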
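To make the recursion-to-iteration point concrete, here is a minimal sketch with hypothetical function names. Python itself does not eliminate tail calls, so the loop version is written by hand to show what the compiler effectively produces from the tail-recursive form.

```python
def sum_to_recursive(n, acc=0):
    """Tail-recursive form: the recursive call is the last thing the function does."""
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)


def sum_to_loop(n, acc=0):
    """Equivalent loop: one frame is reused instead of pushing a new one per call."""
    while n != 0:
        # Both right-hand sides use the old value of n, mirroring the
        # argument evaluation of the recursive call above.
        n, acc = n - 1, acc + n
    return acc


if __name__ == "__main__":
    print(sum_to_recursive(10), sum_to_loop(10))
```

The loop version needs constant stack space regardless of n, while the recursive version in Python grows one frame per call and is bounded by the interpreter's recursion limit.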

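As you have already discovered, disabling optimization makes debugging much easier. If you want to debug optimized code (perhaps for performance testing), then, as @Zang MingJie already said, you can disable just this optimization with -fno-optimize-sibling-calls.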


Answer 1
5 2013-05-28 20:26:11

Answer 2
5 Accepted 2013-06-02 10:13:06

Answer 3
2 2013-06-06 14:20:46
