
How can I set a breakpoint in GDB for an open(2) syscall returning -1?

OS: GNU/Linux
Distro: openSUSE 13.1
Arch: x86-64
GDB version: 7.6.50.20130731-cvs
Programming language: mostly C with minor bits of assembly

Imagine that I've got a rather big program that sometimes fails to open a file. Is it possible to set a breakpoint in GDB so that it stops after the open(2) syscall returns -1?

Of course, I can grep through the source code, find all open(2) invocations and narrow down the failing open() call, but maybe there's a better way.

I tried "catch syscall open" followed by "condition N if $rax==-1", but it never got hit.
BTW, is it possible to distinguish in GDB between the call to a syscall (e.g. open(2)) and the return from it?

As a workaround, I currently do the following:

  1. Run the program in question under GDB.
  2. From another terminal, launch this systemtap script:

     stap -g -v -e 'probe process("PATH to the program run under GDB").syscall.return { if( $syscall == 2 && $return <0) raise(%{ SIGSTOP %}) }' 
  3. After open(2) returns -1, GDB reports a SIGSTOP in the session and I can debug the issue.

TIA.

Best regards,
alexz.

UPD: Even though I had tried the approach suggested by nm before without success, I decided to give it another try. After 2 hours it now works as intended, but only with a weird workaround:

  1. I still can't distinguish between the call to and the return from the syscall.
  2. If I use finish in comm, I can't use continue, which is expected according to the GDB docs (any commands after a command that resumes execution are ignored),
    i.e. the following drops to the gdb prompt on each break:

     gdb> comm
     gdb> finish
     gdb> printf "rax is %d\n",$rax
     gdb> cont
     gdb> end
  3. Actually, I can avoid finish and check %rax in commands, but then I have to check for -errno rather than -1: e.g. for "Permission denied" I have to check for -13, and for "No such file or directory" for -2. That's just not right.

  4. So the only way to make it work for me was to define a custom function and use it as follows:

     (gdb) catch syscall open
     Catchpoint 1 (syscall 'open' [2])
     (gdb) define mycheck
     Type commands for definition of "mycheck".
     End with a line saying just "end".
     >finish
     >finish
     >if ($rax != -1)
     >cont
     >end
     >printf "rax is %d\n",$rax
     >end
     (gdb) comm
     Type commands for breakpoint(s) 1, one per line.
     End with a line saying just "end".
     >mycheck
     >end
     (gdb) r
     The program being debugged has been started already.
     Start it from the beginning? (y or n) y
     Starting program: /home/alexz/gdb_syscall_test/main
     .....
     Catchpoint 1 (returned from syscall open), 0x00007ffff7b093f0 in __open_nocancel () from /lib64/libc.so.6
     0x0000000000400756 in main (argc=1, argv=0x7fffffffdb18) at main.c:24
     24        fd = open(filenames[i], O_RDONLY);
     Opening test1
     fd = 3 (0x3)
     Successfully opened test1
     Catchpoint 1 (call to syscall open), 0x00007ffff7b093f0 in __open_nocancel () from /lib64/libc.so.6
     rax is -38
     Catchpoint 1 (returned from syscall open), 0x00007ffff7b093f0 in __open_nocancel () from /lib64/libc.so.6
     0x0000000000400756 in main (argc=1, argv=0x7fffffffdb18) at main.c:24
     ---Type <return> to continue, or q <return> to quit---
     24        fd = open(filenames[i], O_RDONLY);
     rax is -1
     (gdb) bt
     #0  0x0000000000400756 in main (argc=1, argv=0x7fffffffdb18) at main.c:24
     (gdb) step
     26        printf("Opening %s\n", filenames[i]);
     (gdb) info locals
     i = 1
     fd = -1
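Item 3 above comes down to the kernel's return convention: on failure, the raw value left in %rax is the negated errno, not -1, so a condition like $rax==-1 only matches EPERM (errno 1 on Linux). The "rax is -38" at the call stop also fits: -38 is -ENOSYS, which %rax typically holds at syscall entry on x86-64 Linux. A minimal sketch, assuming Linux errno values; describe_raw_return is a hypothetical helper that only interprets a number you might see in $rax at a catchpoint:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: interpret a raw x86-64 Linux syscall return value,
 * e.g. one printed as "rax is -2" at a "returned from syscall" stop.
 * The kernel reports failure as -errno, so a raw -1 would mean -EPERM. */
const char *describe_raw_return(long rax, char *buf, size_t len)
{
    if (rax < 0 && rax >= -4095) {
        /* Failure: the magnitude is the errno value (ENOENT == 2, EACCES == 13). */
        snprintf(buf, len, "error %ld: %s", -rax, strerror((int)-rax));
    } else {
        /* Success: for open(2) this is the new file descriptor. */
        snprintf(buf, len, "fd %ld", rax);
    }
    return buf;
}
```

For example, describe_raw_return(-2, ...) yields "error 2: No such file or directory", which is why the check has to be against -2 (or simply "< 0") rather than -1.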

This gdb script does what's requested:

set $outside = 1
catch syscall open
commands
  silent
  set $outside = ! $outside
  if ( $outside && $rax >= 0)
    continue
  end
  if ( !$outside )
    continue
  end
  echo `open' returned a negative value\n
end

The $outside variable is needed because gdb stops both at syscall entry and at syscall exit. We need to ignore the entry events and check $rax only at exit.
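The toggle is easy to get backwards, so here is a small C model of the script's logic, for illustration only. The struct catch_state type and should_stop function are hypothetical names; like the script, the model assumes that entry and exit stops strictly alternate and that the first stop seen is an entry:

```c
#include <stdbool.h>

/* Mirrors the GDB convenience variable $outside from the script above. */
struct catch_state {
    bool outside;   /* true while the inferior is outside the syscall */
};

/* Called once per catchpoint stop with the current value of $rax.
 * Returns true when GDB should stay stopped, i.e. on a syscall exit
 * whose return value is negative; every other stop means "continue". */
bool should_stop(struct catch_state *s, long rax)
{
    s->outside = !s->outside;      /* set $outside = ! $outside */
    if (!s->outside)               /* this stop is a syscall entry */
        return false;
    return rax < 0;                /* this stop is an exit: stop only on failure */
}
```

Starting from outside = true and feeding it the pairs (-38, 3) and (-38, -2), only the final, failing exit returns true; the -38 seen at entry is ignored even though it is negative.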

> Is it possible to set a breakpoint in GDB so that it stops after the open(2) syscall returns -1?

It's hard to do better than nm's answer for this narrow question, but I would argue that the question is posed incorrectly.

> Of course, I can grep through the source code and find all open(2) invocations

That is part of your confusion: when you call open in a C program, you are not in fact executing the open(2) system call. Rather, you are invoking an open(3) "stub" from your libc, and that stub executes the open(2) system call for you.
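The stub's convention is visible from plain C: both open() and glibc's generic syscall() function return -1 and store the positive error code in errno, so the raw -errno value only exists in %rax inside the stub. A minimal sketch, assuming Linux with glibc; open_both_ways is a hypothetical name, and SYS_openat is used because it exists on every Linux architecture (recent glibc versions also use openat(2) inside open()):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: open a path through the open(3) wrapper and through
 * glibc's syscall() entry point, recording each return value and errno. */
void open_both_ways(const char *path,
                    long *wrapper_ret, int *wrapper_errno,
                    long *syscall_ret, int *syscall_errno)
{
    errno = 0;
    *wrapper_ret = open(path, O_RDONLY);          /* libc stub for open(2) */
    *wrapper_errno = errno;
    if (*wrapper_ret >= 0)
        close((int)*wrapper_ret);

    /* syscall() is itself a libc stub: it also turns -errno into -1 + errno. */
    errno = 0;
    *syscall_ret = syscall(SYS_openat, AT_FDCWD, path, O_RDONLY);
    *syscall_errno = errno;
    if (*syscall_ret >= 0)
        close((int)*syscall_ret);
}
```

For a nonexistent path both calls report -1 with errno == ENOENT; the -2 the kernel actually returned is never visible to the C code.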

And if you want to set a breakpoint for when the stub is about to return -1, that is very easy.

Example:

/* t.c */
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
  int fd = open("/no/such/file", O_RDONLY);
  return fd == -1 ? 0 : 1;
}

$ gcc -g t.c; gdb -q ./a.out
(gdb) start
Temporary breakpoint 1 at 0x4004fc: file t.c, line 6.
Starting program: /tmp/a.out

Temporary breakpoint 1, main () at t.c:6
6     int fd = open("/no/such/file", O_RDONLY);
(gdb) s
open64 () at ../sysdeps/unix/syscall-template.S:82
82  ../sysdeps/unix/syscall-template.S: No such file or directory.

Here we've reached the glibc system call stub. Let's disassemble it:

(gdb) disas
Dump of assembler code for function open64:
=> 0x00007ffff7b01d00 <+0>: cmpl   $0x0,0x2d74ad(%rip)        # 0x7ffff7dd91b4 <__libc_multiple_threads>
   0x00007ffff7b01d07 <+7>: jne    0x7ffff7b01d19 <open64+25>
   0x00007ffff7b01d09 <+0>: mov    $0x2,%eax
   0x00007ffff7b01d0e <+5>: syscall
   0x00007ffff7b01d10 <+7>: cmp    $0xfffffffffffff001,%rax
   0x00007ffff7b01d16 <+13>:    jae    0x7ffff7b01d49 <open64+73>
   0x00007ffff7b01d18 <+15>:    retq
   0x00007ffff7b01d19 <+25>:    sub    $0x8,%rsp
   0x00007ffff7b01d1d <+29>:    callq  0x7ffff7b1d050 <__libc_enable_asynccancel>
   0x00007ffff7b01d22 <+34>:    mov    %rax,(%rsp)
   0x00007ffff7b01d26 <+38>:    mov    $0x2,%eax
   0x00007ffff7b01d2b <+43>:    syscall
   0x00007ffff7b01d2d <+45>:    mov    (%rsp),%rdi
   0x00007ffff7b01d31 <+49>:    mov    %rax,%rdx
   0x00007ffff7b01d34 <+52>:    callq  0x7ffff7b1d0b0 <__libc_disable_asynccancel>
   0x00007ffff7b01d39 <+57>:    mov    %rdx,%rax
   0x00007ffff7b01d3c <+60>:    add    $0x8,%rsp
   0x00007ffff7b01d40 <+64>:    cmp    $0xfffffffffffff001,%rax
   0x00007ffff7b01d46 <+70>:    jae    0x7ffff7b01d49 <open64+73>
   0x00007ffff7b01d48 <+72>:    retq
   0x00007ffff7b01d49 <+73>:    mov    0x2d10d0(%rip),%rcx        # 0x7ffff7dd2e20
   0x00007ffff7b01d50 <+80>:    xor    %edx,%edx
   0x00007ffff7b01d52 <+82>:    sub    %rax,%rdx
   0x00007ffff7b01d55 <+85>:    mov    %edx,%fs:(%rcx)
   0x00007ffff7b01d58 <+88>:    or     $0xffffffffffffffff,%rax
   0x00007ffff7b01d5c <+92>:    jmp    0x7ffff7b01d48 <open64+72>
End of assembler dump.

Here you can see that the stub behaves differently depending on whether the program has multiple threads. This has to do with asynchronous cancellation.

There are two syscall instructions, and in the general case we'd need to set a breakpoint after each one (but see below).

But this example is single-threaded, so I can set a single conditional breakpoint:

(gdb) b *0x00007ffff7b01d10 if $rax < 0
Breakpoint 2 at 0x7ffff7b01d10: file ../sysdeps/unix/syscall-template.S, line 82.
(gdb) c
Continuing.

Breakpoint 2, 0x00007ffff7b01d10 in __open_nocancel () at ../sysdeps/unix/syscall-template.S:82
82  in ../sysdeps/unix/syscall-template.S
(gdb) p $rax
$1 = -2

Voilà, the open(2) system call returned -2, which the stub will translate into setting errno to ENOENT (which is 2 on this system) and returning -1.
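That translation is short enough to restate in C. A sketch for illustration only: stub_return is a hypothetical name, and the threshold mirrors the cmp $0xfffffffffffff001,%rax / jae pairs above, an unsigned comparison that treats raw values from -4095 to -1 as errors and sends them to the shared error path at open64+73:

```c
#include <errno.h>

/* Hypothetical C rendering of the stub's tail: cmp/jae, then open64+73..+92. */
long stub_return(long rax)
{
    /* cmp $0xfffffffffffff001,%rax ; jae  ->  unsigned compare against -4095 */
    if ((unsigned long)rax >= (unsigned long)-4095L) {
        errno = (int)-rax;   /* xor %edx,%edx ; sub %rax,%rdx ; mov %edx,%fs:(%rcx) */
        return -1;           /* or $0xffffffffffffffff,%rax */
    }
    return rax;              /* success: the file descriptor passes through */
}
```

With the value from the session above, stub_return(-2) returns -1 and sets errno to ENOENT, while a successful result such as 3 is returned unchanged.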

If the open(2) had succeeded, the condition $rax < 0 would be false, and GDB would keep going.

That is precisely the behavior one usually wants from GDB when looking for one failing system call among many successful ones.

Update:

As Chris Dodd points out, there are two syscalls, but on error they both branch to the same error-handling code (the code that sets errno). Thus, we can set an unconditional breakpoint on *0x00007ffff7b01d49, and that breakpoint will fire only on failure.

This is much better, because conditional breakpoints slow down execution considerably when the condition is false (GDB has to stop the inferior, evaluate the condition, and resume the inferior if the condition is false).
