
How to debug nondeterministic memory corruption?

I have a nondeterministic memory corruption problem. Because it's not always the same address, and it occurs only rarely, I can't simply set a watchpoint on it with gdb.

The problem is that a value changes between point A and point B in my program. The only thing that is supposed to change it is point C, which does not run in that time (at least not for the specific instance that experiences the unexpected modification).

What I'd like to do is something like mprotect the value at point A, so the machine will trap if it is modified, and unprotect it again around the intentional modification at point C. Of course, mprotect is not meant to be taken literally, as it works on whole pages and I need word granularity.

Simply setting a watchpoint at point A manually with gdb is far too much toil: the problem occurs only about once per thousand runs.

Ideally, I would like a stack trace at the point that modifies it.

Any ideas?

Update: I just found out about rr (http://rr-project.org/), a tool that can allegedly "determinize" non-determinism problems. I'm going to give it a go.

Update2: Well that was a short trip:

[FATAL /build/rr-jR8ti5/rr-4.1.0/src/PerfCounters.cc:167:init_attributes() errno: 0 'Success'] 
 -> Microarchitecture `Intel Merom' currently unsupported.

You are experiencing undefined behavior, and it is being caused somewhere else. Debugging this is really hard.

Since you are apparently on Linux, use valgrind; it will help you a lot. If you are on neither Linux nor OS X (which valgrind also supports), search for equivalent memory error detection software for your system.

I found that it isn't that difficult to script gdb in a scripting language that you already know (in my case, Ruby). This cuts down on the need to learn how to write proper gdb scripts!

The API between the target program and the script is that the target program has an empty function called my_breakpoint that accepts a single machine word as an argument. Calling my_breakpoint(1); my_breakpoint(addr); adds an address to the watch list, while the same sequence with the constant 2 removes an address from the watch list.
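To make the call protocol concrete, here is a minimal sketch that replays a sequence of my_breakpoint arguments and records the gdb commands they would lead to. No gdb is involved; the class name MarkerProtocol and the sample address are made up for illustration, and the numbering assumes gdb gave number 1 to the breakpoint on my_breakpoint itself.

```ruby
# Hypothetical replay of the marker protocol described above; no gdb involved.
class MarkerProtocol
  ADD = 1
  REMOVE = 2

  attr_reader :commands

  def initialize
    @pending = nil    # marker (1 or 2) seen, address not yet received
    @next_number = 2  # assumption: breakpoint 1 is my_breakpoint itself
    @numbers = {}     # watched address => gdb watchpoint number
    @commands = []    # gdb commands that would be sent
  end

  # Feed one my_breakpoint argument, in the order the target passes them.
  def feed(value)
    if @pending.nil?
      unless [ADD, REMOVE].include?(value)
        raise ArgumentError, "expected marker 1 or 2, got #{value}"
      end
      @pending = value
      return
    end

    if @pending == ADD
      raise ArgumentError, "address #{value} is already watched" if @numbers.key?(value)
      @numbers[value] = @next_number
      @next_number += 1
      @commands << "watch *#{value}"
    elsif @numbers.key?(value)
      # Unknown addresses are ignored on removal.
      @commands << "delete #{@numbers.delete(value)}"
    end
    @pending = nil
  end
end

protocol = MarkerProtocol.new
# Target side: my_breakpoint(1); my_breakpoint(addr); ... my_breakpoint(2); my_breakpoint(addr);
[1, 6295616, 2, 6295616].each { |value| protocol.feed(value) }
puts protocol.commands
```

Using a marker call followed by an address call keeps the target-side function trivial: it needs only one integer parameter and an empty body, and all bookkeeping lives in the script.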

To use this, you need to start gdbserver 127.0.0.1:7117 myapp myargs, and then launch the following script. When the script detects a problem, it disconnects cleanly from gdbserver so that you can reconnect another instance of gdb with gdb -ex 'target remote 127.0.0.1:7117' and off you go.

Note that it's extremely slow to use software watchpoints like this; maybe someday something like this can be implemented as a valgrind tool.

#!/usr/bin/env ruby

system("rm -f /tmp/gdb_i /tmp/gdb_o");
system("mkfifo /tmp/gdb_i /tmp/gdb_o");
system("killall -w gdb");
system("gdb -ex 'target remote 127.0.0.1:7117' </tmp/gdb_i >/tmp/gdb_o &");

$fo = File.open("/tmp/gdb_i", "wb");
$fi = File.open("/tmp/gdb_o", "rb");

def gdb_put(l)
  $stderr.puts("gdb_out: #{l}");
  $fo.write((l + "\n"));
  $fo.flush;
end

gdb_put("b my_breakpoint");
gdb_put("set can-use-hw-watchpoints 0");
gdb_put("c");

$state = 0;
$watchpoint_ctr = 1; # start at 1 so the first watchpoint gets number 2; gdb gives number 1 to the my_breakpoint breakpoint.
$watchpoint_nr = {};

def gdb_got_my_breakpoint(x)
  $stderr.puts("my_breakpoint #{x}");

  if ((x == 1) || (x == 2))
    raise if ($state != 0);
    $state = x;
    gdb_put("c");
  else
    if ($state == 1)
      raise "address #{x} is already watched" unless $watchpoint_nr[x].nil?;
      $watchpoint_nr[x] = ($watchpoint_ctr += 1);
      gdb_put("watch *#{x}");
    elsif ($state == 2)
      nr = $watchpoint_nr[x];
      if (nr.nil?)
        $stderr.puts("WARNING: ignoring delete request for watchpoint #{x} not previously established");
      else
        gdb_put("delete #{nr}");
        $watchpoint_nr.delete(x);
      end
    end
    $state = 0;
    gdb_put("info breakpoints");
    $stderr.puts("INFO: my current notion: #{$watchpoint_nr}");
    gdb_put("c");
  end
end

def gdb_got(l)
  t = l.split;

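  # Expects a stop line such as "Breakpoint 1, my_breakpoint (x=1) at ...",
  # so the argument must have a one-character name and print as a decimal integer.
  if ((t[0] == "Breakpoint") && (t[2] == "my_breakpoint"))
    gdb_got_my_breakpoint(t[3][3..-2].to_i);
  end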

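  if (l.start_with?("Program received signal ") || l.start_with?("Watchpoint "))
    # Something went wrong in the target: detach so another gdb can attach.
    gdb_put("disconnect");
    gdb_put("q");
    sleep; # block forever; the investigation continues in the new gdb session
  end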
end

while (l = $fi.gets)
  l = l.strip;

  $stderr.puts("gdb_inp: #{l}");

  gdb_got(l);
end
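The script's parsing with t[3][3..-2] only works when the argument has a one-character name and gdb prints it in decimal. As a hedged alternative, the sketch below uses a regular expression that accepts any argument name and either decimal or 0x-prefixed output. The helper name my_breakpoint_arg and the sample lines are illustrative assumptions about what gdb prints when it stops at the breakpoint, not captured output.

```ruby
# Hypothetical helper: pull the integer argument out of a gdb stop line.
# Assumed line shape: "Breakpoint <n>, my_breakpoint (<name>=<value>) at ..."
STOP_LINE = /\ABreakpoint \d+, my_breakpoint \(\w+=(0x\h+|\d+)\)/

def my_breakpoint_arg(line)
  m = STOP_LINE.match(line)
  return nil if m.nil?  # not a my_breakpoint stop line

  text = m[1]
  text.start_with?("0x") ? text.hex : text.to_i
end

p my_breakpoint_arg("Breakpoint 1, my_breakpoint (x=6295616) at app.c:12")
p my_breakpoint_arg("Breakpoint 1, my_breakpoint (addr=0x601040) at app.c:12")
p my_breakpoint_arg("Continuing.")
```

Returning nil for unrelated lines lets the caller skip them without first splitting the line into tokens.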
