How does reverse debugging work?

Question

GDB has a new version out that supports reverse debugging (see http://www.gnu.org/software/gdb/news/reversible.html). I got to wondering how that works.

To get reverse debugging to work, it seems to me that you need to store the entire machine state, including memory, for each step. This would make performance incredibly slow, not to mention use a lot of memory. How are these problems solved?

Answer 1

I'm a gdb maintainer and one of the authors of the new reverse debugging. I'd be happy to talk about how it works. As several people have speculated, you need to save enough machine state that you can restore it later. There are a number of schemes, one of which is to simply save the registers or memory locations that are modified by each machine instruction. Then, to "undo" that instruction, you just revert the data in those registers or memory locations.
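
To make the save-and-revert scheme concrete, here is a minimal sketch in C. It is not GDB's code: the toy `Machine`, the `UndoLog`, and the functions `exec_set` and `reverse_step` are all hypothetical names invented for illustration, and the "instruction set" is reduced to a single register assignment.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_REGS = 4, MAX_LOG = 1024 };

/* Toy machine (hypothetical): a handful of registers and nothing else. */
typedef struct {
    int regs[NUM_REGS];
} Machine;

/* One undo record: which register an instruction overwrote, and its old value. */
typedef struct {
    int reg;
    int old_value;
} UndoEntry;

typedef struct {
    UndoEntry entries[MAX_LOG];
    size_t len;
} UndoLog;

/* Execute "regs[reg] = value", saving the overwritten value first. */
void exec_set(Machine *m, UndoLog *log, int reg, int value) {
    assert(reg >= 0 && reg < NUM_REGS && log->len < MAX_LOG);
    log->entries[log->len].reg = reg;
    log->entries[log->len].old_value = m->regs[reg];
    log->len++;
    m->regs[reg] = value;
}

/* Undo the most recent instruction. Returns 0 if there is nothing to undo. */
int reverse_step(Machine *m, UndoLog *log) {
    if (log->len == 0) {
        return 0;
    }
    log->len--;
    m->regs[log->entries[log->len].reg] = log->entries[log->len].old_value;
    return 1;
}
```

Calling `exec_set` twice on the same register and then `reverse_step` once leaves the register holding the first value again. Only the overwritten location is stored per instruction, never the whole machine state.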

Yes, it is expensive, but modern CPUs are so fast that when you are working interactively anyway (stepping or hitting breakpoints), you don't really notice it that much.

Answer 2

Note that you must not forget the use of simulators, virtual machines, and hardware recorders to implement reverse execution.

Another way to implement it is to trace execution on physical hardware, as is done by Green Hills and Lauterbach in their hardware-based debuggers. Based on this fixed trace of the action of each instruction, you can then move to any point in the trace by removing the effects of each instruction in turn. Note that this assumes that you can trace everything that affects the state visible in the debugger.

Another way is to use a checkpoint + re-execution method, which is used by VMware Workstation 6.5 and Virtutech Simics 3.0 (and later), and which seems to be coming with Visual Studio 2010. Here, you use a virtual machine or a simulator to get a level of indirection on the execution of a system. You regularly dump the entire state to disk or memory, and then rely on the simulator being able to deterministically re-execute the exact same program path.

Simplified, it works like this: say that you are at time T in the execution of a system. To go to time T-1, you pick up some checkpoint from a point t < T, and then execute (T - t - 1) cycles to end up one cycle before where you were. This can be made to work very well, and applies even to workloads that do disk IO, consist of kernel-level code, and perform device driver work. The key is to have a simulator that contains the entire target system, with all its processors, devices, memories, and IOs. See the discussion on the gdb mailing list for more details. I use this approach myself quite regularly to debug tricky code, especially in device drivers and early OS boots.
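
The checkpoint + re-execution idea can be sketched in a few lines of C. This is only a toy under stated assumptions, not how Simics or VMware implement it: the "system" is a two-field struct, and the names `Simulator`, `sim_init`, `sim_run`, `sim_goto` and `sim_reverse_step` are hypothetical. What it does show is the T - t - 1 arithmetic: restore the latest checkpoint at or before the target cycle, then re-run forward.

```c
#include <assert.h>

enum { CHECKPOINT_INTERVAL = 100, MAX_CHECKPOINTS = 64 };

/* Toy deterministic "system": its whole state is a cycle count and a value. */
typedef struct {
    unsigned long cycle;
    unsigned long value;
} SimState;

typedef struct {
    SimState now;
    SimState checkpoints[MAX_CHECKPOINTS];
    int num_checkpoints;
} Simulator;

/* Deterministic step: the next state depends only on the current state. */
static void step_once(SimState *s) {
    s->value = s->value * 31u + s->cycle;
    s->cycle++;
}

/* Start at cycle 0 and store the initial state as checkpoint 0. */
void sim_init(Simulator *sim) {
    sim->now.cycle = 0;
    sim->now.value = 1;
    sim->checkpoints[0] = sim->now;
    sim->num_checkpoints = 1;
}

/* Run forward n cycles, snapshotting the full state every CHECKPOINT_INTERVAL
 * cycles. Cycles that already have a checkpoint are not saved again. */
void sim_run(Simulator *sim, unsigned long n) {
    while (n-- > 0) {
        step_once(&sim->now);
        if (sim->now.cycle % CHECKPOINT_INTERVAL == 0 &&
            sim->num_checkpoints < MAX_CHECKPOINTS &&
            sim->checkpoints[sim->num_checkpoints - 1].cycle < sim->now.cycle) {
            sim->checkpoints[sim->num_checkpoints++] = sim->now;
        }
    }
}

/* Go back to an earlier cycle: restore the latest checkpoint at or before
 * the target, then deterministically re-execute up to the target. */
int sim_goto(Simulator *sim, unsigned long target) {
    int i;
    assert(sim->num_checkpoints > 0); /* sim_init always stores checkpoint 0 */
    if (target > sim->now.cycle) {
        return 0; /* this sketch only travels backwards */
    }
    i = sim->num_checkpoints - 1;
    while (sim->checkpoints[i].cycle > target) {
        i--; /* terminates: checkpoint 0 is at cycle 0 */
    }
    sim->now = sim->checkpoints[i];
    while (sim->now.cycle < target) {
        step_once(&sim->now);
    }
    return 1;
}

/* From time T, land on time T-1. Returns 0 at the very start of execution. */
int sim_reverse_step(Simulator *sim) {
    if (sim->now.cycle == 0) {
        return 0;
    }
    return sim_goto(sim, sim->now.cycle - 1);
}
```

After `sim_run(&sim, 250)`, a call to `sim_reverse_step(&sim)` restores the checkpoint taken at cycle 200 and re-executes 49 cycles. The cost of a reverse step is therefore bounded by the checkpoint interval, which is the knob that trades memory against latency.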

Another source of information is a Virtutech white paper on checkpointing (which I wrote, in full disclosure).

Answer 3

During an EclipseCon session we also asked how they do this with the Chronon Debugger for Java. That one does not allow you to actually step back, but can play back a recorded program execution in such a way that it feels like reverse debugging. (The main difference is that you cannot change the running program in the Chronon debugger, while you can do that in most other Java debuggers.)

If I understood it correctly, it manipulates the byte code of the running program, such that every change of an internal state of the program is recorded. External states don't need to be recorded additionally. If they influence your program in some way, then you must have an internal variable matching that external state (and therefore that internal variable is enough).

During playback they can then basically recreate every state of the running program from the recorded state changes.

Interestingly, the state changes are much smaller than one would expect at first look. If you have a conditional "if" statement, you would think that you need at least one bit to record whether the program took the then-branch or the else-branch. In many cases you can avoid even that, for example when the different branches contain a return value. Then it is enough to record only the return value (which would be needed anyway) and to recompute which branch was executed from the return value itself.

Answer 4

Although this question is old, most of the answers are too, and as reverse debugging remains an interesting topic, I'm posting a 2015 answer. Chapters 1 and 2 of my MSc thesis, Combining reverse debugging and live programming towards visual thinking in computer programming, cover some of the historical approaches to reverse debugging (especially focused on the snapshot-(or checkpoint)-and-replay approach), and explain the difference between it and omniscient debugging:

The computer, having forward-executed the program up to some point, should really be able to provide us with information about it. Such an improvement is possible, and is found in what are called omniscient debuggers. They are usually classified as reverse debuggers, although they might more accurately be described as "history logging" debuggers, as they merely record information during execution to view or query later, rather than allow the programmer to actually step backwards in time in an executing program. "Omniscient" comes from the fact that the entire state history of the program, having been recorded, is available to the debugger after execution. There is then no need to rerun the program, and no need for manual code instrumentation.

Software-based omniscient debugging started with the 1969 EXDAMS system where it was called "debug-time history-playback". The GNU debugger, GDB, has supported omniscient debugging since 2009, with its 'process record and replay' feature. TotalView, UndoDB and Chronon appear to be the best omniscient debuggers currently available, but are commercial systems. TOD, for Java, appears to be the best open-source alternative, which makes use of partial deterministic replay, as well as partial trace capturing and a distributed database to enable the recording of the large volumes of information involved.

Debuggers that do not merely allow navigation of a recording, but are actually able to step backwards in execution time, also exist. They can more accurately be described as back-in-time, time-travel, bidirectional or reverse debuggers.

The first such system was the 1981 COPE prototype ...

Answer 5

Nathan Fellman wrote:

But does reverse debugging only allow you to roll back next and step commands that you typed, or does it allow you to undo any number of instructions?

You can undo any number of instructions. You're not restricted to, for instance, only stopping at the points where you stopped when you were going forward. You can set a new breakpoint and run backwards to it.

For instance, if I set a breakpoint on an instruction and let it run until then, can I then roll back to the previous instruction, even though I skipped over it?

Yes, so long as you turned on recording mode before you ran to the breakpoint.

Answer 6

mozilla rr is a more robust alternative to GDB reverse debugging

https://github.com/mozilla/rr

GDB's built-in record and replay has severe limitations, e.g. no support for AVX instructions: gdb reverse debugging fails with "Process record does not support instruction 0xf0d at address"

Upsides of rr:

much more reliable currently. I have tested it on relatively long runs of several complex programs.
also offers a GDB interface with the gdbserver protocol, making it a great replacement
small performance drop for most programs; I haven't noticed it myself without doing measurements
the generated traces are small on disk because only very few non-deterministic events are recorded; I've never had to worry about their size so far

rr achieves this by first running the program in a way that records what happened on every single non-deterministic event, such as a thread switch.

Then, during the second run (the replay), it uses that trace file, which is surprisingly small, to reconstruct exactly what happened on the original non-deterministic run, but in a deterministic way, either forwards or backwards.
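
The record-then-replay principle can be illustrated with a deliberately simplified C sketch. This is not how rr is implemented: rr observes the recorded process from the outside, whereas here the program itself calls a hypothetical wrapper, `traced_time`, around its only non-deterministic input. The `Trace` type and all function names are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

enum { MAX_EVENTS = 256 };

typedef enum { MODE_RECORD, MODE_REPLAY } Mode;

/* Trace of non-deterministic results, in the order they occurred. */
typedef struct {
    Mode mode;
    long events[MAX_EVENTS];
    size_t len; /* number of recorded events */
    size_t pos; /* next event to hand out during replay */
} Trace;

void trace_start_record(Trace *t) {
    t->mode = MODE_RECORD;
    t->len = 0;
    t->pos = 0;
}

void trace_start_replay(Trace *t) {
    t->mode = MODE_REPLAY;
    t->pos = 0;
}

/* The only source of non-determinism in this toy: the wall clock.
 * Recording stores the real result; replaying returns the stored one. */
long traced_time(Trace *t) {
    long now;
    if (t->mode == MODE_REPLAY) {
        assert(t->pos < t->len);
        return t->events[t->pos++];
    }
    now = (long)time(NULL);
    assert(t->len < MAX_EVENTS);
    t->events[t->len++] = now;
    return now;
}

/* Program under test: fully deterministic apart from traced_time(). */
long program(Trace *t) {
    long a = traced_time(t);
    long b = traced_time(t);
    return a % 1000 + (b - a);
}
```

Running `program` once after `trace_start_record` and again after `trace_start_replay` yields the same result both times, no matter when the replay happens. The trace holds only two numbers, which hints at why such traces stay small: everything deterministic is recomputed rather than stored.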

rr was originally developed by Mozilla to help them reproduce timing bugs that showed up on their nightly testing the following day. But the reverse debugging aspect is also fundamental when you have a bug that only happens hours into execution, since you often want to step back to examine what previous state led to the later failure.

The following example showcases some of its features, notably the reverse-next, reverse-step and reverse-continue commands.

Install on Ubuntu 18.04:

sudo apt-get install rr linux-tools-common linux-tools-generic linux-cloud-tools-generic
sudo cpupower frequency-set -g performance
# Overcome "rr needs /proc/sys/kernel/perf_event_paranoid <= 1, but it is 3."
echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Test program:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int f() {
    int i;
    i = 0;
    i = 1;
    i = 2;
    return i;
}

int main(void) {
    int i;

    i = 0;
    i = 1;
    i = 2;

    /* Local call. */
    f();

    printf("i = %d\n", i);

    /* Is randomness completely removed?
     * Recently fixed: https://github.com/mozilla/rr/issues/2088 */
    i = time(NULL);
    printf("time(NULL) = %d\n", i);

    return EXIT_SUCCESS;
}

Compile and run:

gcc -O0 -ggdb3 -o reverse.out -std=c89 -Wextra reverse.c
rr record ./reverse.out
rr replay

Now you are left inside a GDB session, and you can properly reverse debug:

(rr) break main
Breakpoint 1 at 0x55da250e96b0: file a.c, line 16.
(rr) continue
Continuing.

Breakpoint 1, main () at a.c:16
16          i = 0;
(rr) next
17          i = 1;
(rr) print i
$1 = 0
(rr) next
18          i = 2;
(rr) print i
$2 = 1
(rr) reverse-next
17          i = 1;
(rr) print i
$3 = 0
(rr) next
18          i = 2;
(rr) print i
$4 = 1
(rr) next
21          f();
(rr) step
f () at a.c:7
7           i = 0;
(rr) reverse-step
main () at a.c:21
21          f();
(rr) next
23          printf("i = %d\n", i);
(rr) next
i = 2
27          i = time(NULL);
(rr) reverse-next
23          printf("i = %d\n", i);
(rr) next
i = 2
27          i = time(NULL);
(rr) next
28          printf("time(NULL) = %d\n", i);
(rr) print i
$5 = 1509245372
(rr) reverse-next
27          i = time(NULL);
(rr) next
28          printf("time(NULL) = %d\n", i);
(rr) print i
$6 = 1509245372
(rr) reverse-continue
Continuing.

Breakpoint 1, main () at a.c:16
16          i = 0;

When debugging complex software, you will likely run up to a crash point, and then end up inside a deep frame. In that case, don't forget that to reverse-next on higher frames, you must first:

reverse-finish

up to that frame; just doing the usual up is not enough.

The most serious limitations of rr in my opinion are:

https://github.com/mozilla/rr/issues/2089 you have to do a second replay from scratch, which can be costly if the crash you are trying to debug happens, say, hours into execution
https://github.com/mozilla/rr/issues/1373 x86 only

UndoDB is a commercial alternative to rr: https://undo.io Both are trace / replay based, but I'm not sure how they compare in terms of features and performance.

Answer 7

Here is how another reverse debugger called ODB works. Extract:

Omniscient Debugging is the idea of collecting "time stamps" at each "point of interest" (setting a value, making a method call, throwing/catching an exception) in a program and then allowing the programmer to use those time stamps to explore the history of that program run.

The ODB ... inserts code into the program's classes as they are loaded and when the program runs, the events are recorded.

I'm guessing the gdb one works in the same kind of way.
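
As a rough illustration of the time-stamp idea, the C sketch below records one event per assignment and then answers "what was this variable at time stamp N?" after the fact. It is not ODB's code: ODB instruments Java classes as they are loaded, while here the hook `record_set` is called by hand, and the `History` type and the query function `value_at` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_EVENTS = 1024 };

/* One "point of interest": a variable was set to a value at a time stamp. */
typedef struct {
    unsigned long stamp; /* logical time stamp, one per recorded event */
    const char *name;    /* name of the variable that was set */
    int value;           /* value it was set to */
} Event;

typedef struct {
    Event events[MAX_EVENTS];
    size_t len;
} History;

/* Instrumentation hook: record the assignment and pass the value through,
 * so "i = 1" can be written as "i = record_set(&h, "i", 1)". */
int record_set(History *h, const char *name, int value) {
    assert(h->len < MAX_EVENTS);
    h->events[h->len].stamp = (unsigned long)h->len;
    h->events[h->len].name = name;
    h->events[h->len].value = value;
    h->len++;
    return value;
}

/* Query the history: store in *out the value `name` had at time `stamp`.
 * Returns 0 if the variable had not been set yet at that time. */
int value_at(const History *h, const char *name, unsigned long stamp, int *out) {
    size_t i = h->len;
    while (i-- > 0) {
        if (h->events[i].stamp <= stamp && strcmp(h->events[i].name, name) == 0) {
            *out = h->events[i].value;
            return 1;
        }
    }
    return 0;
}
```

Nothing is re-executed and nothing is undone here. The program runs forward once, and "going back" is a lookup in the recorded history, which is what distinguishes this style from actually stepping backwards.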

Answer 8

Reverse debugging means you can run the program backwards, which is very useful for tracking down the cause of a problem.

You don't need to store the complete machine state for each step, only the changes. It is probably still quite expensive.

8 answers

Answer 1
Score 133, accepted, 2009-10-08 03:37:04

Answer 2
Score 12, 2009-10-20 19:20:41

Answer 3
Score 9, 2012-04-02 14:15:08

Answer 4
Score 8, 2015-06-08 12:19:47

Answer 5
Score 4, 2009-10-08 22:47:48

Answer 6
Score 4, 2018-10-30 11:24:19

Answer 7
Score 2, 2009-09-24 09:01:47

Answer 8
Score 2, 2009-09-24 10:24:53
