How to log or replay lines or instructions executed immediately before a crash

Question

Often I have to debug crashing C++ programs on Windows where I can reproduce the crash, but it is hard to determine what sequence of instructions in the code caused the crash (eg another thread overwriting memory of the crashing thread). Even a call stack does not help in that case. Usually I resort to narrowing down the crash cause by commenting out sections of the source code, but this is very tedious.

Does anyone know a tool for Windows that can report or replay the last few source code lines or machine code instructions executed in all threads immediately before a crash? I.e., something like the reverse debugging capability of gdb, or something like Mutek's BugTrapper (which is no longer available). I am looking for a released and stable tool (I am aware of SoftwareVerify's 'Bug Validator' and Hex-Rays' IDA Pro 6.3 Trace Replayer, both of which are still in closed beta programs).

I have already tried the WinDbg trace commands wt and ta @$ra, but both have the disadvantage that they stop automatically after a few seconds. I need trace commands that run until the crash happens and that trace all threads of the running program.

NOTE: I am not looking for a debug tool designed to fix a particular problem, like gflags, pageheap, Memory Validator, Purify, etc. I am looking for a released and stable tool to trace or replay at the instruction level.

Answer 1

If another thread is overwriting memory of the crashing thread, gflags is useful (see GFlags and PageHeap). Instead of telling you the few lines that were executed before the crash, it tells you exactly where your code overwrote a correctly allocated block of memory.

First, activate this type of check:

gflags /p /enable your_app.exe /full
or
gflags /p /enable your_app.exe /full /backwards

Check that it has been activated correctly:

gflags /p

Run your application and collect the dump files.

Then disable the check again with gflags:

gflags /p /disable your_app.exe
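
As far as I understand it, full page heap places an inaccessible page directly behind each allocation (or in front of it with /backwards), so an overrun faults at the offending instruction. The sketch below is not that mechanism. It is a portable analogue that puts guard bytes behind each block and checks them when the block is released, which only tells you that a block was overrun, not where. All names in it (GuardedBlock, guarded_alloc, guard_intact, guarded_free) are made up for illustration and are not part of gflags or any Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper names; illustration only.
const std::size_t kGuardSize = 16;        // number of guard bytes behind each block
const unsigned char kGuardByte = 0xFD;    // fill pattern for the guard area

struct GuardedBlock {
    unsigned char* user;  // pointer handed to the caller
    std::size_t size;     // number of bytes the caller asked for
};

// Allocate `size` usable bytes followed by kGuardSize pattern bytes.
GuardedBlock guarded_alloc(std::size_t size) {
    assert(size > 0);
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(size + kGuardSize));
    if (raw == nullptr) {
        std::abort();
    }
    std::memset(raw + size, kGuardByte, kGuardSize);
    return GuardedBlock{raw, size};
}

// True if nobody has written into the guard area behind the block.
bool guard_intact(const GuardedBlock& block) {
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (block.user[block.size + i] != kGuardByte) {
            return false;
        }
    }
    return true;
}

// Release the block; returns false if an overrun was detected.
bool guarded_free(GuardedBlock& block) {
    const bool ok = guard_intact(block);
    std::free(block.user);
    block.user = nullptr;
    return ok;
}
```

The difference to the page-based check is the point of the sketch: a pattern check notices the damage only when the block is freed, whereas a guard page stops the program at the write itself, which is why the dump then points at the faulting source line.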

Update 1

It does not immediately detect every problem like *p = 0; where p is an invalid pointer, but at least some of them are detected.
For example:

#include <stdio.h>

int main(int argc, char *argv[])
{
    int *p = new int;
    printf("1) p=%p\n",p);
    *p = 1;
    delete p;
    printf("2) p=%p\n",p);
    *p = 2;
    printf("Done\n");
    return 0;
}

When I run with gflags enabled I get a dump file and the problem is correctly identified:

STACK_TEXT:  
0018ff44 00401215 00000001 03e5dfb8 03dfdf48 mem_alloc_3!main+0x5b [c:\src\tests\test.cpp\mem_alloc\mem_alloc\mem_alloc.3.cpp @ 11]
0018ff88 75f8339a 7efde000 0018ffd4 77bb9ef2 mem_alloc_3!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
0018ff94 77bb9ef2 7efde000 2558d82c 00000000 kernel32!BaseThreadInitThunk+0xe
0018ffd4 77bb9ec5 004013bc 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70
0018ffec 00000000 004013bc 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b


STACK_COMMAND:  ~0s; .ecxr ; kb

FAULTING_SOURCE_CODE:  
     7:         printf("1) p=%p\n",p);
     8:         *p = 1;
     9:         delete p;
    10:         printf("2) p=%p\n",p);
>   11:         *p = 2;
    12:         printf("Done\n");
    13:         return 0;
    14: 
    15: }

Update 2

Another example from @fmunkert:

STACK_TEXT:  
0018ff44 00401205 00000001 0505ffbe 04ffdf44 mem_alloc_3!main+0x52 [c:\src\tests\test.cpp\mem_alloc\mem_alloc\mem_alloc.3.cpp @ 12]
0018ff88 75f8339a 7efde000 0018ffd4 77bb9ef2 mem_alloc_3!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
0018ff94 77bb9ef2 7efde000 2577c47c 00000000 kernel32!BaseThreadInitThunk+0xe
0018ffd4 77bb9ec5 004013ac 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70
0018ffec 00000000 004013ac 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b


STACK_COMMAND:  ~0s; .ecxr ; kb

FAULTING_SOURCE_CODE:  
     8:         printf("1) p=%p\n",p);  
     9:         *p = 1;  
    10:         p++;
    11:         printf("2) p=%p\n",p);
>   12:         *p =  2;   // <==== Illegal memory access
    13:         printf("Done\n");  
    14:         return 0;
    15: 
    16: }

The dump above was produced with page heap enabled in unaligned mode:

gflags /p /enable mem_alloc.3.exe /full /unaligned


Unfortunately, the /unaligned option can cause a program to stop working properly (see How to use Pageheap.exe):

Some programs make assumptions about 8-byte alignment and they stop working correctly with the /unaligned parameter. Microsoft Internet Explorer is one such program.
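
To see why the default (aligned) mode can miss a small overrun like the one in the example, here is a rough sketch of the arithmetic. It assumes an 8-byte allocation alignment (taken from the quoted text) and a 4-byte int, and it assumes that without /unaligned the block is padded up to the next alignment boundary before the guard page starts. The function names are made up for illustration and are not part of gflags or PageHeap.

```cpp
#include <cassert>
#include <cstddef>

// Bytes between the end of a `size`-byte request and the next
// `alignment` boundary. Writes into this padding do not reach a
// guard page that starts at the aligned end of the block.
std::size_t alignment_slack(std::size_t size, std::size_t alignment) {
    assert(alignment > 0);
    const std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    return rounded - size;
}

// True if a write of `write_len` bytes starting `offset` bytes into the
// block runs past the requested size but still ends inside the padding,
// i.e. it is an overrun that a guard page behind the padding would miss.
bool overrun_hidden_by_slack(std::size_t size, std::size_t alignment,
                             std::size_t offset, std::size_t write_len) {
    const std::size_t end = offset + write_len;
    return end > size && end <= size + alignment_slack(size, alignment);
}
```

With these assumptions, alignment_slack(4, 8) is 4, so the write *p = 2 after p++ (offset 4, length 4) ends exactly at the aligned boundary and would go unnoticed. Passing an alignment of 1, which models /unaligned, leaves no padding, so the same write crosses into the guard page.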

Answer 2

I found a solution: "replay debugging" using VMware Workstation and Visual Studio 2010. Setting it up takes a lot of time, but you are rewarded with a Visual Studio C++ debugger that can debug backwards in time. Here is a video that demonstrates how replay debugging works: http://blogs.vmware.com/workstation/2010/01/replay-debugging-try-it-today.html .

A drawback of the solution is that VMware seemingly has discontinued replay debugging in the latest VMware versions. Furthermore, only certain processor types seem to support replaying. I have not found any comprehensive list of supported processors; I tested the replay features on three of my PCs: replaying did not work on a Core i7 200; replaying worked on a Core2 6700 and on a Core2 Q9650.

I really hope that VMware reconsiders and introduces replay debugging again in future VMware Workstation versions, because this really adds a new dimension to debugging.

For those of you who are interested, here is a description of how to set up an environment for replay debugging:

In the description below, "local debugging" means that Visual Studio and VMware are installed on the same PC. "Remote debugging" means that Visual Studio and VMware are installed on different PCs.

Install Visual Studio 2010 with SP1 on the host system.
Make sure Visual Studio has been configured to use Microsoft's symbol servers. (Under "Tools | Options | Debugging | Symbols").
On the host system, install "Debugging Tools for Windows".
Install VMware Workstation 7.1. (Version 8.0 no longer contains the replay debugging feature). This will also install a plug-in into Visual Studio.
Install a virtual machine (VM) on VMware with Windows XP SP3.
If the application under test is a debug build, install the Visual Studio debug DLLs on the VM. (See http://msdn.microsoft.com/en-us/library/dd293568.aspx for instructions on how to do that, but use a "Debug" configuration instead of "Release").
Copy "gflags.exe" from the host's "Debugging Tools for Windows" directory to the VM, run gflags.exe on the VM, select "Disable paging of kernel stacks" under the "System Registry" tab and press OK. Reboot the VM.
Copy all EXE and DLL files of the application under test to the VM and make sure that you can start the application and reproduce the problem.
Shut down the VM and create a snapshot (via context menu item "Take Snapshot" in VMware Workstation).
(Only for remote debugging:) Start the following command on the Visual Studio PC and enter an arbitrary passcode:
"C:\Program Files\VMware\VMware Workstation\Visual Studio Integrated Debugger\dclProxy.exe" hostname
Replace hostname with the name of the PC.
(Only for remote debugging:) Create a recording manually for the VM. I.e., log in to the VM's operating system, start the recording (via context menu "Record"), run the application under test and perform the actions necessary to reproduce the problem. Then stop and save the recording.
Start Visual Studio and go to "VMware | Options | Replay Debugging in VM | General", and set the following values:
- "Local or Remote" must be set to "Local" for local debugging or to "Remote" for remote debugging.
- "Virtual Machine" must be set to the path to the VM's .vmx file.
- "Remote Machine Passcode" must be set to the passcode you used above (only for remote debugging).
- "Recording to Replay" must be set to a recording name that you previously created with VMware.
- "Host Executable Search Path" must be set to a directory in which you save DLLs which are required by the application under test and which are needed by Visual Studio to display correct stack traces.
Press "Apply".
Go to "VMware | Options | Replay Debugging in VM | Pre-Record Event", and set the following values:
- "Base Snapshot for Recording": name of snapshot created previously.
Press "OK".
(For local debugging:) In Visual Studio, select "VMware | Create Recording for Replay"; this restarts the VM. Log in to the VM, run the application under test and perform the actions necessary to reproduce the problem. Then stop and save the recording.
Select "VMware | Start Replay Debugging". VMware now automatically restarts the VM and the application under test and replays the recorded actions. Wait until the application crashes; the Visual Studio debugger then automatically becomes active.
In the Visual Studio debugger, set a breakpoint at a location that you think the application passed through before the crash. Then select "VMware | Reverse Continue". The debugger now runs backwards to the breakpoint. This operation can take some time because the VM will be restarted and replayed until your breakpoint is reached. (You can speed up this operation by adding a snapshot a few seconds before the crash happens when you record the scenario. You can add additional snapshots during replay debugging.)
Once VMware has replayed the VM to your breakpoint, you can use "Step Over" and "Step Into" to step forward from your breakpoint, i.e. you replay the recorded history of events until you reach a point where you can identify the reason why your application crashed.

Further information: http://www.replaydebugging.com/

Answer 3

I would attach WinDbg when the program is running and do a minidump when it debugbreaks on a crash or exception:

.dump /ma c:\mem.dmp

(c:\mem.dmp can be any other location you like.)

I would enable gflags for your app, either from the command line or within WinDbg:

!gflag +ust

Remember to remove this flag afterwards!

Then you can run an automated exception analysis:

!analyze -v

This may tell you what it thinks caused the crash. You can also dump the call stacks of all threads:

~* kb

If you see anything suspicious, you can switch to that thread and inspect further:

~x s

You can inspect the exception context record:

.ecxr

There is a good link on how to recover the call stack from a catch block: http://blogs.msdn.com/b/slavao/archive/2005/01/30/363428.aspx and also this: http://blogs.msdn.com/b/jmstall/archive/2005/01/18/355697.aspx

The main thing here is that with WinDbg attached you should be able to inspect the state of all the threads and their call stacks. You can also open the minidump in Visual Studio: http://msdn.microsoft.com/en-us/library/windows/desktop/ee416349%28v=vs.85%29.aspx#Analysis_of_a_minidump. If you prefer Visual Studio for navigating, you can open the same dump in both: WinDbg for its analysis tools and Visual Studio for navigating the code. Hope this helps.

Answer 4

How about using BMC's AppSight?

We used it at a previous company (sorry, it took me a while to remember the name) to research crashes etc. ISTR you ran it and then ran your software, and it recorded everything that was happening in a log file which you could view later.

It definitely works on Windows as that is what I used it on.

It might well be what you are looking for?

Answer 5

Doesn't gdb offer this functionality out of the box?

It has been a while since I used it but I recall it could run a program until it crashed and then replay the steps for you in the debugger.

Also, it would be straightforward to set up your own logging facility which could output any amount of data you chose and could be activated by a command line parameter to the exe.

You could set it up now to tackle a crash you are having, or just to cover the basics, and then extend it as you fix bugs or add new functionality. The advantage would be that you capture exactly the data that you find useful, and you could even specify levels of logging to avoid being swamped with noise.
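
A minimal sketch of such a home-grown logger, assuming you only care about the last few entries before the crash: a fixed-size ring buffer that tags each entry with the calling thread and drops anything above a chosen level. The names TraceRing and TraceLevel are made up for illustration; nothing here comes from an existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical names; illustration only.
enum class TraceLevel { Error = 0, Info = 1, Debug = 2 };

class TraceRing {
public:
    // Keeps at most `capacity` entries; entries above `max_level` are dropped.
    TraceRing(std::size_t capacity, TraceLevel max_level)
        : entries_(capacity), max_level_(max_level) {
        assert(capacity > 0);
    }

    // Record one message, tagged with the id of the calling thread.
    void log(TraceLevel level, const std::string& message) {
        if (static_cast<int>(level) > static_cast<int>(max_level_)) {
            return;  // too verbose for the configured level
        }
        std::ostringstream line;
        line << "[thread " << std::this_thread::get_id() << "] " << message;
        std::lock_guard<std::mutex> lock(mutex_);
        entries_[next_ % entries_.size()] = line.str();  // overwrite the oldest slot
        ++next_;
    }

    // Returns the retained entries, oldest first.
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t count =
            next_ < entries_.size() ? next_ : entries_.size();
        std::vector<std::string> out;
        for (std::size_t i = next_ - count; i < next_; ++i) {
            out.push_back(entries_[i % entries_.size()]);
        }
        return out;
    }

private:
    std::vector<std::string> entries_;
    TraceLevel max_level_;
    mutable std::mutex mutex_;
    std::size_t next_ = 0;  // total number of entries accepted so far
};
```

You would call log() at the points of interest in every thread and write snapshot() to a file from whatever crash hook you already have. Taking a mutex inside a crash handler is not strictly safe, so treat this as a starting point; the level passed to the constructor could come from the command line parameter mentioned above.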

Answer 6

Not entirely sure if this is what you want, but in WinDbg 'u' disassembles from the current IP register on the current thread, and its variant 'ub' disassembles the instructions immediately before it. This will show you the last instructions that were run, and you can normally figure out what the values of the different registers were by working your way backwards through the disassembled code. It's a slow and tough process most of the time, but it tells you with almost 100% accuracy (barring some strange hardware problems, or really odd code problems) what just happened. I've used this method in the past to figure out why certain things were nulled out when I didn't have the source code.

If you check the windbg help file you'll find more information on it.

solution1
9 2012-06-06 11:05:39

solution2
5 ACCEPTED

solution3
2 2012-06-06 12:15:27

solution4
2 2012-06-14 11:10:35

solution5
1 2012-06-12 12:57:12

solution6
-1 2012-06-15 04:06:26