
How do symbols affect call stack walking?

I'm trying to analyze a crash dump with WinDbg, and I'm getting inconsistent call stacks depending on which symbols are loaded. My simple understanding is that symbols only help identify what the stack refers to, while the stack itself is unchanged. That's obviously wrong, but now I don't know what the heck I'm looking at.

Here's a call stack with all symbols loaded:

0:000> kn
 # ChildEBP RetAddr  
00 0012e120 7d61f60f ntdll!ZwGetContextThread+0x12
01 0012e130 000f0005 ntdll!RtlFreeHeap+0x711
WARNING: Frame IP not in any known module. Following frames may be wrong.
02 0012e1d0 6d5b5b20 0xf0005
03 0012e314 6d5b407f dbghelp!Win32LiveSystemProvider::OpenMapping+0x228
04 0012e464 0012e488 dbghelp!GenAllocateModuleObject+0x1ad
05 0012e4e4 6d5b588e 0x12e488
06 0012e69c 7d4d132f dbghelp!Win32LiveSystemProvider::GetOsCsdString+0x4d
07 0012e6b8 6d5b5fd2 kernel32!ReadProcessMemory+0x1b
08 0012e6e0 6d5b604e dbghelp!Win32LiveSystemProvider::ReadVirtual+0x3d
09 0012e700 6d5b2f3d dbghelp!Win32LiveSystemProvider::ReadAllVirtual+0x1d
0a 0012e728 6d5b304f dbghelp!WriteMemoryFromProcess+0x35
0b 0012e7ac 6d5b345b dbghelp!WriteThreadList+0xc1
0c 0012e7cc 6d5b367b dbghelp!WriteDumpData+0x83
0d 0012e90c 6d5b3778 dbghelp!MiniDumpProvideDump+0x174
*** WARNING: Unable to verify checksum for ERRHNDLR.dll
0e 0012e96c 0091235d dbghelp!MiniDumpWriteDump+0xc8
*** WARNING: Unable to verify timestamp for msvcr90.dll
0f 0012e9fc 7857dcaa ERRHNDLR!ExceptionTranslator+0x25d [c:\redacted\errorhandler.cpp @ 230]
10 0012ea48 7857d4f5 msvcr90!_CallSETranslator+0xa5
11 0012ea7c 7857d8c0 msvcr90!__CxxExceptionFilter+0x217
12 0012eadc 7857d9dd msvcr90!__CxxExceptionFilter+0x5e2
13 0012eb10 7857db94 msvcr90!__InternalCxxFrameHandler+0xdb
*** WARNING: Unable to verify checksum for PROGRAM.exe
14 0012eb84 004f1c9e msvcr90!__CxxFrameHandler3+0x26
15 0012eba8 004f1c9e PROGRAM!__sse2_available_init+0x1269c
16 0012ec0c 00130000 PROGRAM!__sse2_available_init+0x1269c
17 00000000 00000000 0x130000

I can tell that something bad happened, but it appears to have happened as soon as the app started, which isn't the case.

Here's the same call stack, but without the symbols for msvcr90 loaded:

0:000> kn
 # ChildEBP RetAddr  
00 0012e120 7d61f60f ntdll!ZwGetContextThread+0x12
01 0012e130 000f0005 ntdll!RtlFreeHeap+0x711
WARNING: Frame IP not in any known module. Following frames may be wrong.
02 0012e1d0 6d5b5b20 0xf0005
03 0012e314 6d5b407f dbghelp!Win32LiveSystemProvider::OpenMapping+0x228
04 0012e464 0012e488 dbghelp!GenAllocateModuleObject+0x1ad
05 0012e4e4 6d5b588e 0x12e488
06 0012e69c 7d4d132f dbghelp!Win32LiveSystemProvider::GetOsCsdString+0x4d
07 0012e6b8 6d5b5fd2 kernel32!ReadProcessMemory+0x1b
08 0012e6e0 6d5b604e dbghelp!Win32LiveSystemProvider::ReadVirtual+0x3d
09 0012e700 6d5b2f3d dbghelp!Win32LiveSystemProvider::ReadAllVirtual+0x1d
0a 0012e728 6d5b304f dbghelp!WriteMemoryFromProcess+0x35
0b 0012e7ac 6d5b345b dbghelp!WriteThreadList+0xc1
0c 0012e7cc 6d5b367b dbghelp!WriteDumpData+0x83
0d 0012e90c 6d5b3778 dbghelp!MiniDumpProvideDump+0x174
*** WARNING: Unable to verify checksum for ERRHNDLR.dll
0e 0012e96c 0091235d dbghelp!MiniDumpWriteDump+0xc8
*** WARNING: Unable to verify timestamp for msvcr90.dll
*** ERROR: Module load completed but symbols could not be loaded for msvcr90.dll
0f 0012e9fc 7857dcaa ERRHNDLR!ExceptionTranslator+0x25d [c:\redacted\errorhandler.cpp @ 230]
10 0012ea48 7857d4f5 msvcr90+0x5dcaa
11 0012ea7c 7857d8c0 msvcr90+0x5d4f5
12 0012eadc 7857d9dd msvcr90+0x5d8c0
13 0012eb10 7857db94 msvcr90+0x5d9dd
14 0012eb4c 7d61ec4a msvcr90+0x5db94
15 0012eb70 7d61ec1b ntdll!ExecuteHandler2+0x26
16 0012ec18 7d61ea56 ntdll!ExecuteHandler+0x24
17 0012ec18 026fe31a ntdll!KiUserExceptionDispatcher+0xe
*** WARNING: Unable to verify checksum for Storage.dll
18 0012ef4c 026fddd0 Storage!CList<Property *,Property *>::AddTail+0xa [c:\program files (x86)\microsoft visual studio 9.0\vc\atlmfc\include\afxtempl.h @ 1003]
*** WARNING: Unable to verify checksum for Storage2.dll
19 0012ef54 0274f5ec Storage!PropertyList::Add+0x10 [c:\redacted\propertylist.cpp @ 236]
1a 0012ef5c 0012f280 Storage2!Thing::Process+0x12c [c:\redacted\thing.cpp @ 345]
1b 0012ef60 0fe8be80 0x12f280
*** WARNING: Unable to verify checksum for PROGRAM.exe
1c 0012f368 0043d9a1 0xfe8be80
1d 0012f3b0 004f1c9e PROGRAM!View::SelectObject+0x151 [c:\redacted\view.cpp @ 2724]
1e 0012f3d4 004ea73b PROGRAM!__sse2_available_init+0x1269c
*** WARNING: Unable to verify checksum for DLL1.dll
1f 0012f450 02847893 PROGRAM!__sse2_available_init+0xb139
*** WARNING: Unable to verify checksum for DLL2.dll
20 0012f4ac 02c06398 DLL1!_RawDllMainProxy+0x1ed5
21 0012f534 02c06b86 DLL2!__sse2_available_init+0x40eb
22 0012f5a8 02c03fdd DLL2!__sse2_available_init+0x48d9
23 0012f5e0 02c052f4 DLL2!__sse2_available_init+0x1d30
24 0012f664 0283c231 DLL2!__sse2_available_init+0x3047
25 0012f6b4 028475aa DLL1!Logic::Send+0x121 [c:\redacted\logic.cpp @ 438]
26 0012f750 7d94757c DLL1!_RawDllMainProxy+0x1bec
27 0012f7a4 00000000 user32!UserCallWinProcCheckWow+0x128

Hey, that may actually be useful! It's also closer to what Visual Studio displays when I use it to debug the crash dump. But VS's call stack is completely different below "Storage2!Thing::Process", suggesting that unrelated functions are in the call stack somehow, which is why I'm trying WinDbg.

So, what am I missing? Why should unloading symbols reveal a potentially more useful call stack?

It's a long answer, but in short: on x86, PDBs contain FPO (frame pointer omission) information, which allows the debugger to reliably unwind a call stack. This is required for FPO frames, where EBP is not used as a frame pointer. In the absence of PDBs, the debugger assumes that every frame is an EBP frame and simply walks the EBP chain until it reaches the end (i.e. an unreadable EBP value).
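As a rough illustration of that EBP-chain walk, here is a minimal Python sketch. It is not how WinDbg is implemented: the `memory` dict is a hypothetical stand-in for the dump's stack memory, the addresses are made up, and the "caller frame must be at a higher address" check is an extra sanity test assumed here, not something stated above.

```python
def walk_ebp_chain(memory, ebp, max_frames=64):
    """Follow saved-EBP links, assuming every frame is an EBP frame.

    memory: dict mapping addresses to 32-bit values (hypothetical stand-in
            for readable stack memory in the dump).
    Returns a list of (frame_ebp, return_address) tuples.
    """
    frames = []
    while len(frames) < max_frames:
        # In an EBP frame, [ebp] is the caller's saved EBP and
        # [ebp+4] is the return address into the caller.
        if ebp not in memory or ebp + 4 not in memory:
            break  # unreadable EBP value: end of the chain
        saved_ebp = memory[ebp]
        frames.append((ebp, memory[ebp + 4]))
        # Assumed sanity check: the stack grows down, so the caller's
        # frame should sit at a higher address than the current one.
        if saved_ebp <= ebp:
            break
        ebp = saved_ebp
    return frames


# Made-up stack contents: three chained EBP frames.
memory = {
    0x0012F000: 0x0012F020, 0x0012F004: 0x00401111,
    0x0012F020: 0x0012F040, 0x0012F024: 0x00402222,
    0x0012F040: 0x00000000, 0x0012F044: 0x00403333,
}
frames = walk_ebp_chain(memory, 0x0012F000)
for frame_ebp, ret in frames:
    print(f"{frame_ebp:08x} {ret:08x}")
```

The walker never checks whether the value it finds in EBP really is a frame pointer, which is why it can only be trusted when every function on the stack actually maintains one.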

For more details on FPO and EBP frames, there's a good article here:

http://www.nynaeve.net/?p=91

Now, to get to your issue. The first call stack that you showed is absolutely correct. Some module threw an exception, so the O/S began unwinding call frames looking for an exception handler. Unfortunately, no one handled the error, so the default exception handler ran, which proceeded to crash the application. Because the call stack of the offending code was unwound, you don't see anything but the O/S-supplied components on the stack.

In the second case, you have no symbols, so the debugger treats every call frame as if it's an EBP frame. In this case, you got "lucky" and picked up a garbage EBP that started to unwind an old call stack. While it pointed to the right thing this time, this is the sort of red herring that can cause you to start your analysis with invalid data and waste a LOT of time (been there, done that!).
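To make the "treat everything as an EBP frame" assumption concrete, here is a toy Python model. It shows a related but simpler failure than the stale-stack pickup described above: a caller silently dropped from the stack. Everything in it is hypothetical: the stack layout, the function addresses, and the `FPO_LOCALS` table, which merely stands in for the FPO records a PDB would supply.

```python
# Toy 32-bit stack for main -> A -> B -> C, where only B is an FPO frame.
STACK = {
    0x0FFC: 0x00400010,  # return address into start (main's caller)
    0x0FF8: 0x00000000,  # main's saved EBP (end of chain)
    0x0FF4: 0x00401010,  # return address into main
    0x0FF0: 0x00000FF8,  # A's saved EBP, pointing at main's frame
    0x0FEC: 0x00402010,  # return address into A
    0x0FE8: 0xDEADBEEF,  # B's local (B never pushed EBP)
    0x0FE4: 0xDEADBEEF,  # B's local
    0x0FE0: 0x00403010,  # return address into B
    0x0FDC: 0x00000FF0,  # C's saved EBP: still A's frame, since B left EBP alone
}
FUNCTIONS = {0x00400000: "start", 0x00401000: "main",
             0x00402000: "A", 0x00403000: "B", 0x00404000: "C"}
FPO_LOCALS = {"B": 8}  # hypothetical FPO record: bytes of locals in B


def name_of(eip):
    """Map an address to the toy function whose 4 KB range contains it."""
    return FUNCTIONS.get(eip & ~0xFFF, hex(eip))


def naive_walk(eip, ebp):
    """No symbols: assume every frame is an EBP frame."""
    names = [name_of(eip)]
    while ebp in STACK and ebp + 4 in STACK:
        names.append(name_of(STACK[ebp + 4]))
        ebp = STACK[ebp]
    return names


def fpo_aware_walk(eip, esp, ebp):
    """With FPO data: locate the return address per frame type."""
    names = [name_of(eip)]
    while True:
        func = name_of(eip)
        if func in FPO_LOCALS:
            ret_slot = esp + FPO_LOCALS[func]  # skip locals; EBP is untouched
            caller_ebp = ebp
        else:
            ret_slot = ebp + 4                 # classic EBP frame layout
            caller_ebp = STACK.get(ebp)
        if ret_slot not in STACK or caller_ebp is None:
            break
        eip, esp, ebp = STACK[ret_slot], ret_slot + 4, caller_ebp
        names.append(name_of(eip))
    return names


naive = naive_walk(0x00404010, 0x0FDC)
aware = fpo_aware_walk(0x00404010, 0x0FDC, 0x0FDC)
print("naive:    ", " <- ".join(naive))
print("FPO-aware:", " <- ".join(aware))
```

In this model the naive walk reports B as being called directly from main, because the saved EBP it follows belongs to A's frame rather than B's. The result still looks like a plausible stack, which is exactly what makes such output easy to trust by mistake.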

The .ecxr command is always the correct thing to do in the case of an exception. This works because the O/S stores the register state of the processor at the time of the exception, before it starts going through call frames looking for an exception handler. The .ecxr command uses that saved state (the exception context record) to take you back to the moment the bad state was detected, instead of showing you the stack after the fact, while the O/S was trying to handle it.

-scott
