
When is a divide by zero not a divide by zero? A puzzle in the debugger (static variable issues)

I'm very confused and I think my debugger is lying to me. I have the following loop in my code:

MyClass::UploadFile(CString strFile)
{
  ...
  static DWORD dwLockWaitTime = EngKey::GetDWORD(DNENG_SERVER_UPLOAD_LOCK_WAIT_TIME, DNENG_SERVER_UPLOAD_LOCK_WAIT_TIME_DEFAULT);
  static DWORD dwLockPollInterval = EngKey::GetDWORD(DNENG_SERVER_UPLOAD_LOCK_POLL_INTERVAL, DNENG_SERVER_UPLOAD_LOCK_POLL_INTERVAL_DEFAULT);

  LONGLONG llReturnedOffset(0LL);
  BOOL bLocked(FALSE);
  for (DWORD sanity = 0; (sanity == 0 || status == RESUMABLE_FILE_LOCKED) && sanity < (dwLockWaitTime / dwLockPollInterval); sanity++) 
    {
      ...

This loop has been executed hundreds of times during the course of my program, and the two static variables are not changed anywhere in the code: they're written just once, when they're statically initialized, and read in the loop condition and in one other place. Since they're user settings read from the Windows registry, they almost always have the constant values dwLockWaitTime = 60 and dwLockPollInterval = 5, so the loop is always computing 60 / 5.

Very rarely, I get a crash dump which shows that this line of code has thrown a division-by-zero error. I've checked what WinDbg says and it shows:

FAULTING_IP: 
procname!CServerAgent::ResumableUpload+54a [serveragent.cpp @ 725]
00000001`3f72d74a f73570151c00    div     eax,dword ptr [proc!dwLockPollInterval (00000001`3f8eecc0)]

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 000000013f72d74a (proc!CServerAgent::ResumableUpload+0x000000000000054a)
   ExceptionCode: c0000094 (Integer divide-by-zero)
  ExceptionFlags: 00000000
NumberParameters: 0

ERROR_CODE: (NTSTATUS) 0xc0000094 - {EXCEPTION}  Integer division by zero.

I've checked the assembler code and it shows that the crash occurred on this div instruction.

00000001`3f72d744 8b0572151c00    mov     eax,dword ptr [dwLockWaitTime (00000001`3f8eecbc)]
00000001`3f72d74a f73570151c00    div     eax,dword ptr [dwLockPollInterval (00000001`3f8eecc0)]

So as you can see, the value at 000000013f8eecbc was moved into eax, and then eax was divided by the value at 000000013f8eecc0.

What is at those two addresses, you ask?

0:048> dd 00000001`3f8eecbc
00000001`3f8eecbc  0000003c 00000005 00000001 00000000
00000001`3f8eeccc  00000000 00000002 00000000 00000000
00000001`3f8eecdc  00000000 7fffffff a9ad25cf 7fffffff
00000001`3f8eecec  a9ad25cf 00000000 00000000 00000000
00000001`3f8eecfc  00000000 00000000 00000000 00000000
00000001`3f8eed0c  00000000 00000000 00000000 00000000
00000001`3f8eed1c  00000000 00000000 00000000 00000000
00000001`3f8eed2c  00000000 00000000 00000000 00000000
0:048> dd 000000013f8eecc0
00000001`3f8eecc0  00000005 00000001 00000000 00000000
00000001`3f8eecd0  00000002 00000000 00000000 00000000
00000001`3f8eece0  7fffffff a9ad25cf 7fffffff a9ad25cf
00000001`3f8eecf0  00000000 00000000 00000000 00000000
00000001`3f8eed00  00000000 00000000 00000000 00000000
00000001`3f8eed10  00000000 00000000 00000000 00000000
00000001`3f8eed20  00000000 00000000 00000000 00000000
00000001`3f8eed30  00000000 00000000 00000000 00000000

The constants 60 and 5, exactly as I'd expect. So where's the divide by zero? Is my debugger lying? Surely the divide by zero was raised by the hardware, so it can't have made a mistake about that? And if it was a divide by zero in a different place in my code, what are the odds that the debugger would show the instruction pointer in exactly this place? I confess, I'm stumped.

Since the code is part of a member function, and you're calling this function from multiple threads, the initialization of the static variables is not thread-safe if you are using a compiler that does not conform to the C++11 standard. Thus you may get data races when initializing those two static variables.

For a C++11-conforming compiler, static local variables are guaranteed to be initialized by the first thread that reaches the declaration, while subsequent threads wait until the initialization has completed.

For Visual Studio 2010 and below, initialization of static local variables is not guaranteed to be thread-safe, since these compilers conform to the C++03 and C++98 standards.

For Visual Studio 2013, I am not sure of the level of C++11 support in terms of static local initialization. Therefore, for Visual Studio 2013, you may have to use proper synchronization to ensure that static local variables are initialized correctly.

For Visual Studio 2015, this item has been addressed and proper static local initialization is fully implemented, so the code you currently have should work correctly for VS 2015 and above.
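
To illustrate the C++11 guarantee, here is a minimal sketch, assuming a C++11-conforming compiler. ReadPollInterval is a hypothetical stand-in for the registry read (EngKey::GetDWORD in the question), and the value 5 is simply the one from the question. Several threads call the function at once, yet the initializer should run only once and no thread should observe an uninitialized 0.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Counts how many times the initializer actually runs.
static std::atomic<int> g_initCalls{0};

// Hypothetical stand-in for a slow registry read such as EngKey::GetDWORD.
static unsigned ReadPollInterval() {
    ++g_initCalls;
    return 5;
}

unsigned PollInterval() {
    // Under C++11 rules, concurrent callers block here until the first
    // initialization has completed, so none of them can read a stale 0.
    static unsigned interval = ReadPollInterval();
    return interval;
}

// Calls PollInterval() from threadCount threads; true if every thread saw 5.
bool AllThreadsSeeInitializedValue(std::size_t threadCount) {
    assert(threadCount > 0);
    std::vector<unsigned> seen(threadCount);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < threadCount; ++i) {
        // Each thread writes only its own element, so there is no race here.
        threads.emplace_back([&seen, i] { seen[i] = PollInterval(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    for (unsigned v : seen) {
        if (v != 5) return false;
    }
    return true;
}
```

On a compiler without this guarantee (VS 2013 and earlier, per the discussion here), the same code gives no such assurance.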


Edit: For Visual Studio 2013, thread-safe initialization of static locals ("Magic Statics") is not implemented, as described here.

Therefore, we can cautiously conclude that the reason for the original problem is the static local initialization issue combined with threading. So the solution (if you want to stick with VS 2013) is to use proper synchronization, or to redesign your application so that static variables are no longer needed.
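
One possible form of "proper synchronization" is std::call_once. The following is only a sketch under assumptions: ReadSetting is a hypothetical stand-in for EngKey::GetDWORD, the defaults 60 and 5 are taken from the question, and the struct and function names are invented for illustration. The once_flag and the settings live at namespace scope rather than as function-local statics, so the sketch does not itself rely on thread-safe local static initialization.

```cpp
#include <cassert>
#include <mutex>

namespace {

// Hypothetical stand-in for EngKey::GetDWORD(key, default); here it just
// returns the default so the sketch is self-contained.
unsigned long ReadSetting(unsigned long defaultValue) {
    return defaultValue;
}

struct UploadLockSettings {
    unsigned long waitTime;
    unsigned long pollInterval;
};

// Namespace-scope objects are set up before any worker thread calls in,
// so nothing here depends on "magic statics".
std::once_flag g_settingsOnce;
UploadLockSettings g_settings = {0, 0};

}  // namespace

const UploadLockSettings& GetUploadLockSettings() {
    // call_once runs the lambda exactly once; other callers block until it
    // has finished, so they never see the half-initialized zeros.
    std::call_once(g_settingsOnce, [] {
        g_settings.waitTime = ReadSetting(60);
        g_settings.pollInterval = ReadSetting(5);
    });
    return g_settings;
}

// The bound used in the question's loop condition.
unsigned long MaxLockAttempts() {
    const UploadLockSettings& s = GetUploadLockSettings();
    assert(s.pollInterval != 0);
    return s.waitTime / s.pollInterval;
}
```

The loop condition would then compare sanity against MaxLockAttempts() instead of dividing the two statics directly. Reading the settings once at startup, before any threads are created, would be another way to avoid the problem.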

The problem may be related to multithreading.

  1. A thread enters the function
  2. It checks the hidden "is_initialized" static variable to see if initialization has already been performed
  3. The variable is 0, so it sets the variable to 1 and proceeds to read the registry
  4. At this point another thread enters the function
  5. The second thread sees the variables as already initialized and skips the initialization code
  6. The division is performed while the denominator is still 0 (the first thread is still reading the registry)
  7. The program crashes, but in the meantime the first thread completes execution, setting the variables to the values that you see in the dump
  8. You lose sleep thinking about how the impossible happened
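
The sequence above can be modelled deterministically. This is a hand-written sketch of how a pre-C++11 compiler might lower a static local initialization, not actual compiler output; the names are invented, and atomics plus futures are used only to force the interleaving and keep the demonstration itself well defined. The calling thread plays the role of the second thread.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <thread>

struct RaceResult {
    unsigned divisorSeen;   // what the second thread would divide by
    unsigned valueInDump;   // what the static holds once the dump is inspected
};

RaceResult SimulateRace() {
    std::atomic<bool> isInitialized{false};  // hidden "is_initialized" flag
    std::atomic<unsigned> interval{0};       // the static's zeroed storage

    std::promise<void> flagSet;      // fulfilled once step 3 has set the flag
    std::promise<void> secondDone;   // fulfilled once the second thread has read
    std::future<void> flagSetReady = flagSet.get_future();
    std::future<void> secondDoneReady = secondDone.get_future();

    std::thread first([&] {
        if (!isInitialized.load()) {       // step 2: check the flag
            isInitialized.store(true);     // step 3: flag set before the value exists
            flagSet.set_value();
            secondDoneReady.wait();        // stands in for the slow registry read
            interval.store(5);             // step 7: the value finally arrives
        }
    });

    flagSetReady.wait();                   // step 4: second thread enters
    unsigned observed = 0;
    if (isInitialized.load()) {            // step 5: initialization is skipped
        observed = interval.load();        // step 6: divisor is still 0
    }
    secondDone.set_value();
    first.join();

    RaceResult result = {observed, interval.load()};
    return result;
}
```

In this model the second thread reads a divisor of 0, while the value left behind afterwards is 5, which matches a dump that shows perfectly sensible values next to a divide-by-zero exception.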
