Is VC++ still sequentially consistent?

Question

I watched (most of) Herb Sutter's atomic<> Weapons video, and I wanted to test the "conditional lock" with a loop inside sample. Apparently, although (if I understand correctly) the C++11 standard says the example below should work properly and be sequentially consistent, it does not.

Before you read on, my question is: Is this correct? Is the compiler broken? Is my code broken, i.e. do I have a race condition here that I missed? How do I work around this?

I tried it on 3 different versions of Visual C++: VC10 Professional, VC11 Professional and VC12 Express (== Visual Studio 2013 Desktop Express).

Below is the code I used for Visual Studio 2013. For the other versions I used boost instead of std, but the idea is the same.

#include <chrono>
#include <iostream>
#include <thread>
#include <mutex>

int a = 0;
std::mutex m;

void other()
{
    std::lock_guard<std::mutex> l(m);
    std::this_thread::sleep_for(std::chrono::milliseconds(2));
    a = 999999;
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::cout << a << "\n";
}

int main(int argc, char* argv[])
{
    bool work = (argc > 1);

    if (work)
    {
        m.lock();
    }

    std::thread th(other);
    for (int i = 0; i < 100000000; ++i)
    {
        if (i % 7 == 3)
        {
            if (work)
            {
                ++a;
            }
        }
    }

    if (work)
    {
        std::cout << a << "\n";
        m.unlock();
    }

    th.join();
}

To summarize the idea of the code: the global variable a is protected by the global mutex m. Assuming there are no command line arguments (argc==1), the thread which runs other() is the only one which is supposed to access the global variable a.

The correct output of the program is to print 999999.

However, because of a compiler loop optimization (a is kept in a register for the in-loop increments, and the register is copied back to a at the end of the loop), the generated code writes to a even though it is not supposed to.

This happened in all 3 VC versions, although for this code example in VC12 I had to plant some calls to sleep() to make it break.

Here's some of the assembly code (the address of a in this run is 0x00f65498):

Loop initialization: the value of a is copied into edi

    27:     for (int i = 0; i < 100000000; ++i)
00F61543  xor         esi,esi  
00F61545  mov         edi,dword ptr ds:[0F65498h]  
00F6154B  jmp         main+0C0h (0F61550h)  
00F6154D  lea         ecx,[ecx]  
    28:     {
    29:         if (i % 7 == 3)

The increment happens inside the condition, but after the loop edi is copied back to the location of a unconditionally

    30:         {
    31:             if (work)
00F61572  mov         al,byte ptr [esp+1Bh]  
00F61576  jne         main+0EDh (0F6157Dh)  
00F61578  test        al,al  
00F6157A  je          main+0EDh (0F6157Dh)  
    32:             {
    33:                 ++a;
00F6157C  inc         edi  
    27:     for (int i = 0; i < 100000000; ++i)
00F6157D  inc         esi  
00F6157E  cmp         esi,5F5E100h  
00F61584  jl          main+0C0h (0F61550h)  
    32:             {
    33:                 ++a;
00F61586  mov         dword ptr ds:[0F65498h],edi  
    34:             }

And the output of the program is 0.
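The lost update can be reproduced deterministically in a single thread by modelling the two loop shapes in plain C++. The following is only a sketch: the function names and the callback parameter `other` (which stands in for the write performed by the thread running other()) are hypothetical and not part of the original program.

```cpp
#include <functional>

// Hypothetical stand-in for "the other thread writes to a while the loop runs".
using Interleave = std::function<void()>;

// The loop as written in the source: a is only touched when work is true.
void loop_as_written(int& a, bool work, int iterations, const Interleave& other)
{
    for (int i = 0; i < iterations; ++i)
    {
        if (i == iterations / 2)
        {
            other();            // simulated write from the other thread
        }
        if (i % 7 == 3)
        {
            if (work)
            {
                ++a;
            }
        }
    }
}

// The loop as the assembly above behaves: a is cached in a register and
// stored back after the loop, whether or not work is true.
void loop_as_compiled(int& a, bool work, int iterations, const Interleave& other)
{
    int edi = a;                // mov edi, dword ptr [a]
    for (int i = 0; i < iterations; ++i)
    {
        if (i == iterations / 2)
        {
            other();            // simulated write from the other thread
        }
        if (i % 7 == 3 && work)
        {
            ++edi;              // inc edi
        }
    }
    a = edi;                    // mov dword ptr [a], edi (unconditional)
}
```

With work == false and a callback that sets the variable to 999999, the first function leaves 999999 in place, while the second overwrites it with the stale value 0, which matches the output reported above.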

Answer 1

The 'volatile' keyword will prevent that kind of optimization. That's exactly what it's for: every use of 'a' will be read or written exactly as shown, and won't be reordered relative to accesses to other volatile variables.

The implementation of the mutex should include compiler-specific instructions to cause a "fence" at that point, telling the optimizer not to reorder instructions across that boundary. Since the implementation is not from the compiler vendor, maybe that's left out? I've never checked.

Since 'a' is global, I would generally think the compiler would be more careful with it. But VS10 doesn't know about threads, so it won't consider that other threads will use it. Since the optimizer can see the entire loop, it knows that functions called from within the loop won't touch 'a', and that's enough for it.

I'm not sure what the new standard says about thread visibility of global variables other than volatile. That is, is there a rule that would prevent that optimization (even though the compiler can see all the way down and knows that other functions don't use the global, must it assume that other threads can)?

I suggest trying the newer compiler with the compiler-provided std::mutex, and checking what the C++ standard and current drafts say about that. I think the above should help you know what to look for.

- John

Answer 2

Almost a month later, Microsoft still hasn't responded to the bug on MSDN Connect.

To summarize the above comments (and some further tests), apparently it happens in VS2013 Professional as well, but the bug only happens when building for Win32, not for x64. The generated assembly code in x64 doesn't have this problem. So it appears that it is a bug in the optimizer, and that there's no race condition in this code.

Apparently this bug also happens in GCC 4.8.1, but not in GCC 4.9. (Thanks to Voo, nosid and Chris Dodd for all their testing.)

It was suggested to mark a as volatile. This indeed prevents the bug, but only because it prevents the optimizer from performing the loop register optimization.
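For reference, a minimal sketch of the volatile variant might look as follows. The function name run_loop and the iterations parameter are hypothetical, and the increment is written as `a = a + 1` because `++` on a volatile variable is deprecated in C++20. Note that volatile is not a synchronization primitive in standard C++; here it only forces every access to a to be a real load and store.

```cpp
#include <mutex>

volatile int a = 0;     // volatile: the optimizer may not cache a in a register
std::mutex m;

// Hypothetical helper wrapping the loop from the question.
void run_loop(bool work, int iterations)
{
    if (work)
    {
        m.lock();
    }

    for (int i = 0; i < iterations; ++i)
    {
        if (i % 7 == 3)
        {
            if (work)
            {
                a = a + 1;  // one load and one store per increment
            }
        }
    }

    if (work)
    {
        m.unlock();
    }
}
```

The cost is that each increment goes through memory, so the loop loses the register optimization entirely.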

I found another solution: add another local variable b, and if needed (and under lock) do the following:

1. Copy a into b
2. Increment b in the loop
3. Copy b back to a if needed

The optimizer replaces the local variable with a register, so the code is still optimized, but the copies from and to a are done only if needed, and under lock.

Here's the new main() code, with arrows marking the changed lines.

int main(int argc, char* argv[])
{
    bool work = (argc == 1);

    int b = 0;          // <----

    if (work)
    {
        m.lock();
        b = a;          // <----
    }

    std::thread th(other);
    for (int i = 0; i < 100000000; ++i)
    {
        if (i % 7 == 3)
        {
            if (work)
            {
                ++b;    // <----
            }
        }
    }

    if (work)
    {
        a = b;          // <----
        std::cout << a << "\n";
        m.unlock();
    }

    th.join();
}

And this is what the assembly code looks like (&a == 0x000744b0, b replaced with edi):

    21:     int b = 0;
00071473  xor         edi,edi  
    22: 
    23:     if (work)
00071475  test        bl,bl  
00071477  je          main+5Bh (07149Bh)  
    24:     {
    25:         m.lock();

         ........

00071492  add         esp,4  
    26:         b = a;
00071495  mov         edi,dword ptr ds:[744B0h]  
    27:     }
    28: 

         ........

    33:         {
    34:             if (work)
00071504  test        bl,bl  
00071506  je          main+0C9h (071509h)  
    35:             {
    36:                 ++b;
00071508  inc         edi  
    30:     for (int i = 0; i < 100000000; ++i)
00071509  inc         esi  
0007150A  cmp         esi,5F5E100h  
00071510  jl          main+0A0h (0714E0h)  
    37:             }
    38:         }
    39:     }
    40: 
    41:     if (work)
00071512  test        bl,bl  
00071514  je          main+10Ch (07154Ch)  
    42:     {
    43:         a = b;
    44:        std::cout << a << "\n";
00071516  mov         ecx,dword ptr ds:[73084h]  
0007151C  push        edi  
0007151D  mov         dword ptr ds:[744B0h],edi  
00071523  call        dword ptr ds:[73070h]  
00071529  mov         ecx,eax  
0007152B  call        std::operator<<<std::char_traits<char> > (071A80h)  

     ........

This keeps the optimization and solves (or works around) the problem.
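The same workaround can also be packaged as a function that uses std::unique_lock with std::defer_lock, so the mutex is released automatically on every exit path. This is only a sketch under assumptions: the function name count_with_local_copy and the iterations parameter are hypothetical, and it has not been checked against the affected compilers.

```cpp
#include <mutex>

int a = 0;
std::mutex m;

// Hypothetical helper: conditional lock plus a local working copy of a.
void count_with_local_copy(bool work, int iterations)
{
    // Constructed unlocked; the mutex is taken only if work is true.
    std::unique_lock<std::mutex> lock(m, std::defer_lock);
    int b = 0;

    if (work)
    {
        lock.lock();
        b = a;              // copy a into b under the lock
    }

    for (int i = 0; i < iterations; ++i)
    {
        if (i % 7 == 3 && work)
        {
            ++b;            // only the local copy changes inside the loop
        }
    }

    if (work)
    {
        a = b;              // copy back while the lock is still held
    }
}   // lock's destructor unlocks m if it was locked
```

Because a is read and written only inside the `if (work)` blocks, there is no store to a for the optimizer to hoist out of the condition.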

Answer 1
0 2014-06-24 04:49:54

Answer 2
0 Accepted 2014-07-15 18:47:32
