Double-checked locking problem, C++

Question

I have omitted the rest of the implementation for simplicity because it is not relevant here. Consider the classic implementation of double-checked locking described in Modern C++ Design.

Singleton& Singleton::Instance()
{
    if(!pInstance_) 
    { 
         Guard myGuard(lock_); 
         if (!pInstance_) 
         {
            pInstance_ = new Singleton; 
         }
     }
     return *pInstance_;
}

Here the author insists that we avoid the race condition. But I have read an article, which unfortunately I don't remember very well, in which the following flow was described.

Thread 1 enters the first if statement.
Thread 1 acquires the mutex and enters the body of the second if.
Thread 1 calls operator new and assigns the memory to pInstance_, then calls the constructor on that memory.
Suppose thread 1 has assigned the memory to pInstance_ but has not yet constructed the object, and thread 2 enters the function.
Thread 2 sees that pInstance_ is not null (but not yet initialized by the constructor) and returns pInstance_.

In that article the author stated that the trick is that on the line pInstance_ = new Singleton; the memory can be allocated and assigned to pInstance_ before the constructor is called on that memory.

Relying on the standard or other reliable sources, can anyone please confirm or deny the possibility or correctness of this flow? Thanks!

Answer 1

The problem you describe can only occur if, for reasons I cannot imagine, the designers of the singleton use an explicit (and broken) two-step construction:

     ...
     Guard myGuard(lock_); 
     if (!pInstance_) 
     {
        auto alloc = std::allocator<Singleton>();
        pInstance_ = alloc.allocate(1); // SHAME here: race condition, pointer published before construction
        // possibly other stuff
        alloc.construct(pInstance_);    // anything could have happened since allocation
     }
     ....

Even if for any reason such a two-step construction were required, the pInstance_ member should never contain anything other than nullptr or a pointer to a fully constructed instance:

        auto alloc = std::allocator<Singleton>();
        Singleton *tmp = alloc.allocate(1); // no problem here
        // possibly other stuff
        alloc.construct(tmp);               // nor here
        pInstance_ = tmp;                   // a fully constructed instance

But beware: the fix is only guaranteed on a single CPU. Things could be much worse on multi-core systems, where the C++11 atomics semantics are indeed required.

Answer 2

The problem is that, in the absence of guarantees otherwise, the store of the pointer into pInstance_ might be seen by some other thread before the construction of the object is complete. In that case, the other thread won't enter the mutex and will immediately return pInstance_, and when the caller uses it, it can see uninitialized values.

This apparent reordering between the store(s) associated with the construction of Singleton and the store to pInstance_ may be caused by the compiler or the hardware. I'll take a quick look at both cases below.

Compiler Reordering

Absent any specific guarantees related to concurrent reads (such as those offered by C++11's std::atomic objects), the compiler only needs to preserve the semantics of the code as seen by the current thread. That means, for example, that it may compile code "out of order" relative to how it appears in the source, as long as this doesn't have visible side effects (as defined by the standard) on the current thread.

In particular, it would not be uncommon for the compiler to reorder the stores performed in the constructor for Singleton with the store to pInstance_, as long as it can see that the effect is the same ¹.

Let's take a look at a fleshed-out version of your example:

struct Lock {};
struct Guard {
    Guard(Lock& l);
};

int value;

struct Singleton {
    int x;
    Singleton() : x{value} {}

    static Lock lock_;
    static Singleton* pInstance_;
    static Singleton& Instance();
};

Singleton& Singleton::Instance()
{
    if(!pInstance_) 
    { 
         Guard myGuard(lock_); 
         if (!pInstance_) 
         {
            pInstance_ = new Singleton; 
         }
     }
     return *pInstance_;
}

Here, the constructor for Singleton is very simple: it reads from the global value and assigns it to x, the only member of Singleton.

Using godbolt, we can check exactly how gcc and clang compile this. The gcc version, annotated, is shown below:

Singleton::Instance():
        mov     rax, QWORD PTR Singleton::pInstance_[rip]
        test    rax, rax
        jz      .L9       ; if pInstance_ == NULL, go to .L9 (slow path)
        ret
.L9:
        sub     rsp, 24
        mov     esi, OFFSET FLAT:_ZN9Singleton5lock_E
        lea     rdi, [rsp+15]
        call    Guard::Guard(Lock&) ; acquire the mutex
        mov     rax, QWORD PTR Singleton::pInstance_[rip]
        test    rax, rax
        jz      .L10     ; second check for null, if still null goto L10
.L1:
        add     rsp, 24
        ret
.L10:
        mov     edi, 4
        call    operator new(unsigned long) ; allocate memory (pointer in rax)
        mov     edx, DWORD value[rip]       ; load value global
        mov     QWORD pInstance_[rip], rax  ; store pInstance pointer!!
        mov     DWORD [rax], edx            ; store value into pInstance_->x
        jmp     .L1

The last few lines are critical, in particular the two stores:

        mov     QWORD pInstance_[rip], rax  ; store pInstance pointer!!
        mov     DWORD [rax], edx            ; store value into pInstance_->x

Effectively, the line pInstance_ = new Singleton; has been transformed into:

Singleton* stemp = static_cast<Singleton*>(operator new(sizeof(Singleton))); // (1) allocate uninitialized memory for a Singleton object on the heap
int vtemp     = value; // (2) read global variable value
pInstance_    = stemp; // (3) write the pointer, still uninitialized, into the global pInstance_ (oops!)
pInstance_->x = vtemp; // (4) initialize the Singleton by writing x

Oops! Any second thread arriving when (3) has occurred but (4) hasn't will see a non-null pInstance_, but then read an uninitialized (garbage) value for pInstance_->x.

So even without invoking any weird hardware reordering at all, this pattern isn't safe without doing more work.

Hardware Reordering

Let's say you arrange things so that the reordering of the stores above doesn't occur on your compiler ², perhaps by inserting a compiler barrier such as asm volatile ("" ::: "memory"). With that small change, gcc now compiles this to have the two critical stores in the "desired" order:

        mov     DWORD PTR [rax], edx
        mov     QWORD PTR Singleton::pInstance_[rip], rax

So we're good, right?

Well, on x86, we are. It happens that x86 has a relatively strong memory model, and all stores already have release semantics. I won't describe the full semantics, but in the context of two stores as above, it implies that stores appear in program order to other CPUs: any CPU that sees the second write above (to pInstance_) will necessarily see the prior write (to pInstance_->x).

We can illustrate that by using the C++11 std::atomic feature to explicitly ask for a release store for pInstance_ (this also enables us to get rid of the compiler barrier):

    static std::atomic<Singleton*> pInstance_;
    ...
       if (!pInstance_) 
       {
          pInstance_.store(new Singleton, std::memory_order_release); 
       }

We get reasonable assembly with no hardware memory barriers or anything (there is a redundant load now, but this is both a missed optimization by gcc and a consequence of the way we wrote the function).

So we're done, right?

Nope: most other platforms don't have the strong store-to-store ordering that x86 does.

Let's take a look at the ARM64 assembly around the creation of the new object:

    bl      operator new(unsigned long)
    mov     x1, x0                         ; x1 holds Singleton* temp
    adrp    x0, .LANCHOR0
    ldr     w0, [x0, #:lo12:.LANCHOR0]     ; load value
    str     w0, [x1]                       ; temp->x = value
    mov     x0, x1
    str     x1, [x19, #pInstance_]  ; pInstance_ = temp

So we have the str to pInstance_ as the last store, coming after the temp->x = value store, as we want. However, the ARM64 memory model doesn't guarantee that these stores appear in program order when observed by another CPU. So even though we've tamed the compiler, the hardware can still trip us up. You'll need a barrier to solve this.

Prior to C++11, there wasn't a portable solution for this problem. For a particular ISA you could use inline assembly to emit the right barrier. Your compiler might have a builtin like __sync_synchronize offered by gcc, or your OS might even have something.

In C++11 and beyond, however, we finally have a formal memory model built into the language, and what we need there for double-checked locking is a release store as the final store to pInstance_. We saw this already for x86, where we checked that no barrier instruction was emitted. On ARM64, using std::atomic with memory_order_release, the object publishing code becomes:

    bl      operator new(unsigned long)
    adrp    x1, .LANCHOR0
    ldr     w1, [x1, #:lo12:.LANCHOR0]
    str     w1, [x0]
    stlr    x0, [x20]

The main difference is that the final store is now stlr, a release store. You can check out the PowerPC side too, where an lwsync barrier has popped up between the two stores.

So the bottom line is that:

Double-checked locking is safe in a sequentially consistent system.
Real-world systems almost always deviate from sequential consistency, either due to the hardware, the compiler, or both.
To solve that, you need to tell the compiler what you want, and it will both avoid reordering itself and emit the necessary barrier instructions, if any, to prevent the hardware from causing a problem.
Prior to C++11, the "way you tell the compiler" to do that was platform/compiler/OS specific, but in C++11 and later you can simply use std::atomic with memory_order_acquire loads and memory_order_release stores.
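Putting the bottom line together, here is a minimal sketch of double-checked locking with C++11 atomics. It assumes std::mutex and std::lock_guard in place of the question's Lock and Guard types, and the helper all_threads_agree is a hypothetical name added purely for illustration; none of this is taken from Modern C++ Design.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

int value = 42;  // stand-in for the global read by the constructor

class Singleton {
public:
    int x;
    static Singleton& Instance();
private:
    Singleton() : x{value} {}
    static std::mutex lock_;                    // assumption: std::mutex replaces Lock
    static std::atomic<Singleton*> pInstance_;
};

std::mutex Singleton::lock_;
std::atomic<Singleton*> Singleton::pInstance_{nullptr};

Singleton& Singleton::Instance()
{
    // Fast path: the acquire load pairs with the release store below.
    Singleton* p = pInstance_.load(std::memory_order_acquire);
    if (!p)
    {
        std::lock_guard<std::mutex> guard(lock_);
        // Second check under the mutex; the mutex already orders this load.
        p = pInstance_.load(std::memory_order_relaxed);
        if (!p)
        {
            p = new Singleton;                               // fully construct first
            pInstance_.store(p, std::memory_order_release);  // then publish
        }
    }
    return *p;
}

// Hypothetical helper: n threads call Instance() and compare what they got.
bool all_threads_agree(int n)
{
    assert(n > 0);
    std::vector<Singleton*> seen(n, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = &Singleton::Instance(); });
    for (auto& t : threads) t.join();
    for (Singleton* p : seen)
        if (p != seen[0] || p->x != value) return false;
    return true;
}
```

Returning the local pointer p instead of reloading pInstance_ also avoids the redundant load mentioned earlier.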

The Load

The above only covered half of the problem: the store of pInstance_. The other half that can go wrong is the load, and the load is actually the most important for performance, since it represents the usual fast path that gets taken after the singleton is initialized. What if pInstance_->x was loaded before pInstance_ itself was loaded and checked for null? In that case, you could still read an uninitialized value!

This might seem unlikely, since pInstance_ needs to be loaded before it is dereferenced, right? That is, there seems to be a fundamental dependency between the operations that prevents reordering, unlike the store case. Well, as it turns out, both hardware behavior and software transformation could still trip you up here, and the details are even more complex than the store case. If you use memory_order_acquire though, you'll be fine. If you want the last ounce of performance, especially on PowerPC, you'll need to dig into the details of memory_order_consume. A tale for another day.
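To isolate the load side, here is a small sketch of the acquire/release pairing outside the singleton. The names Payload, published, producer, consumer and run_once are hypothetical and exist only for this illustration. The reader spins until it sees a non-null pointer, and the acquire load ensures that the write to x made before the release store is visible as well.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Payload { int x; };

std::atomic<Payload*> published{nullptr};

void producer()
{
    Payload* p = new Payload{7};                    // plain write to p->x
    published.store(p, std::memory_order_release);  // publish after the write
}

int consumer()
{
    Payload* p;
    // Acquire load: once a non-null pointer is observed, everything written
    // before the matching release store is visible to this thread too.
    while (!(p = published.load(std::memory_order_acquire)))
        std::this_thread::yield();
    return p->x;
}

// Runs one reader and one writer concurrently and returns what the reader saw.
int run_once()
{
    int result = 0;
    std::thread reader([&result] { result = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    assert(published.load(std::memory_order_relaxed) != nullptr);
    return result;  // the Payload is intentionally leaked in this sketch
}
```

With memory_order_relaxed on the load instead, the language would no longer guarantee that the reader sees the initialized x, even though the code looks the same.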

¹ In particular, this means that the compiler has to be able to see the code for the constructor Singleton() so that it can determine that it doesn't read from pInstance_.

² Of course, it's very dangerous to rely on this, since you'd have to check the assembly after every compilation in case anything changed!

Answer 3

It used to be unspecified before C++11, because there was no standard memory model covering multiple threads.

IIRC, the pointer could have been set to the allocated address before the constructor completed, so long as that thread would never be able to tell the difference (this could probably only happen for a trivial/non-throwing constructor).

Since C++11, the sequenced-before rules disallow that reordering, specifically:

8) The side effect (modification of the left argument) of the built-in assignment operator ... is sequenced after the value computation ... of both left and right arguments, ...

Since the right argument is a new-expression, it must have completed allocation and construction before the left-hand side can be modified.
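The sequencing rule can be observed from inside the assigning thread with a small probe. This is a sketch with hypothetical names (Probe, constructor_saw_null): the constructor records whether the static pointer was still null while it ran. It only demonstrates what the thread performing the assignment can observe. What another thread may see is the memory-model question discussed in Answer 2.

```cpp
#include <cassert>

struct Probe {
    static Probe* instance;       // plays the role of pInstance_
    bool saw_null_during_ctor;
    // The constructor reads the static pointer, so the compiler may not
    // publish the address early without this thread noticing.
    Probe() : saw_null_during_ctor(instance == nullptr) {}
};

Probe* Probe::instance = nullptr;

bool constructor_saw_null()
{
    assert(Probe::instance == nullptr);  // only meaningful on the first call
    // The assignment's side effect is sequenced after the new-expression,
    // which includes running the constructor.
    Probe::instance = new Probe;
    return Probe::instance->saw_null_during_ctor;
}
```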
