Double-check locking issues, C++
I left out the rest of the implementation for simplicity because it is not relevant here. Consider the classical implementation of double-checked locking described in Modern C++ Design.
Singleton& Singleton::Instance()
{
if(!pInstance_)
{
Guard myGuard(lock_);
if (!pInstance_)
{
pInstance_ = new Singleton;
}
}
return *pInstance_;
}
Here the author insists that we avoid the race condition. But I have read an article, which unfortunately I don't remember very well, in which the following flow was described.
In that article the author stated that the trick is that on the line
pInstance_ = new Singleton;
the memory can be allocated and assigned to pInstance_, and only then is the constructor called on that memory.
Relying on the standard or other reliable sources, can anyone please confirm or deny the probability or correctness of this flow? Thanks!
The problem you describe can only occur if, for reasons I cannot imagine, the designers of the singleton use an explicit (and broken) two-step construction:
...
Guard myGuard(lock_);
if (!pInstance_)
{
auto alloc = std::allocator<Singleton>();
pInstance_ = alloc.allocate(1); // SHAME here: race condition
// eventually other stuff
alloc.construct(pInstance_); // anything could have happened since allocation
}
....
Even if such a two-step construction were required for some reason, the pInstance_ member should never contain anything other than nullptr or a fully constructed instance:
auto alloc = std::allocator<Singleton>();
Singleton *tmp = alloc.allocate(1); // no problem here
// eventually other stuff
alloc.construct(tmp); // nor here
pInstance_ = tmp; // a fully constructed instance
But beware: the fix is only guaranteed on a single-CPU system. Things could be much worse on multi-core systems, where the C++11 atomics semantics are indeed required.
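As a rough illustration of what "C++11 atomics semantics" could mean for the two-step pattern above, here is a minimal sketch that keeps the "construct into tmp, then publish" shape and adds explicit fences. It is one possible arrangement, not the answerer's code: the std::mutex, the std::atomic pointer, the member x and its initial value are all assumptions made for the example.

```cpp
#include <atomic>
#include <memory>
#include <mutex>

struct Singleton {
    int x;
    Singleton() : x{1} {}   // hypothetical payload, for illustration only
    static Singleton& Instance();
    static std::mutex lock_;
    static std::atomic<Singleton*> pInstance_;
};

std::mutex Singleton::lock_;
std::atomic<Singleton*> Singleton::pInstance_{nullptr};

Singleton& Singleton::Instance()
{
    Singleton* p = pInstance_.load(std::memory_order_relaxed);
    // Acquire fence: pairs with the release fence below, so a thread that
    // read a non-null pointer also sees the fully constructed object.
    std::atomic_thread_fence(std::memory_order_acquire);
    if (!p)
    {
        std::lock_guard<std::mutex> myGuard(lock_);
        p = pInstance_.load(std::memory_order_relaxed);
        if (!p)
        {
            std::allocator<Singleton> alloc;
            Singleton* tmp = alloc.allocate(1);   // step 1: raw memory
            std::allocator_traits<std::allocator<Singleton>>::construct(alloc, tmp); // step 2: construct
            // Release fence: the constructor's stores are ordered before
            // the pointer store that follows.
            std::atomic_thread_fence(std::memory_order_release);
            pInstance_.store(tmp, std::memory_order_relaxed);  // publish last
            p = tmp;
        }
    }
    return *p;
}
```

The shared pointer only ever holds nullptr or a fully constructed object, and the fences extend that guarantee to what other cores observe.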
The problem is that, in the absence of guarantees otherwise, the store of the pointer into pInstance_ might be seen by some other thread before the construction of the object is complete. In that case, the other thread won't enter the mutex and will just immediately return pInstance_, and when the caller uses it, it can see uninitialized values.
This apparent reordering between the store(s) associated with the construction of Singleton and the store to pInstance_ may be caused by the compiler or by the hardware. I'll take a quick look at both cases below.
Absent any specific guarantees related to concurrent reads (such as those offered by C++11's std::atomic objects), the compiler only needs to preserve the semantics of the code as seen by the current thread. That means, for example, that it may compile code "out of order" relative to how it appears in the source, as long as this doesn't have visible side-effects (as defined by the standard) on the current thread.
In particular, it would not be uncommon for the compiler to reorder stores performed in the constructor for Singleton with the store to pInstance_, as long as it can see that the effect is the same [1].
Let's take a look at a fleshed-out version of your example:
struct Lock {};
struct Guard {
Guard(Lock& l);
};
int value;
struct Singleton {
int x;
Singleton() : x{value} {}
static Lock lock_;
static Singleton* pInstance_;
static Singleton& Instance();
};
Singleton& Singleton::Instance()
{
if(!pInstance_)
{
Guard myGuard(lock_);
if (!pInstance_)
{
pInstance_ = new Singleton;
}
}
return *pInstance_;
}
Here, the constructor for Singleton is very simple: it reads from the global value and assigns it to x, the only member of Singleton.
Using godbolt, we can check exactly how gcc and clang compile this. The gcc version, annotated, is shown below:
Singleton::Instance():
mov rax, QWORD PTR Singleton::pInstance_[rip]
test rax, rax
jz .L9 ; if pInstance_ == NULL, go to .L9
ret
.L9:
sub rsp, 24
mov esi, OFFSET FLAT:_ZN9Singleton5lock_E
lea rdi, [rsp+15]
call Guard::Guard(Lock&) ; acquire the mutex
mov rax, QWORD PTR Singleton::pInstance_[rip]
test rax, rax
jz .L10 ; second check for null, if still null goto L10
.L1:
add rsp, 24
ret
.L10:
mov edi, 4
call operator new(unsigned long) ; allocate memory (pointer in rax)
mov edx, DWORD value[rip] ; load value global
mov QWORD pInstance_[rip], rax ; store pInstance pointer!!
mov DWORD [rax], edx ; store value into pInstance_->x
jmp .L1
The last few lines are critical, in particular the two stores:
mov QWORD pInstance_[rip], rax ; store pInstance pointer!!
mov DWORD [rax], edx ; store value into pInstance_->x
Effectively, the line
pInstance_ = new Singleton;
has been transformed into:
Singleton* stemp = operator new(sizeof(Singleton)); // (1) allocate uninitialized memory for a Singleton object on the heap
int vtemp = value; // (2) read global variable value
pInstance_ = stemp; // (3) write the pointer, still uninitialized, into the global pInstance_ (oops!)
pInstance_->x = vtemp; // (4) initialize the Singleton by writing x
Oops! Any second thread arriving after (3) has occurred but before (4) will see a non-null pInstance_, but then read an uninitialized (garbage) value for pInstance_->x.
So even without invoking any weird hardware reordering at all, this pattern isn't safe without doing more work.
Let's say you arrange things so that the reordering of the stores above doesn't occur on your compiler [2], perhaps by inserting a compiler barrier such as asm volatile ("" ::: "memory"). With that small change, gcc now compiles this to have the two critical stores in the "desired" order:
mov DWORD PTR [rax], edx
mov QWORD PTR Singleton::pInstance_[rip], rax
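For reference, the "small change" might look something like the sketch below. The exact source behind the assembly above is not shown here, so the placement of the barrier, the temporary tmp, the value 7, and the placeholder bodies for Lock and Guard are assumptions made for illustration. The asm statement is a GCC/Clang extension that emits no instruction. It only forbids the compiler from moving memory accesses across it.

```cpp
struct Lock {};
struct Guard {
    explicit Guard(Lock&) {}  // placeholder: a real guard would acquire the lock
};

int value = 7;  // arbitrary value for the example

struct Singleton {
    int x;
    Singleton() : x{value} {}
    static Lock lock_;
    static Singleton* pInstance_;
    static Singleton& Instance();
};

Lock Singleton::lock_;
Singleton* Singleton::pInstance_ = nullptr;

Singleton& Singleton::Instance()
{
    if (!pInstance_)
    {
        Guard myGuard(lock_);
        if (!pInstance_)
        {
            Singleton* tmp = new Singleton;  // allocate and construct first
            asm volatile ("" ::: "memory");  // compiler-only barrier (GCC/Clang syntax)
            pInstance_ = tmp;                // publish the pointer afterwards
        }
    }
    return *pInstance_;
}
```

This constrains only the compiler's code generation. It says nothing to the CPU, and in C++11 terms the unsynchronized read of the plain pointer is still a data race.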
So we're good, right?
Well, on x86, we are. It happens that x86 has a relatively strong memory model, and all stores already have release semantics. I won't describe the full semantics, but in the context of two stores as above, it implies that stores appear in program order to other CPUs: so any CPU that sees the second write above (to pInstance_) will necessarily see the prior write (to pInstance_->x).
We can illustrate that by using the C++11 std::atomic feature to explicitly ask for a release store for pInstance_ (this also enables us to get rid of the compiler barrier):
static std::atomic<Singleton*> pInstance_;
...
if (!pInstance_)
{
pInstance_.store(new Singleton, std::memory_order_release);
}
We get reasonable assembly with no hardware memory barriers or anything (there is a redundant load now, but this is both a missed optimization by gcc and a consequence of the way we wrote the function).
So we're done, right?
Nope: most other platforms don't have the strong store-to-store ordering that x86 does.
Let's take a look at the ARM64 assembly around the creation of the new object:
bl operator new(unsigned long)
mov x1, x0 ; x1 holds Singleton* temp
adrp x0, .LANCHOR0
ldr w0, [x0, #:lo12:.LANCHOR0] ; load value
str w0, [x1] ; temp->x = value
mov x0, x1
str x1, [x19, #pInstance_] ; pInstance_ = temp
So we have the str to pInstance_ as the last store, coming after the temp->x = value store, as we want. However, the ARM64 memory model doesn't guarantee that these stores appear in program order when observed by another CPU. So even though we've tamed the compiler, the hardware can still trip us up. You'll need a barrier to solve this.
Prior to C++11, there wasn't a portable solution for this problem. For a particular ISA you could use inline assembly to emit the right barrier. Your compiler might have a builtin like __sync_synchronize offered by gcc, or your OS might even have something.
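A pre-C++11 version built on the gcc builtin might look roughly like this. It is a sketch under stated assumptions rather than a vetted implementation: Lock and Guard are placeholders for a real mutex and guard, the value 7 is arbitrary, and the code relies on gcc's documented behavior of __sync_synchronize as a full memory barrier, since the language itself offered no guarantees at the time.

```cpp
struct Lock {};
struct Guard {
    explicit Guard(Lock&) {}  // placeholder: a real guard would acquire the lock
};

int value = 7;  // arbitrary value for the example

struct Singleton {
    int x;
    Singleton() : x(value) {}
    static Lock lock_;
    static Singleton* pInstance_;
    static Singleton& Instance();
};

Lock Singleton::lock_;
Singleton* Singleton::pInstance_ = 0;

Singleton& Singleton::Instance()
{
    Singleton* p = pInstance_;
    __sync_synchronize();          // full barrier: reads through p stay after the pointer load
    if (!p)
    {
        Guard myGuard(lock_);
        p = pInstance_;
        if (!p)
        {
            p = new Singleton;
            __sync_synchronize();  // full barrier: constructor stores complete before publishing
            pInstance_ = p;
        }
    }
    return *p;
}
```

A full barrier on the fast path is stronger than strictly necessary, and it costs something on every call.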
In C++11 and beyond, however, we finally have a formal memory model built into the language, and what we need there for double-checked locking is a release store as the final store to pInstance_. We saw this already for x86, where we checked that no hardware barrier was emitted. On ARM64, using std::atomic with memory_order_release, the object publishing code becomes:
bl operator new(unsigned long)
adrp x1, .LANCHOR0
ldr w1, [x1, #:lo12:.LANCHOR0]
str w1, [x0]
stlr x0, [x20]
The main difference is that the final store is now stlr, a release store. You can check out the PowerPC side too, where an lwsync barrier has popped up between the two stores.
So the bottom line is that you should use std::atomic with memory_order_acquire loads and memory_order_release stores.
The above only covered half of the problem: the store of pInstance_. The other half that can go wrong is the load, and the load is actually the most important for performance, since it represents the usual fast path that gets taken after the singleton is initialized. What if pInstance_->x was loaded before pInstance_ itself was loaded and checked for null? In that case, you could still read an uninitialized value!
This might seem unlikely, since pInstance_ needs to be loaded before it is dereferenced, right? That is, there seems to be a fundamental dependency between the operations that prevents reordering, unlike the store case. Well, as it turns out, both hardware behavior and software transformation could still trip you up here, and the details are even more complex than the store case. If you use memory_order_acquire though, you'll be fine. If you want the last ounce of performance, especially on PowerPC, you'll need to dig into the details of memory_order_consume. A tale for another day.
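Putting the store and load halves together, a complete version of the function could be sketched as follows. The use of std::mutex and std::lock_guard in place of Lock and Guard, the private constructor, and the initial value 42 are assumptions for the example.

```cpp
#include <atomic>
#include <mutex>

int value = 42;  // stand-in for the global read by the constructor

class Singleton {
public:
    int x;
    static Singleton& Instance();
private:
    Singleton() : x{value} {}
    static std::mutex lock_;
    static std::atomic<Singleton*> pInstance_;
};

std::mutex Singleton::lock_;
std::atomic<Singleton*> Singleton::pInstance_{nullptr};

Singleton& Singleton::Instance()
{
    // Fast path: this acquire load pairs with the release store below, so
    // a non-null pointer implies the constructor's writes are visible.
    Singleton* p = pInstance_.load(std::memory_order_acquire);
    if (!p)
    {
        std::lock_guard<std::mutex> myGuard(lock_);
        // The mutex already orders this load against the store made by
        // any previous holder of the lock, so relaxed is enough here.
        p = pInstance_.load(std::memory_order_relaxed);
        if (!p)
        {
            p = new Singleton;
            pInstance_.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```

Returning *p from the local copy, rather than re-reading pInstance_, also avoids the redundant load mentioned earlier.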
[1] In particular, this means that the compiler has to be able to see the code for the constructor Singleton() so that it can determine that it doesn't read from pInstance_.
[2] Of course, it's very dangerous to rely on this, since you'd have to check the assembly after every compilation if anything changed!
It used to be unspecified before C++11, because there was no standard memory model discussing multiple threads.
IIRC the pointer could have been set to the allocated address before the constructor completed, so long as that thread would never be able to tell the difference (this could probably only happen for a trivial/non-throwing constructor).
Since C++11, the sequenced-before rules disallow that reordering, specifically:
8) The side effect (modification of the left argument) of the built-in assignment operator ... is sequenced after the value computation ... of both left and right arguments, ...
Since the right argument is a new-expression, it must have completed allocation and construction before the left-hand side can be modified.