C++ standard: can relaxed atomic stores be lifted above a mutex lock?

Is there any wording in the standard that guarantees that relaxed stores to atomics won't be lifted above the locking of a mutex? If not, is there any wording that explicitly says that it's kosher for the compiler or CPU to do so?

For example, take the following program (which could potentially use acq/rel for foo_has_been_set and avoid the lock, and/or make foo itself atomic. It's written this way to illustrate this question.)

std::mutex mu;
int foo = 0;  // Guarded by mu
std::atomic<bool> foo_has_been_set{false};

void SetFoo() {
  mu.lock();
  foo = 1;
  foo_has_been_set.store(true, std::memory_order_relaxed);
  mu.unlock();
}

void CheckFoo() {
  if (foo_has_been_set.load(std::memory_order_relaxed)) {
    mu.lock();
    assert(foo == 1);
    mu.unlock();
  }
}

Is it possible for CheckFoo to crash in the above program if another thread is calling SetFoo concurrently, or is there some guarantee that the store to foo_has_been_set can't be lifted above the call to mu.lock by the compiler and CPU?

This is related to an older question, but it's not 100% clear to me that the answer there applies to this. In particular, the counter-example in that question's answer may apply to two concurrent calls to SetFoo, but I'm interested in the case where the compiler knows that there is one call to SetFoo and one call to CheckFoo. Is that guaranteed to be safe?

I'm looking for specific citations in the standard.

I think I've figured out the particular partial order edges that guarantee the program can't crash. In the answer below I'm referencing version N4659 of the draft standard.

The code involved for the writer thread A and reader thread B is:

A1: mu.lock()
A2: foo = 1
A3: foo_has_been_set.store(relaxed)
A4: mu.unlock()

B1: foo_has_been_set.load(relaxed) <-- (stop if false)
B2: mu.lock()
B3: assert(foo == 1)
B4: mu.unlock()

We seek a proof that if B3 executes, then A2 happens before B3, as defined in [intro.races]/10. By [intro.races]/10.2, it's sufficient to prove that A2 inter-thread happens before B3.

Because lock and unlock operations on a given mutex happen in a single total order ([thread.mutex.requirements.mutex]/5), we must have either A1 or B2 coming first. The two cases:

  1. Assume that A1 happens before B2. Then by [thread.mutex.class]/1 and [thread.mutex.requirements.mutex]/25, we know that A4 will synchronize with B2. Therefore by [intro.races]/9.1, A4 inter-thread happens before B2. Since B2 is sequenced before B3, by [intro.races]/9.3.1 we know that A4 inter-thread happens before B3. Since A2 is sequenced before A4, by [intro.races]/9.3.2, A2 inter-thread happens before B3.

  2. Assume that B2 happens before A1. Then by the same logic as above, we know that B4 synchronizes with A1. So since A1 is sequenced before A3, by [intro.races]/9.3.1, B4 inter-thread happens before A3. Therefore since B1 is sequenced before B4, by [intro.races]/9.3.2, B1 inter-thread happens before A3. Therefore by [intro.races]/10.2, B1 happens before A3. But then according to [intro.races]/16, B1 must take its value from the pre-A3 state. Therefore the load will return false, and B2 will never run in the first place. In other words, this case can't happen.

So if B3 executes at all (case 1), A2 happens before B3 and the assert will pass.

No memory operation inside a mutex protected region can 'escape' from that area. That applies to all memory operations, atomic and non-atomic.

In section 1.10.1:

a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations

Furthermore, in section 1.10.1.6:

All operations on a given mutex occur in a single total order. Each mutex acquisition “reads the value written” by the last mutex release.

And in 30.4.3.1:

A mutex object facilitates protection against data races and allows safe synchronization of data between execution agents

This means, acquiring (locking) a mutex sets a one-way barrier that prevents operations that are sequenced after the acquire (inside the protected area) from moving up across the mutex lock.

Releasing (unlocking) a mutex sets a one-way barrier that prevents operations that are sequenced before the release (inside the protected area) from moving down across the mutex unlock.

In addition, memory operations that are released by a mutex are synchronized (visible) with another thread that acquires the same mutex.

In your example, foo_has_been_set is checked in CheckFoo. If it reads true, you know that the value 1 has been assigned to foo by SetFoo, but it is not synchronized yet. The mutex lock that follows acquires foo; synchronization is complete and the assert cannot fire.

The standard does not directly guarantee that, but you can read it between the lines of [thread.mutex.requirements.mutex]:

For purposes of determining the existence of a data race, these behave as atomic operations ([intro.multithread]).
The lock and unlock operations on a single mutex shall appear to occur in a single total order.

Now the second sentence looks like a hard guarantee, but it really isn't. A single total order is very nice, but it only means that there is a well-defined single total order of acquiring and releasing one particular mutex. By itself, that doesn't mean that the effects of any atomic operations, or related non-atomic operations, should or must be globally visible at some particular point related to the mutex. Or, whatever. The only thing that is guaranteed is the order of code execution (specifically, the execution of a single pair of functions, lock and unlock); nothing is said about what may or may not happen with data, or otherwise.
One can, however, read between the lines that this is nevertheless the very intention from the "behave as atomic operations" part.

From other places, it is also pretty clear that this is the exact idea and that an implementation is expected to work that way, without explicitly saying that it must. For example, [intro.races] reads:

[ Note: For example, a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations.

Note the unlucky little, harmless word "Note:". Notes are not normative. So, while it's clear that this is how it's intended to be understood (mutex lock = acquire; unlock = release), this is not actually a guarantee.

I think the best, although non-straightforward, guarantee comes from this sentence in [thread.mutex.requirements.general]:

A mutex object facilitates protection against data races and allows safe synchronization of data between execution agents.

So that's what a mutex does (without saying how exactly). It protects against data races. Full stop.

Thus, no matter what subtleties one comes up with and no matter what else is written or isn't explicitly said, using a mutex protects against data races (... of any kind, since no specific type is given). That's what is written. So, in conclusion, as long as you use a mutex, you are good to go even with relaxed ordering or no atomic ops at all. Loads and stores (of any kind) cannot be moved around because then you couldn't be sure no data races occur. Which, however, is exactly what a mutex protects against.
Thus, without saying so, this says that a mutex must be a full barrier.

The answer seems to lie in http://eel.is/c++draft/intro.multithread#intro.races-3

The two pertinent parts are

[...] In addition, there are relaxed atomic operations, which are not synchronization operations [...]

and

[...] performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform a consume or an acquire operation on A. [...]

While relaxed-order atomics are not considered synchronization operations, that's all the standard has to say about them in this context. Since they are still memory locations, the general rule of them being governed by other synchronization operations still applies.

So in conclusion, the standard does not seem to have anything specifically in there to prevent the reordering you described, but the wording as it stands would prevent it naturally.

Edit: Whoops, I linked to the draft. The C++11 paragraph covering this is 1.10-5, using the same language.

CheckFoo() cannot cause the program to crash (i.e. trigger the assert()), but there is also no guarantee the assert() will ever be executed.

If the condition at the start of CheckFoo() triggers (see below), the visible value of foo will be 1 because of the memory barriers and synchronization between mu.unlock() in SetFoo() and mu.lock() in CheckFoo().

I believe that is covered by the description of mutex cited in other answers.

However, there is no guarantee that the if condition (foo_has_been_set.load(std::memory_order_relaxed)) will ever be true. Relaxed memory order makes no ordering guarantees; only the atomicity of the operation is assured. Consequently, in the absence of some other barrier, there's no guarantee when the relaxed store in SetFoo() will be visible in CheckFoo(). But if it is visible, it will only be because the store was executed, and then the mu.lock() that follows must be ordered after mu.unlock(), making the writes before it visible.

Please note this argument relies on the fact that foo_has_been_set is only ever set from false to true. If there were another function called UnsetFoo() that set it back to false:

void UnsetFoo() {
  mu.lock();
  foo = 0;
  foo_has_been_set.store(false, std::memory_order_relaxed);
  mu.unlock();
}

and UnsetFoo() was called from another (or yet a third) thread, then there's no guarantee that checking foo_has_been_set without synchronization tells you that foo is set.

To be clear (and assuming foo_has_been_set is never unset):

void CheckFoo() {
  if (foo_has_been_set.load(std::memory_order_relaxed)) {
    assert(foo == 1); //<- All bets are off.  data-race UB
    mu.lock();
    assert(foo == 1); //Guaranteed to succeed.
    mu.unlock();
  }
}

In practice, on any real platform, in any long-running application, it is probably inevitable that the relaxed store will eventually become visible to the other thread. But there is no formal guarantee regarding if or when that will happen unless other barriers exist to assure it.

Formal references:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf

Refer to the notes at the end of p.13 and the start of p.14, particularly notes 17-20. They are essentially assuring coherence of 'relaxed' operations. Their visibility is relaxed, but the visibility that occurs will be coherent, and the use of the phrase 'happens before' is within the overall principle of program ordering, particularly the acquire and release barriers of mutexes. Note 19 is particularly relevant:

The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. This effectively makes the cache coherence guarantee provided by most hardware available to C++ atomic operations.

Reordering within the critical section is of course possible:

void SetFoo() {
  mu.lock();
  // REORDERED:
  foo_has_been_set.store(true, std::memory_order_relaxed);
  PAUSE(); //imagine scheduler pause here 
  foo = 1;
  mu.unlock();
}

Now, the question is CheckFoo - can the read of foo_has_been_set fall into the lock? Normally a read like that can (things can fall into locks, just not out), but the lock should never be taken if the if is false, so it would be a strange ordering. Does anything say "speculative locks" are not allowed? Or can the CPU speculate that the if is true before reading foo_has_been_set?

void CheckFoo() {
    // REORDER???
    mu.lock();
    if (foo_has_been_set.load(std::memory_order_relaxed)) {
        assert(foo == 1);
    }
    mu.unlock();
}

That ordering is probably not OK, but only because of "logic order", not memory order. If the mu.lock() was inlined (and became some atomic ops), what stops them from being reordered?

I'm not too worried about your current code, but I worry about any real code that uses something like this. It is too close to wrong.

i.e. if the OP code was the real code, you would just change foo to atomic and get rid of the rest. So the real code must be different. More complicated? ...
