
volatile variable updated from multiple threads C++

    volatile bool b;

    // Thread 1: only reads b
    void f1() {
        while (1) {
            if (b) { /* do something */ }
            else   { /* do something else */ }
        }
    }

    // Thread 2: only sets b to true if a certain condition is met
    void f2() {
        while (1) {
            // some local condition evaluated - local_cond
            if (!b && (local_cond == true)) b = true;
            // some other work
        }
    }

    // Thread 3: only sets b to false when it gets a message on a socket it is listening to
    void f3() {
        while (1) {
            // select() on the socket
            if (/* expected message came */) b = false;
            // do some other work
        }
    }

If thread2 updates b first at time t and later thread3 updates b at time t+5:

will thread1 see the latest value "in time" whenever it is reading b?

for example: reads from t+delta to t+5+delta should read true, and reads after t+5+delta should read false.

delta is the time it takes the store to "b" to reach memory after thread 2 or 3 updates it.

The effect of the volatile keyword is principally two things (I avoid scientifically strict formulations here):

1) Its accesses can't be cached or combined. (UPD: as suggested, I should underline that this means caching in registers or other compiler-provided locations, not the RAM cache in the CPU.) For example, the following code:

x = 1;
x = 2;

for a volatile x will never be combined into a single x = 2, whatever optimization level is used; but if x is not volatile, even low optimization levels will likely collapse it into a single write. The same holds for reads: each read operation will access the variable's value without any attempt to cache it.

2) All volatile operations are emitted at the machine-instruction level in the same order relative to each other (to underline: only relative to other volatile operations) as they appear in the source code.

But this is not true for ordering between non-volatile and volatile accesses. For the following code:

int *x;
volatile int *vy;
void foo()
{
  *x = 1;
  *vy = 101;
  *x = 2;
  *vy = 102;
}

gcc (9.4) with -O2 and clang (10.0) with -O produce something similar to:

        movq    x(%rip), %rax
        movq    vy(%rip), %rcx
        movl    $101, (%rcx)
        movl    $2, (%rax)
        movl    $102, (%rcx)
        retq

so one access to x is already gone, despite its presence between two volatile accesses. If you need the first x = 1 to complete before the first write to vy, put an explicit barrier there (since C11, atomic_signal_fence is the platform-independent means for this).


That was the common rule, but without regard to multithreading issues. What happens here with multithreading?

Well, imagine, as you describe, that thread 2 writes true to b; this is a write of the value 1 to a single-byte location. But it is an ordinary write, without any memory-ordering requirements. What volatile gives you is that the compiler won't optimize it away. But what about the processor?

If this were a modern abstract processor, or one with relaxed rules, like ARM, I'd say nothing prevents it from postponing the real write for an indefinite time. (To clarify, "write" here means exposing the operation to the RAM-and-all-caches conglomerate.) It's entirely up to the processor's deliberation. Well, processors are designed to flush their stockpile of pending writes as fast as possible. But what affects the real delay, you can't know: for example, it could "decide" to fill the instruction cache with the next few lines first, or to flush other queued writes... lots of variants. The only thing we know is that it makes a "best effort" to flush all queued operations, to avoid getting buried under previous results. That's entirely natural and nothing more.

With x86, there is an additional factor. Nearly every memory write (and, I guess, this one as well) is a "releasing" write on x86, so all previous reads and writes must complete before this write. But the crucial fact is that only the operations *before* this write are required to complete. So when you write true to the volatile b, you can be sure all previous operations have already become visible to other participants... but this write itself could still be postponed for a while... how long? Nanoseconds? Microseconds? Any other write to memory will flush and so publish this write to b... do you have writes in each loop iteration of thread 2?

The same affects thread 3. You can't be sure this b = false will be published to other CPUs when you need it. The delay is unpredictable. Unless this is a realtime-aware hardware system, the write is only guaranteed to become visible eventually, after an indefinite time: the ISA rules and barriers provide ordering but not exact timings. And x86 is definitely not built for that kind of realtime.


Well, all this means you also need an explicit barrier after the write, one which affects not only the compiler but the CPU as well: a barrier between the previous write and the following reads or writes. Among C/C++ means, a full barrier satisfies this - so you have to add std::atomic_thread_fence(std::memory_order_seq_cst) or use an atomic variable (instead of a plain volatile one) with the same memory order for the write.

And all this still won't give you exact timings like those you described ("t" and "t+5"), because the visible "timestamps" of the same operation can differ between CPUs. (Well, this resembles Einstein's relativity a bit.) All you can say in this situation is that something is written into memory, and typically (but not always) the inter-CPU order is what you expected (though an ordering violation will punish you).


But I can't catch the general idea of what you want to implement with this flag b. What do you want from it; what state should it reflect? Let's return to the higher-level task and reformulate it. Is this (I'm just reading coffee grounds here) a green light to do something, which is cancelled by an external order? If so, an internal permission ("we are ready") from thread 2 shall not override that cancellation. This can be done using different approaches, such as:

1) Simply separate flags and a mutex/spinlock around setting them. Easy but somewhat costly (or even substantially costly, I don't know your environment).

2) An atomically modified analog. For example, you can use a bitfield variable which is modified using compare-and-swap. Assign bit 0 to "ready" and bit 1 to "cancelled". For C, atomic_compare_exchange_strong is what you'll need here on x86 (and on most other ISAs). And volatile is no longer needed here if you stick with memory_order_seq_cst.

Will thread1 see the latest value "in time" whenever it is reading b?

Yes, the volatile keyword denotes that the variable can be modified outside of the thread or by hardware without the compiler being aware. Thus every access (both read and write) made through an lvalue expression of volatile-qualified type is considered an observable side effect for the purpose of optimization and is evaluated strictly according to the rules of the abstract machine (that is, all writes are completed at some time before the next sequence point). This means that within a single thread of execution, a volatile access cannot be optimized out or reordered relative to another visible side effect that is separated from it by a sequence point.

Unfortunately, the volatile keyword is not thread-safe, and such operations have to be handled with care; it is recommended to use atomic for this, unless you are in an embedded or bare-metal scenario.

Also, the whole struct should be atomic: struct X {int a; volatile bool b;};.

Say I have a system with 2 cores. The first core runs thread 2, the second core runs thread 3.

reads from t+delta to t+5+delta should read true and reads after t+5+delta should read false.

The problem is that thread 1 will read at t + 10000000, whenever the kernel decides one of the threads has run long enough and schedules a different one. So it is likely thread1 will not see the change much of the time.

Note: this ignores all the additional problems of cache synchronicity and observability. If the thread isn't even running, all of that becomes irrelevant.
