When is a compiler-only memory barrier (such as std::atomic_signal_fence) useful?

The notion of a compiler fence often comes up when I'm reading about memory models, barriers, ordering, atomics, etc., but normally it is paired with a CPU fence, as one would expect.

Occasionally, however, I read about fence constructs which only apply to the compiler. An example of this is the C++11 std::atomic_signal_fence function, which cppreference.com describes as follows:

std::atomic_signal_fence is equivalent to std::atomic_thread_fence, except no CPU instructions for memory ordering are issued. Only reordering of the instructions by the compiler is suppressed as order instructs.

I have five questions related to this topic:

  1. As implied by the name std::atomic_signal_fence, is an asynchronous interrupt (such as a thread being preempted by the kernel to execute a signal handler) the only case in which a compiler-only fence is useful?

  2. Does its usefulness apply to all architectures, including strongly-ordered ones such as x86?

  3. Can a specific example be provided to demonstrate the usefulness of a compiler-only fence?

  4. When using std::atomic_signal_fence, is there any difference between using acq_rel and seq_cst ordering? (I would expect it to make no difference.)

  5. This question might be covered by the first question, but I'm curious enough to ask specifically about it anyway: Is it ever necessary to use fences with thread_local accesses? (If it ever would be, I would expect compiler-only fences such as atomic_signal_fence to be the tool of choice.)

Thank you.

To answer all 5 questions:


1) A compiler fence (by itself, without a CPU fence) is only useful in two situations:

  • To enforce memory order constraints between a single thread and an asynchronous interrupt handler bound to that same thread (such as a signal handler).

  • To enforce memory order constraints between multiple threads when it is guaranteed that every thread will execute on the same CPU core. In other words, the application will only run on single-core systems, or the application takes special measures (through processor affinity) to ensure that every thread which shares the data is bound to the same core.


2) The memory model of the underlying architecture, whether strongly or weakly ordered, has no bearing on whether a compiler fence is needed in a given situation. The compiler is free to reorder memory accesses on any target, including x86, so the fence is needed wherever that ordering matters.


3) Here is pseudo-code which demonstrates the use of a compiler fence, by itself, to sufficiently synchronize memory access between a thread and an async signal handler bound to the same thread:

void async_signal_handler()
{
    if ( is_shared_data_initialized )
    {
        compiler_only_memory_barrier(memory_order::acquire);
        ... use shared_data ...
    }
}

int main()
{
    // initialize shared_data ...
    shared_data->foo = ...
    shared_data->bar = ...
    shared_data->baz = ...
    // shared_data is now fully initialized and ready to use
    compiler_only_memory_barrier(memory_order::release);
    is_shared_data_initialized = true;
}

Important Note: This example assumes that async_signal_handler is bound to the same thread that initializes shared_data and sets the is_shared_data_initialized flag, which means the application is single-threaded, or it sets thread signal masks accordingly. Otherwise, the compiler fence would be insufficient, and a CPU fence would also be needed.
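For reference, the pseudo-code above could be written in standard C++11 roughly as follows. This is only a sketch under a few assumptions: the SharedData struct, the handler_saw variable and the run_signal_fence_demo function are hypothetical names added for illustration, the flag is a lock-free std::atomic<bool> accessed with relaxed ordering (so the fences have atomic operations to attach to), and the signal is delivered with std::raise on the same thread that initialized the data.

```cpp
#include <atomic>
#include <csignal>

// Assumption: bool and int atomics are always lock-free on the target,
// which is what makes them safe to touch from a signal handler.
static_assert(ATOMIC_BOOL_LOCK_FREE == 2 && ATOMIC_INT_LOCK_FREE == 2,
              "this sketch needs lock-free atomics");

// Hypothetical payload; field names mirror the pseudo-code.
struct SharedData { int foo = 0; int bar = 0; int baz = 0; };

SharedData shared_data;
std::atomic<bool> is_shared_data_initialized{false};
std::atomic<int> handler_saw{-1};  // -1: handler has not seen initialized data

extern "C" void async_signal_handler(int)
{
    if (is_shared_data_initialized.load(std::memory_order_relaxed))
    {
        // Compiler-only acquire fence: the reads of shared_data below
        // may not be moved above the flag load.
        std::atomic_signal_fence(std::memory_order_acquire);
        handler_saw.store(shared_data.foo + shared_data.bar + shared_data.baz,
                          std::memory_order_relaxed);
    }
}

int run_signal_fence_demo()
{
    std::signal(SIGINT, async_signal_handler);

    shared_data.foo = 1;
    shared_data.bar = 2;
    shared_data.baz = 3;
    // Compiler-only release fence: the writes above may not be moved
    // below the flag store.
    std::atomic_signal_fence(std::memory_order_release);
    is_shared_data_initialized.store(true, std::memory_order_relaxed);

    std::raise(SIGINT);            // handler runs on this same thread
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return handler_saw.load(std::memory_order_relaxed);
}
```

Calling run_signal_fence_demo() from main returns the sum the handler computed from the fully initialized data. If the handler could run on a different thread, std::atomic_thread_fence would be needed instead, as the note above says.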


4) They should be the same. acq_rel and seq_cst should both result in a full (bidirectional) compiler fence, with no fence-related CPU instructions emitted. The concept of "sequential consistency" only comes into play when multiple cores and threads are involved, and atomic_signal_fence only pertains to one thread of execution.


5) No. (Unless, of course, the thread-local data is accessed from an asynchronous signal handler, in which case a compiler fence might be necessary.) Otherwise, fences should never be needed with thread-local data, since the compiler (and CPU) are only allowed to reorder memory accesses in ways that do not change the observable behavior of the program with respect to its sequence points from a single-threaded perspective. And one can logically think of thread-local statics in a multi-threaded program as being the same as global statics in a single-threaded program. In both cases, the data is only accessible from a single thread, which prevents a data race from occurring.

There are actually some nonportable but useful C programming idioms where compiler fences are useful, even in multicore code (particularly in pre-C11 code). The typical situation is where the program is doing some accesses that would normally be made volatile (because they are to shared variables), but you want the compiler to be able to move the accesses around. If you know that the accesses are atomic on the target platform (and you take some other precautions), you can leave the accesses nonvolatile, but constrain code movement using compiler barriers.

Thankfully, most programming like this has been made obsolete by C11/C++11 relaxed atomics.
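As a rough illustration of that replacement, the sketch below uses relaxed atomic accesses where the old idiom would have used plain or volatile variables, and std::atomic_signal_fence where it would have used a compiler barrier. The names payload, ready, compiler_barrier, publish and try_consume are hypothetical. The barrier here constrains only the compiler, so by itself it is enough only in the situations listed in the first answer (a signal handler on the same thread, or threads pinned to one core). Across cores on weakly-ordered hardware, std::atomic_thread_fence or acquire/release operations would be needed instead.

```cpp
#include <atomic>

// Hypothetical shared state; relaxed atomics stand in for the
// "known to be atomic on this platform" plain accesses of the old idiom.
std::atomic<int> payload{0};
std::atomic<bool> ready{false};

// Compiler-only barrier: no CPU fence instruction is emitted.
inline void compiler_barrier()
{
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

void publish(int value)
{
    payload.store(value, std::memory_order_relaxed);
    compiler_barrier();  // keep the payload store before the flag store
    ready.store(true, std::memory_order_relaxed);
}

// Returns the payload, or -1 if nothing has been published yet.
int try_consume()
{
    if (!ready.load(std::memory_order_relaxed))
        return -1;
    compiler_barrier();  // keep the payload load after the flag load
    return payload.load(std::memory_order_relaxed);
}
```

Between the barriers the compiler remains free to combine or move the relaxed accesses, which is the flexibility the volatile-free idiom was after.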
