How do compilers like GCC implement acquire/release semantics for std::mutex?

Question

My understanding is that std::mutex lock and unlock have acquire/release semantics, which prevent instructions between them from being moved outside.

So acquire/release should prevent both the compiler and the CPU from reordering instructions.

My question: I took a look at the GCC 5.1 code base and don't see anything special in std::mutex::lock/unlock that prevents the compiler from reordering code.

I found a potential answer in does-pthread-mutex-lock-have-happens-before-semantics, which points to a mail saying that an external function call acts as a compiler memory fence.

Is that always true? And where is this in the standard?

Answer 1

Threads are a fairly complicated, low-level feature. Historically, there was no standard C thread functionality; instead it was done differently on different OSes. Today there is mainly the POSIX threads standard, which has been implemented in Linux and BSD, and by extension OS X, and there are Windows threads, starting with Win32. Potentially, there could be other systems besides these.

GCC doesn't directly contain a POSIX threads implementation; instead it may be a client of libpthread on a Linux system. When you build GCC from source, you have to configure and build separately a number of ancillary libraries, supporting things like big numbers and threads. That is the point at which you select how threading will be done. If you do it the standard way on Linux, you will have an implementation of std::thread in terms of pthreads.

On Windows, starting with MSVC's C++11 compliance, the MSVC devs implemented std::thread in terms of the native Windows threads interface.

It's the OS's job to ensure that the concurrency locks provided by its API actually work; std::thread is meant to be a cross-platform interface to such a primitive.

The situation may be more complicated for more exotic platforms, cross-compiling, etc. For instance, in the MinGW project (GCC for Windows), you have historically had the option to build MinGW GCC using either a port of pthreads to Windows or a native Win32-based threading model. If you don't configure this when you build, you may end up with a C++11 compiler which doesn't support std::thread or std::mutex. See this question for more details: MinGW error: 'thread' is not a member of 'std'

Now, to answer your question more directly. When a mutex is engaged, at the lowest level, this involves some call into libpthread or some Win32 API.

pthread_lock_mutex();
do_some_stuff();
pthread_unlock_mutex();

(Here pthread_lock_mutex and pthread_unlock_mutex stand for the implementations of lock and unlock of std::mutex on your platform; the actual POSIX functions are named pthread_mutex_lock and pthread_mutex_unlock. In idiomatic C++11 code, these are in turn called in the ctor and dtor of std::unique_lock, for instance, if you are using that.)

Generally, the optimizer cannot reorder these unless it is sure that pthread_lock_mutex() has no side effects that can change the observable behavior of do_some_stuff().

To my knowledge, the mechanism the compiler has for doing this is ultimately the same as what it uses for estimating the potential side effects of calls to any other external library.

If there is some resource

int resource;

which is in contention among various threads, it means that there is some function body

void compete_for_resource();

and a function pointer to it is, at some earlier point in your program, passed to pthread_create... in order to start another thread. (This would presumably be in the implementation of the ctor of std::thread.) At this point, the compiler can see that any call into libpthread can potentially call compete_for_resource and touch any memory that that function touches. (From the compiler's point of view libpthread is a black box: it is some .dll / .so, and the compiler can't make assumptions about what exactly it does.)

In particular, the call pthread_lock_mutex(); potentially has side effects on resource, so it cannot be reordered against do_some_stuff().

If you never actually spawn any other threads, then to my knowledge, do_some_stuff(); could be reordered outside of the mutex lock. In that case libpthread doesn't have any access to resource: it's just a private variable in your source that isn't shared with the external library even indirectly, and the compiler can see that.
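To make the scenario above concrete, here is a minimal sketch in portable C++11. The names resource_mutex and run_contention, and the iterations parameter on compete_for_resource, are my own additions for illustration; only resource and compete_for_resource come from the text above. It assumes a platform where std::thread is available.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

int resource = 0;           // the contended variable from the text
std::mutex resource_mutex;  // hypothetical name for the protecting mutex

// Each call increments the shared resource `iterations` times.
// The increment sits between lock and unlock (via lock_guard), so the
// compiler and CPU must keep it inside the critical section.
void compete_for_resource(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(resource_mutex);
        ++resource;
    }
}

// Hypothetical driver: starts `thread_count` threads and returns the total.
int run_contention(int thread_count, int iterations) {
    assert(thread_count > 0 && iterations >= 0);
    resource = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back(compete_for_resource, iterations);
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return resource;
}
```

Calling run_contention(4, 10000) should always give the same total, because no increment can escape the locked region or be lost.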

Answer 2

All of these questions stem from the rules for compiler reordering. One of the fundamental rules for reordering is that the compiler must prove that the reordering does not change the result of the program. In the case of std::mutex, the exact meaning of that phrase is specified in a block of about 10 pages of legalese, but the general intuitive sense of "doesn't change the result of the program" holds. If the specification guarantees which operation comes first, no compiler is allowed to reorder in a way that violates that guarantee.

This is why people often claim that a "function call acts as a memory barrier." If the compiler cannot deep-inspect the function, it cannot prove that the function doesn't have a hidden barrier or atomic operation inside of it, so it must treat that function as though it were a barrier.

There is, of course, the case where the compiler can inspect the function, such as with inline functions or link-time optimization. In these cases, one cannot rely on a function call to act as a barrier, because the compiler may indeed have enough information to prove that the rewrite behaves the same as the original.

In the case of mutexes, even such advanced optimization cannot take place. The only way to reorder around the mutex lock/unlock function calls is to have deep-inspected the functions and proven there are no barriers or atomic operations to deal with. If the compiler can't inspect every sub-call and sub-sub-call of that lock/unlock function, it can't prove it is safe to reorder. If it can indeed do this inspection, it will see that every mutex implementation contains something which cannot be reordered around (indeed, this is part of the definition of a valid mutex implementation). Thus, even in that extreme case, the compiler is still forbidden from reordering.
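As an illustration of the "something which cannot be reordered around", here is a toy spinlock built on std::atomic_flag. This is only a sketch of the idea, not how glibc or libstdc++ actually implement their mutexes; SpinLock and count_with_spinlock are hypothetical names. Even if the compiler fully inlines lock() and unlock(), the acquire and release atomic operations remain visible to it and pin the critical section in place.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Toy lock: the atomic operations are what forbid reordering.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;

public:
    void lock() {
        // Acquire: later reads/writes may not move before this operation.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait until the holder clears the flag
        }
    }
    void unlock() {
        // Release: earlier reads/writes may not move after this operation.
        flag_.clear(std::memory_order_release);
    }
};

// Two threads increment a plain (non-atomic) counter under the spinlock.
long count_with_spinlock(int iterations) {
    assert(iterations >= 0);
    SpinLock lock;
    long counter = 0;
    auto work = [&lock, &counter, iterations] {
        for (int i = 0; i < iterations; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;
}
```

A real mutex would put the thread to sleep instead of spinning, but the ordering argument is the same.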

EDIT: For completeness, I would like to point out that these rules were introduced in C++11. The C++98 and C++03 reordering rules only prohibited changes that affected the result of the current thread. Such a guarantee is not strong enough to develop multithreading primitives like mutexes.

To deal with this, multithreading APIs like pthreads developed their own rules. From the Pthreads specification, section 4.11:

Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads

It then lists a few dozen functions which synchronize memory, including pthread_mutex_lock and pthread_mutex_unlock.
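For reference, this is roughly what using those two POSIX functions directly looks like. It is a small sketch that assumes a POSIX system with <pthread.h> available (and, on older toolchains, linking with -pthread); g_lock, g_total, add_many and pthread_sum are hypothetical names.

```cpp
#include <cassert>
#include <pthread.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static long g_total = 0;  // shared data guarded by g_lock

// Thread entry point: adds 1 to g_total `n` times, where n is passed via arg.
static void* add_many(void* arg) {
    const long n = *static_cast<long*>(arg);
    for (long i = 0; i < n; ++i) {
        pthread_mutex_lock(&g_lock);    // synchronizes memory per POSIX
        ++g_total;
        pthread_mutex_unlock(&g_lock);  // synchronizes memory per POSIX
    }
    return nullptr;
}

// Runs two threads and returns the combined total.
long pthread_sum(long per_thread) {
    assert(per_thread >= 0);
    g_total = 0;
    pthread_t a;
    pthread_t b;
    int rc = pthread_create(&a, nullptr, add_many, &per_thread);
    assert(rc == 0);
    rc = pthread_create(&b, nullptr, add_many, &per_thread);
    assert(rc == 0);
    pthread_join(a, nullptr);
    pthread_join(b, nullptr);
    return g_total;
}
```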

A compiler which wishes to support the pthreads library must implement something to support this cross-thread memory synchronization, even though the C++ specification didn't say anything about it. Fortunately, any compiler you would want to use for multithreading was developed with the recognition that such guarantees are fundamental to all multithreading, so every compiler that supports multithreading has it!

In the case of gcc, it did so without any special annotations on the pthreads function calls, because gcc would effectively create a barrier around every external function call (because it couldn't prove that no synchronization existed inside that function call). If gcc were ever to change that, its pthreads headers would also have to change to include any extra verbiage needed to mark the pthreads functions as synchronizing memory.

All of that, of course, is compiler-specific. There were no standards-based answers to this question until C++11 came along with its new memory model.

Answer 3

NOTE: I am no expert in this area, and my knowledge about it is in a spaghetti-like condition. So take this answer with a grain of salt.

NOTE-2: This might not be the answer that OP is expecting. But here are my 2 cents anyway, in case it helps:

My question is that I take a look at GCC5.1 code base and don't see anything special in std::mutex::lock/unlock to prevent compiler reordering codes.

g++ uses the pthread library. std::mutex is just a thin wrapper around pthread_mutex. So, you will have to actually go and have a look at pthread's mutex implementation.
If you go a bit deeper into the pthread implementation (which you can find here), you will see that it uses atomic instructions along with futex calls.

A few minor things to remember here:
1. The atomic instructions do use barriers.
2. A call to a function the compiler cannot see into is equivalent to a full compiler barrier. I do not remember where I read this.
3. Mutex calls may put the thread to sleep and cause a context switch.

Now, as far as reordering goes, one of the things that needs to be guaranteed is that no instruction after lock and before unlock is reordered to before lock or after unlock. This, I believe, is not a full barrier, but rather just an acquire and a release barrier, respectively. But this is again platform dependent: x86 has a comparatively strong hardware memory model (though not full sequential consistency, since a store can be reordered after a later load), whereas ARM provides a weaker ordering guarantee.
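The acquire/release pairing can be sketched with std::atomic directly. This is just an illustration of the ordering guarantee, not what std::mutex does internally; receive_published_value and the variable names are hypothetical. The plain write to payload cannot move after the release store, and the plain read cannot move before the acquire load that observes it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One thread publishes a plain int; another waits for a flag and reads it.
int receive_published_value() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // synchronization flag
    int seen = -1;

    std::thread consumer([&payload, &ready, &seen] {
        // Acquire load: once this reads true, the write to payload
        // made before the matching release store is visible here.
        while (!ready.load(std::memory_order_acquire)) {
            // spin until published
        }
        seen = payload;
    });

    payload = 42;                                  // must stay before the store
    ready.store(true, std::memory_order_release);  // release: publishes payload

    consumer.join();
    assert(seen != -1);
    return seen;
}
```

On x86 the acquire load and release store typically compile to plain moves, while on ARM they need stronger instructions or fences; the C++ source stays the same.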

I strongly recommend this blog series: http://preshing.com/archives/ It explains lots of lower-level stuff in easy-to-understand language. I guess I have to read it once again :)

UPDATE: Unable to comment on @Cort Ammon's answer due to length

@Kane I am not sure about this, but people generally write barriers at the processor level, which take care of compiler-level barriers as well. The same is not true of compiler builtin barriers.

Now, since the definitions of the pthread_*lock* functions are not present in the translation unit where you are making use of them (I am not certain of this), calling lock and unlock should provide you with a full memory barrier. The pthread implementation for the platform makes use of atomic instructions to block any other thread from accessing the memory locations after the lock or before the unlock. Now, since only one thread is executing the critical portion of the code, it is ensured that any reordering within it will not change the expected behaviour, as mentioned in the comment above.

Atomics are pretty tough to understand and to get right, so what I have written above is from my understanding. I would be very glad to know if my understanding is wrong here.

Answer 4

So acquire/release should disable both compiler and CPU reorder instructions.

By definition, anything that prevents CPU reordering by speculative execution prevents compiler reordering. That's the definition of language semantics, even without MT (multi-threading) in the language, so you will be safe from reordering on old compilers that don't support MT.

But these compilers aren't safe for MT for a bunch of reasons, from the lack of thread protection around runtime initialization of static variables to implicitly modified global variables like errno, etc.

Also, in C/C++, any call to a function that is purely external (that is: not inline, and not available for inlining at any point), without an annotation explaining what it does (like the "pure function" attribute of some popular compilers), must be assumed to do anything that legal C/C++ code can do. No non-trivial reordering would be possible (any reordering that is visible is non-trivial).

Any correct implementation of locks on systems with multiple units of execution that don't simulate a global order on assembly instructions will require memory barriers and will prevent reordering.

An implementation of locks on a linearly executing CPU with only one unit of execution (or where all threads are bound to the same unit of execution) might use only volatile variables for synchronisation. That is unsafe, as volatile reads and writes do not provide any guarantee of acquire and release, respectively, for any other data (contrast Java). Some kind of compiler barrier would be needed, like a strongly external function call, or some asm (""/*nothing*/) (which is compiler-specific and even compiler-version-specific).
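A compiler-only barrier of the kind mentioned above might look like the sketch below. The asm form with a "memory" clobber is a GCC/Clang extension; std::atomic_signal_fence is the portable C++11 way to constrain the compiler without emitting a CPU fence instruction. compiler_barrier, publish_single_core, data_value and ready_flag are hypothetical names, and the example deliberately runs on one thread, since a compiler barrier alone is not enough across multiple cores.

```cpp
#include <atomic>
#include <cassert>

int data_value = 0;  // plain data written before the flag
int ready_flag = 0;  // plain flag written after the data

// Stops the compiler from moving memory accesses across this point.
// It emits no CPU fence instruction.
inline void compiler_barrier() {
#if defined(__GNUC__)
    asm volatile("" ::: "memory");  // GCC/Clang-specific empty asm with memory clobber
#else
    std::atomic_signal_fence(std::memory_order_seq_cst);  // portable compiler-only fence
#endif
}

// Writes the data, then the flag, with the compiler kept from swapping them.
int publish_single_core(int value) {
    data_value = value;
    compiler_barrier();
    ready_flag = 1;
    assert(ready_flag == 1);
    return data_value + ready_flag;
}
```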

4 answers

Answer 1
14 2016-06-07 16:04:14

Answer 2
3 accepted 2016-06-07 21:15:36

Answer 3
2 2016-06-07 17:28:09

Answer 4
0 2018-06-02 17:23:58
