Atomicity of a 32-bit read on a multicore CPU

Question

(Note: I've added tags to this question based on where I feel the people most likely to be able to help will be, so please don't shout :))

In my VS 2017 64-bit project, I have a 32-bit long value m_lClosed. When I want to update this, I use one of the Interlocked family of functions.

Consider this code, executing on thread #1:

LONG lRet = InterlockedCompareExchange(&m_lClosed, 1, 0);   // Set m_lClosed to 1 provided it's currently 0

Now consider this code, executing on thread #2:

if (m_lClosed) // Do something

I understand that on a single CPU this will not be a problem, because the update is atomic and the read is atomic too (see MSDN), so thread pre-emption cannot leave the variable in a partially updated state. But on a multicore CPU, we really could have both these pieces of code executing in parallel if each thread is on a different CPU. In this example I don't think that would be a problem, but it still feels wrong to be testing something that is possibly in the process of being updated.

This webpage tells me that atomicity on multiple CPUs is achieved via the LOCK assembly prefix, which prevents other CPUs from accessing that memory. That sounds like what I need, but the assembly language generated for the if test above is merely:

cmp   dword ptr [l],0

... no LOCK instruction in sight.

In a case like this, how are we supposed to ensure atomicity of the read?

EDIT 24/4/18

Firstly, thanks for all the interest this question has generated. I show the actual code below; I purposely kept it simple to focus on the atomicity of it all, but clearly it would have been better if I had shown it all from minute one.

Secondly, the project in which the actual code lives is a VS2005 project; hence no access to C++11 atomics. That's why I didn't add the C++11 tag to the question. I am using VS2017 with a "scratch" project to save having to build the huge VS2005 one every time I make a change whilst I am learning. Plus, it's a better IDE.

Right, so the actual code lives in an IOCP-driven server, and this whole atomicity question is about handling a closed socket:

class CConnection
{
    //...

    DWORD PostWSARecv()
    {
        if (!m_lClosed)
            return ::WSARecv(...);
        else
            return WSAESHUTDOWN;
    }

    bool SetClosed()
    {
        LONG lRet = InterlockedCompareExchange(&m_lClosed, 1, 0);   // Set m_lClosed to 1 provided it's currently 0
        // If the swap was carried out, the return value is the old value of m_lClosed, which should be 0.
        return lRet == 0;
    }

    SOCKET m_sock;
    LONG m_lClosed;
};

The caller will call SetClosed(); if it returns true, it will then call ::closesocket() etc. Please don't question why it is that way, it just is :)

Consider what happens if one thread closes the socket whilst another tries to post a WSARecv(). You might think that the WSARecv() will fail (the socket is closed after all!); however, what if a new connection is established with the same socket handle as the one we just closed? We would then be posting a WSARecv() which will succeed, but this would be fatal for my program logic, since we would now be associating a completely different connection with this CConnection object. Hence, I have the if (!m_lClosed) test. You could argue that I shouldn't be handling the same connection in multiple threads, but that is not the point of this question :)

That is why I need to test m_lClosed before I make the WSARecv() call.

Now, clearly, I am only setting m_lClosed to 1, so a torn read/write is not really a concern, but it is the principle I am concerned about. What if I set m_lClosed to 2147483647 and then test for 2147483647? In this case, a torn read/write would be more problematic.
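To make the torn-read worry concrete, here is a minimal sketch, assuming a C++11 compiler (so the VS2017 scratch project rather than the VS2005 one). The helper name count_torn_reads and its iteration parameter are hypothetical. A writer alternates the flag between 0 and 2147483647 while the caller's thread counts any value that is neither; with std::atomic the count is guaranteed to be zero.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical helper: returns how many times the reader observed a value
// that was neither 0 nor 2147483647 while a writer alternated between them.
long count_torn_reads(int iterations)
{
    assert(iterations >= 0);
    std::atomic<std::int32_t> flag{0};
    const std::int32_t kMax = 2147483647;
    long torn = 0;

    // Writer thread: flips the flag between the two legal values.
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i)
            flag.store((i % 2 == 0) ? kMax : 0, std::memory_order_relaxed);
    });

    // Reader (this thread): any other value would be a torn read.
    for (int i = 0; i < iterations; ++i) {
        const std::int32_t seen = flag.load(std::memory_order_relaxed);
        if (seen != 0 && seen != kMax)
            ++torn;
    }

    writer.join();
    return torn;
}
```

Calling count_torn_reads(100000) should return 0, because std::atomic loads and stores are indivisible regardless of the memory order chosen.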

Answer 1

It really depends on your compiler and the CPU you are running on.

x86 CPUs will atomically read 32-bit values without the LOCK prefix if the memory address is properly aligned. However, you will most likely need some sort of memory barrier to control the CPU's out-of-order execution if the variable is used as a lock or count for some other related data. Data that is not aligned might not be read atomically, especially if the value straddles a page boundary.
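Since the guarantee depends on alignment, it can be worth checking it explicitly. The following is only a sketch, assuming C++11; is_aligned and the Connection struct are hypothetical names used for illustration, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on an `alignment`-byte boundary.
// `alignment` must be a power of two.
bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical stand-in for CConnection: the 32-bit flag is forced onto a
// 4-byte boundary, which is what the plain aligned load relies on.
struct Connection {
    alignas(4) std::int32_t closed = 0;
};
```

For a Connection c, is_aligned(&c.closed, 4) holds by construction; a pointer into a packed or byte-offset buffer is where the check could fail.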

If you are not hand-coding assembly, you also need to worry about the compiler's reordering optimizations.

Any variable marked as volatile will have ordering constraints in the compiler (and possibly in the generated machine code) when compiling with Visual C++:

The _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier compiler intrinsics prevent compiler re-ordering only. With Visual Studio 2003, volatile to volatile references are ordered; the compiler will not re-order volatile variable access. With Visual Studio 2005, the compiler also uses acquire semantics for read operations on volatile variables and release semantics for write operations on volatile variables (when supported by the CPU).

Microsoft-specific volatile keyword enhancements:

When the /volatile:ms compiler option is used (by default when architectures other than ARM are targeted), the compiler generates extra code to maintain ordering among references to volatile objects in addition to maintaining ordering to references to other global objects. In particular:

A write to a volatile object (also known as volatile write) has Release semantics; that is, a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.

A read of a volatile object (also known as volatile read) has Acquire semantics; that is, a reference to a global or static object that occurs after a read of volatile memory in the instruction sequence will occur after that volatile read in the compiled binary.

This enables volatile objects to be used for memory locks and releases in multithreaded applications.

For architectures other than ARM, if no /volatile compiler option is specified, the compiler performs as if /volatile:ms were specified; therefore, for architectures other than ARM we strongly recommend that you specify /volatile:iso, and use explicit synchronization primitives and compiler intrinsics when you are dealing with memory that is shared across threads.
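The acquire/release behaviour described in the quoted documentation can also be expressed portably. The sketch below assumes a C++11 compiler and uses std::atomic instead of the Microsoft-specific volatile semantics; publish_and_consume, payload, and ready are hypothetical names. It illustrates the ordering guarantee and is not a statement of what MSVC generates for volatile.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: hand an ordinary int from one thread to another
// through an atomic flag with release/acquire ordering.
int publish_and_consume()
{
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // the flag that publishes it
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // write above cannot sink below this store
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) // read below cannot hoist above this load
            std::this_thread::yield();
        observed = payload;                            // sees the producer's write
    });

    producer.join();
    consumer.join();
    assert(observed == 42);
    return observed;
}
```

The release store and the acquire load that reads it form a synchronizes-with pair, so the consumer is guaranteed to see payload == 42 once it sees ready == true.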

Microsoft provides compiler intrinsics for most of the Interlocked* functions, and they will compile to something like LOCK XADD ... instead of a function call.

Until "recently", C/C++ had no support for atomic operations or threads in general, but this changed in C11/C++11, where atomic support was added. Using the <atomic> header and its types/functions/classes moves the alignment and reordering responsibility to the compiler, so you don't have to worry about that. You still have to make a choice regarding memory barriers, and this determines the machine code generated by the compiler. With relaxed memory order, the atomic load operation will most likely end up as a simple MOV instruction on x86. A stricter memory order can add a fence and possibly the LOCK prefix if the compiler determines that the target platform requires it.
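As a rough illustration of that choice, here is a sketch assuming C++11; g_closed and the three function names are hypothetical stand-ins for m_lClosed and its reader. The only difference between the variants is the memory order argument.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for m_lClosed.
std::atomic<long> g_closed{0};

// Atomicity only: no ordering with respect to other memory accesses.
bool is_closed_relaxed() { return g_closed.load(std::memory_order_relaxed) != 0; }

// Acquire: later reads and writes in this thread cannot move before the load.
bool is_closed_acquire() { return g_closed.load(std::memory_order_acquire) != 0; }

// Default argument is std::memory_order_seq_cst, the strictest ordering.
bool is_closed_seq_cst() { return g_closed.load() != 0; }
```

On x86, mainstream compilers typically emit a plain MOV for all three loads; the differences tend to show up on the store side and on more weakly ordered architectures. Inspecting the generated assembly for your own compiler and flags is the only way to be sure.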

Answer 2

In C++11, unsynchronized concurrent access to a non-atomic object (such as m_lClosed), where at least one of the accesses is a write, is a data race and therefore undefined behavior.

The standard provides all the facilities you need to write this correctly; you do not need non-portable functions such as InterlockedCompareExchange. Instead, simply define your variable as atomic:

std::atomic<bool> m_lClosed{false};

// Writer thread...
bool expected = false;
m_lClosed.compare_exchange_strong(expected, true);

// Reader...
if (m_lClosed.load()) { /* ... */ }

This is more than sufficient (it forces sequential consistency, which might be expensive). In some cases it might be possible to generate slightly faster code by relaxing the memory order on the atomic operations, but I would not worry about that.
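Applied to the class from the question, the same idea might look like the sketch below. It assumes C++11 is available; Connection, set_closed, and is_closed are hypothetical names mirroring CConnection and SetClosed, and the socket handling is left out entirely.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for CConnection, reduced to the closed flag.
class Connection {
public:
    // Returns true only for the one caller that flips the flag from
    // "open" to "closed"; every later caller gets false.
    bool set_closed()
    {
        bool expected = false;
        return closed_.compare_exchange_strong(expected, true);
    }

    // Sequentially consistent atomic read of the flag.
    bool is_closed() const { return closed_.load(); }

private:
    std::atomic<bool> closed_{false};
};
```

A caller would then write if (conn.set_closed()) { /* close the socket */ }, and the reader would test conn.is_closed() in place of if (m_lClosed).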

Answer 3

As I posted here, this question was never about protecting a critical section of code; it was purely about avoiding torn reads/writes. user3386109 posted a comment here which I ended up using, but declined to post it as an answer here. Thus I am providing the solution I ended up using, for completeness of this question; maybe it will help someone in the future.

The following shows the atomic setting and testing of m_lClosed:

long m_lClosed = 0;

Thread 1

// Set flag to closed
if (InterlockedCompareExchange(&m_lClosed, 1, 0) == 0)
    cout << "Closed OK!\n";

Thread 2

This code replaces if (!m_lClosed):

if (InterlockedCompareExchange(&m_lClosed, 0, 0) == 0)
    cout << "Not closed!";
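The trick here is that a compare-exchange of 0 with 0 never changes the value, yet always reports what the value currently is. A portable analogue can be sketched with std::atomic, assuming C++11; read_via_cas is a hypothetical helper and this is not the Interlocked API itself.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: read an atomic by compare-exchanging 0 with 0.
long read_via_cas(std::atomic<long>& value)
{
    long expected = 0;
    // If value == 0: the exchange succeeds and writes 0 back (no visible change).
    // Otherwise: it fails and copies the current value into `expected`.
    value.compare_exchange_strong(expected, 0);
    return expected;
}
```

Either way the function returns the current value and leaves it untouched, which is what the InterlockedCompareExchange(&m_lClosed, 0, 0) call above relies on. It costs a full read-modify-write, though, where a plain atomic load would do.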

Answer 4

OK, so as it turns out this really isn't necessary; this answer explains in detail why we don't need to use any interlocked operations for a simple read or write (but we do for a read-modify-write).
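The read-modify-write half of that statement is easy to sketch, assuming C++11 with std::thread; count_with_fetch_add and its parameters are hypothetical. Several threads increment one counter, and the total comes out right only because each increment is a single indivisible operation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each add 1 to a shared counter
// `increments_per_thread` times; returns the final total.
long count_with_fetch_add(int threads, int increments_per_thread)
{
    assert(threads > 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i)
                counter.fetch_add(1);   // one indivisible read-modify-write
        });
    }
    for (auto& w : workers)
        w.join();

    return counter.load();
}
```

If the loop body were counter.store(counter.load() + 1) instead, each load and each store would still be atomic on its own, but two threads could read the same value and one increment would be lost. That gap between the read and the write is what the interlocked operations close.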

4 answers

Answer 1: score 10, posted 2018-04-23 21:07:49

Answer 2: score 4, posted 2018-04-23 22:26:35

Answer 3: score 0, posted 2018-04-26 09:11:48

Answer 4: score -1, accepted, posted 2018-05-01 15:46:08
