
Does Interlocked.CompareExchange use a memory barrier?

I'm reading Joe Duffy's post about Volatile reads and writes, and timeliness, and I'm trying to understand something about the last code sample in the post:

while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
m_state = 0;
while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
m_state = 0;
… 

When the second CMPXCHG operation is executed, does it use a memory barrier to ensure that the value of m_state is indeed the latest value written to it? Or will it just use some value that is already stored in the processor's cache? (Assuming m_state isn't declared as volatile.)
If I understand correctly, if CMPXCHG doesn't use a memory barrier, then the whole lock-acquisition procedure won't be fair, since it is highly likely that the thread that was the first to acquire the lock will be the one to acquire all of the following locks. Did I understand correctly, or am I missing something here?

Edit: The main question is actually whether calling CompareExchange will cause a memory barrier before attempting to read m_state's value, i.e. whether assigning 0 will be visible to all of the threads when they try to call CompareExchange again.

Any x86 instruction that carries a lock prefix acts as a full memory barrier. As shown in Abel's answer, the Interlocked* APIs, including CompareExchange, use lock-prefixed instructions such as lock cmpxchg, so they imply a memory fence.
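A minimal sketch of those semantics from C# (hypothetical demo class; the comments state the documented contract of CompareExchange, which compiles down to that lock cmpxchg):

```csharp
using System;
using System.Threading;

class CompareExchangeDemo
{
    static int m_state; // 0 = free, 1 = held

    static void Main()
    {
        // CompareExchange(ref location, value, comparand) stores `value`
        // only if location == comparand, and always returns the ORIGINAL value.
        int original = Interlocked.CompareExchange(ref m_state, 1, 0);
        Console.WriteLine(original);   // 0: the swap succeeded
        Console.WriteLine(m_state);    // 1

        // A second attempt fails because m_state is no longer 0;
        // the lock-prefixed cmpxchg it compiles to is still a full fence.
        original = Interlocked.CompareExchange(ref m_state, 1, 0);
        Console.WriteLine(original);   // 1: no swap happened
    }
}
```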

Yes, Interlocked.CompareExchange uses a memory barrier.

Why? Because x86 processors guarantee it. From Intel's Volume 3A: System Programming Guide Part 1, Section 7.1.2.2:

For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.

volatile has nothing to do with this discussion. This is about atomic operations; to support atomic operations, x86 guarantees that all previous loads and stores are completed.

ref doesn't respect the usual volatile rules, especially in cases like:

volatile bool myField;
...
RunMethod(ref myField);
...
void RunMethod(ref bool isDone) {
    while(!isDone) {} // silly example
}

Here, RunMethod is not guaranteed to spot external changes to isDone even though the underlying field (myField) is volatile; RunMethod doesn't know about it, so it doesn't have the right code.
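One way to make the callee cooperate (a sketch, not the only fix) is to read through Volatile.Read inside the method, so the fresh-load guarantee no longer depends on the caller's field declaration:

```csharp
using System;
using System.Threading;

class VolatileRefDemo
{
    static volatile bool myField;

    // The ref parameter loses the field's volatile-ness (the compiler even
    // warns about `ref myField`), so we re-introduce acquire semantics
    // explicitly inside the method.
    static void RunMethod(ref bool isDone)
    {
        while (!Volatile.Read(ref isDone)) { } // fresh load on every iteration
    }

    static void Main()
    {
        var worker = new Thread(() => RunMethod(ref myField));
        worker.Start();
        Thread.Sleep(100);
        myField = true;   // volatile write: becomes visible to the spinner
        worker.Join();
        Console.WriteLine("done");
    }
}
```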

However! This should be a non-issue:

  • if you are using Interlocked, then use Interlocked for all access to the field
  • if you are using lock, then use lock for all access to the field

Follow those rules and it should work OK.
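The first rule can be sketched as follows (hypothetical helper names; every read and write of the flag goes through Interlocked, never through a plain field access):

```csharp
using System;
using System.Threading;

class ConsistentAccess
{
    // All access to _done goes through Interlocked, per the rule above.
    static int _done; // int, because Interlocked has no bool overloads

    static void Signal() => Interlocked.Exchange(ref _done, 1);

    // CompareExchange with identical value and comparand is a common
    // trick for a fenced read that never modifies the variable.
    static bool IsDone() => Interlocked.CompareExchange(ref _done, 0, 0) == 1;

    static void Main()
    {
        Console.WriteLine(IsDone()); // False
        Signal();
        Console.WriteLine(IsDone()); // True
    }
}
```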


Re the edit: yes, that behaviour is a critical part of Interlocked. To be honest, I don't know how it is implemented (memory barrier, etc. - note they are "InternalCall" methods, so I can't check ;-p) - but yes: updates from one thread will be immediately visible to all others, as long as they all use the Interlocked methods (hence my point above).

There seems to be some comparison with the Win32 API functions of the same name, but this thread is all about the C# Interlocked class. From its very description, it is guaranteed that its operations are atomic. I'm not sure how that translates to "full memory barriers" as mentioned in other answers here, but judge for yourself.

On uniprocessor systems, nothing special happens; there's just a single instruction:

FASTCALL_FUNC CompareExchangeUP,12
        _ASSERT_ALIGNED_4_X86 ecx
        mov     eax, [esp+4]    ; Comparand
        cmpxchg [ecx], edx
        retn    4               ; result in EAX
FASTCALL_ENDFUNC CompareExchangeUP

But on multiprocessor systems, a hardware lock is used to prevent other cores from accessing the data at the same time:

FASTCALL_FUNC CompareExchangeMP,12
        _ASSERT_ALIGNED_4_X86 ecx
        mov     eax, [esp+4]    ; Comparand
  lock  cmpxchg [ecx], edx
        retn    4               ; result in EAX
FASTCALL_ENDFUNC CompareExchangeMP

An interesting read, with some wrong conclusions here and there but all-in-all excellent on the subject, is this blog post on CompareExchange.

Update for ARM

As so often, the answer is "it depends". It appears that prior to 2.1, ARM had a half-barrier. For the 2.1 release, this behaviour was changed to a full barrier for the Interlocked operations.

The current code can be found here and the actual implementation of CompareExchange here. Discussions of the generated ARM assembly, as well as examples of the generated code, can be seen in the aforementioned PR.

MSDN says about the Win32 API functions: "Most of the interlocked functions provide full memory barriers on all Windows platforms"

(the exceptions are Interlocked functions with explicit Acquire/Release semantics)

From that I would conclude that the C# runtime's Interlocked makes the same guarantees, as they are documented with otherwise identical behaviour (and they resolve to intrinsic CPU instructions on the platforms I know). Unfortunately, with MSDN's tendency to put up samples instead of documentation, it isn't spelled out explicitly.

The interlocked functions are guaranteed to stall the bus and the CPU while resolving the operands. The immediate consequence is that no thread switch, on your CPU or another one, will interrupt the interlocked function in the middle of its execution.

Since you're passing a reference to the C# function, the underlying assembler code will work with the address of the actual integer, so the variable access won't be optimized away. It will work exactly as expected.

Edit: here's a link that explains the behaviour of the asm instruction better: http://faydoc.tripod.com/cpu/cmpxchg.htm
As you can see, the bus is stalled by forcing a write cycle, so any other "threads" (read: other CPU cores) that try to use the bus at the same time will be put in a waiting queue.
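The practical effect of that bus lock can be observed from C#: a CompareExchange retry loop loses no updates even under contention (a sketch with arbitrary thread and iteration counts):

```csharp
using System;
using System.Threading;

class CasLoopDemo
{
    static int _counter;

    // Increment implemented as a CompareExchange retry loop: keep trying
    // until no other thread changed the value between our read and our swap.
    static void CasIncrement(ref int location)
    {
        int seen;
        do
        {
            seen = location;
        }
        while (Interlocked.CompareExchange(ref location, seen + 1, seen) != seen);
    }

    static void Main()
    {
        const int Threads = 4, PerThread = 100_000;
        var workers = new Thread[Threads];
        for (int i = 0; i < Threads; i++)
        {
            workers[i] = new Thread(() =>
            {
                for (int j = 0; j < PerThread; j++) CasIncrement(ref _counter);
            });
            workers[i].Start();
        }
        foreach (var t in workers) t.Join();
        Console.WriteLine(_counter); // 400000: no lost updates
    }
}
```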

According to ECMA-335 (section I.12.6.5):

5. Explicit atomic operations. The class library provides a variety of atomic operations in the System.Threading.Interlocked class. These operations (e.g., Increment, Decrement, Exchange, and CompareExchange) perform implicit acquire/release operations.

So, these operations follow the principle of least astonishment.
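Putting the acquire/release pieces together, the pattern from the question can be wrapped as a tiny spinlock. A sketch only (real code should prefer System.Threading.SpinLock or lock): the CAS acquire is a full fence, and the release is done with a volatile write rather than the question's plain assignment.

```csharp
using System;
using System.Threading;

struct TinySpinLock
{
    private int m_state; // 0 = free, 1 = held

    // Acquire: the lock-prefixed CAS is a full fence, so the protected
    // reads and writes cannot move before we own the lock.
    public void Enter()
    {
        while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) { }
    }

    // Release: a volatile write has release semantics, so earlier
    // writes are visible before the lock appears free.
    public void Exit() => Volatile.Write(ref m_state, 0);
}

class Program
{
    static TinySpinLock _lock;
    static int _shared;

    static void Main()
    {
        var threads = new Thread[2];
        for (int i = 0; i < 2; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < 50_000; j++)
                {
                    _lock.Enter();
                    _shared++;          // protected by the lock
                    _lock.Exit();
                }
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        Console.WriteLine(_shared); // 100000
    }
}
```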
