How to guarantee 64-bit writes are atomic?

When can 64-bit writes be guaranteed to be atomic when programming in C on an Intel x86-based platform (in particular, an Intel-based Mac running Mac OS X 10.4, using the Intel compiler)? For example:

unsigned long long int y;
y = 0xfedcba87654321ULL;
/* ... a bunch of other time-consuming stuff happens... */
y = 0x12345678abcdefULL;

If another thread examines the value of y after the first assignment to y has finished executing, I would like to ensure that it sees either the value 0xfedcba87654321 or the value 0x12345678abcdef, and not some blend of the two. I would like to do this without any locking, and if possible without any extra code. My hope is that, when using a 64-bit compiler (the 64-bit Intel compiler) on an operating system capable of supporting 64-bit code (Mac OS X 10.4), these 64-bit writes will be atomic. Is this always true?

Your best bet is to avoid trying to build your own system out of primitives, and instead use locking unless it really shows up as a hot spot when profiling. (If you think you can be clever and avoid locks, don't. You aren't. That's the general "you", which includes me and everybody else.) At minimum you should use a spin lock; see spinlock(3). And whatever you do, don't try to implement "your own" locks. You will get it wrong.
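If you follow the "lock first, optimize later" advice, a minimal sketch might look like this. It assumes POSIX threads and uses pthread_mutex_t instead of the Mac OS X spinlock(3) API so that it builds on any POSIX system; locked_u64 and its helper functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: a 64-bit value guarded by a mutex. */
typedef struct {
    pthread_mutex_t lock;
    uint64_t value;
} locked_u64;

static void locked_u64_init(locked_u64 *v, uint64_t initial)
{
    int rc = pthread_mutex_init(&v->lock, NULL);
    assert(rc == 0);
    (void)rc;
    v->value = initial;
}

static void locked_u64_set(locked_u64 *v, uint64_t new_value)
{
    pthread_mutex_lock(&v->lock);
    v->value = new_value;   /* readers hold the same lock, so none sees a half-written value */
    pthread_mutex_unlock(&v->lock);
}

static uint64_t locked_u64_get(locked_u64 *v)
{
    pthread_mutex_lock(&v->lock);
    uint64_t result = v->value;
    pthread_mutex_unlock(&v->lock);
    return result;
}
```

The writer calls locked_u64_set() and readers call locked_u64_get(); because both sides take the same mutex, a reader can only ever observe one of the complete values that were written.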

Ultimately, you need to use whatever locking or atomic operations your operating system provides. Getting these sorts of things exactly right in all cases is extremely difficult. It often requires knowledge of things like the errata for specific versions of specific processors. ("Oh, version 2.0 of that processor didn't do the cache-coherency snooping at the right time; it's fixed in version 2.0.1, but on 2.0 you need to insert a NOP.") Just slapping a volatile keyword on a variable in C is almost always insufficient.

On Mac OS X, that means you need to use the functions listed in atomic(3) to perform truly atomic-across-all-CPUs operations on 32-bit, 64-bit, and pointer-sized quantities. (Use the pointer-sized variants for any atomic operations on pointers, so you're 32/64-bit compatible automatically.) That applies whether you want to do atomic compare-and-swap, increment/decrement, spin locking, or stack/queue management. Fortunately, the spinlock(3), atomic(3), and barrier(3) functions should all work correctly on all CPUs supported by Mac OS X.

On x86_64, both the Intel compiler and gcc support some intrinsic atomic-operation functions. Here's gcc's documentation of them: http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html

The Intel compiler docs also discuss them here: http://softwarecommunity.intel.com/isn/downloads/softwareproducts/pdfs/347603.pdf (page 164 or thereabouts).

According to Chapter 7 of Part 3A (System Programming Guide) of Intel's processor manuals, quadword accesses are carried out atomically if aligned on a 64-bit boundary on a Pentium or newer, and also if unaligned but still within a single cache line on a P6 or newer. You should use volatile to ensure that the compiler doesn't keep the value in a register instead of actually writing it to memory, and you may need to use a memory fence routine to ensure that the write happens in the proper order.
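A minimal sketch of that advice, assuming GCC or Clang: _Alignas(8) forces the 64-bit boundary the manual requires, and GCC's __sync_synchronize() stands in for the fence (on Mac OS X, OSMemoryBarrier() from barrier(3) plays the same role). shared_y, publish_y and read_y are hypothetical names. One caveat worth checking in the generated assembly: the hardware guarantee only helps if the compiler emits a single 8-byte store, and when compiling 32-bit code GCC may split a volatile 64-bit store into two 32-bit stores. In ISO C terms this pattern is still formally a data race; it relies on x86 and compiler behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared variable, forced onto an 8-byte boundary. */
static _Alignas(8) volatile uint64_t shared_y;

static void publish_y(uint64_t v)
{
    /* The quadword guarantee only applies to 8-byte-aligned addresses. */
    assert(((uintptr_t)&shared_y & 7u) == 0);
    shared_y = v;            /* volatile: the compiler must actually emit this store */
    __sync_synchronize();    /* full barrier: later memory operations stay after the store */
}

static uint64_t read_y(void)
{
    uint64_t v = shared_y;   /* volatile: re-read from memory on every call */
    __sync_synchronize();    /* full barrier: later reads cannot move above this load */
    return v;
}
```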

If the value you write depends on an existing value, you should use your operating system's interlocked functions (e.g. Windows has InterlockedIncrement64).
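For the read-modify-write case, here is a minimal sketch of the usual compare-and-swap retry loop. It uses GCC's __sync_val_compare_and_swap as a portable stand-in for the Windows Interlocked family (an assumption, since those functions only exist on Windows), and atomic_max_u64 is a hypothetical example operation that raises a shared value to at least candidate.

```c
#include <stdint.h>

/* Hypothetical helper: atomically set *target = max(*target, candidate).
   Returns the value that was in *target before any update. */
static uint64_t atomic_max_u64(volatile uint64_t *target, uint64_t candidate)
{
    uint64_t seen = *target;   /* may be stale; the CAS below validates it */
    while (seen < candidate) {
        /* Replace 'seen' with 'candidate'; the builtin returns what was really there. */
        uint64_t prev = __sync_val_compare_and_swap(target, seen, candidate);
        if (prev == seen)
            return seen;       /* our write won */
        seen = prev;           /* another thread changed it; re-check the fresh value */
    }
    return seen;               /* already >= candidate, nothing to write */
}
```

The same shape (read, compute, CAS, retry on failure) works for any update that depends on the current value; only the "compute" step changes.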

On Intel Mac OS X, you can use the built-in system atomic operations. There is no provided atomic get or set for either 32- or 64-bit integers, but you can build one out of the provided compare-and-swap. You may wish to search the Xcode documentation for the various OSAtomic functions. I've written the 64-bit version below; the 32-bit version can be done with the similarly named functions.

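#include <libkern/OSAtomic.h>
#include <stdbool.h>
#include <stdint.h>
// bool OSAtomicCompareAndSwap64Barrier(int64_t oldValue, int64_t newValue, volatile int64_t *theValue);

void AtomicSet(uint64_t *target, uint64_t new_value)
{
    while (true)
    {
        // Plain read: may be stale, but the CAS only succeeds if it still matches memory.
        uint64_t old_value = *target;
        if (OSAtomicCompareAndSwap64Barrier((int64_t)old_value, (int64_t)new_value,
                                            (volatile int64_t *)target))
            return;
    }
}

uint64_t AtomicGet(uint64_t *target)
{
    while (true)
    {
        uint64_t value = *target;
        // Swapping the value with itself succeeds only if 'value' equals the atomic contents.
        if (OSAtomicCompareAndSwap64Barrier((int64_t)value, (int64_t)value,
                                            (volatile int64_t *)target))
            return value;
    }
}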

Note that Apple's OSAtomicCompareAndSwap functions atomically perform the operation:

if (*theValue != oldValue) return false;
*theValue = newValue;
return true;

We use this in the example above to create a Set method by first grabbing the old value, then attempting to swap the target memory's value. If the swap succeeds, the memory still held the old value at the time of the swap and received the new value during the swap (which is itself atomic), so we are done. If it doesn't succeed, some other thread interfered by modifying the value between when we grabbed it and when we tried to replace it. In that case we simply loop and try again, with only a minimal penalty.

The idea behind the Get method is that we first grab the value (which may or may not be the actual value if another thread is interfering). We then try to swap the value with itself, simply to check that the initial grab was equal to the atomic value.
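For readers not on Mac OS X, the same two loops can be sketched with GCC's __sync builtins (available in GCC 4.1 and later, and in Clang). This is a port under that assumption, not Apple's API. Because __sync_val_compare_and_swap returns the value it found, the Get side needs no loop: a compare-and-swap of 0 for 0 never changes memory, but the value it returns is read atomically.

```c
#include <stdint.h>

/* Port of AtomicSet: retry until our CAS replaces the value we last saw. */
static void AtomicSet(volatile uint64_t *target, uint64_t new_value)
{
    uint64_t old_value = *target;
    while (!__sync_bool_compare_and_swap(target, old_value, new_value))
        old_value = *target;   /* another thread changed it; reload and retry */
}

/* Port of AtomicGet: swapping 0 for 0 never alters memory,
   but returns the current contents as one atomic 64-bit read. */
static uint64_t AtomicGet(volatile uint64_t *target)
{
    return __sync_val_compare_and_swap(target, 0, 0);
}
```

Note that the Get still issues a locked, write-capable instruction, so it requires writable memory and costs about as much as the Set.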

I haven't checked this against my compiler, so please excuse any typos.

You mentioned OS X specifically, but in case you need to work on other platforms: Windows has a number of Interlocked* functions, which you can find in the MSDN documentation. Some of them work on Windows 2000 Pro and later, and some (particularly some of the 64-bit functions) are new in Vista. On other platforms, GCC 4.1 and later provide a variety of __sync* functions, such as __sync_fetch_and_add(). For other systems you may need to use assembly; you can find some implementations in the HaikuOS project's SVN browser, inside src/system/libroot/os/arch.
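As an illustration of __sync_fetch_and_add(), here is a minimal sketch of a 64-bit counter shared by several threads. It assumes GCC or Clang plus POSIX threads; the thread count, iteration count and function names are arbitrary choices for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

static volatile uint64_t counter = 0;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++)
        __sync_fetch_and_add(&counter, 1);   /* atomic 64-bit increment with a full barrier */
    return NULL;
}

/* Starts NUM_THREADS workers, waits for all of them, returns the final count. */
static uint64_t run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];
    counter = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    return counter;
}
```

With a plain counter++ instead of the builtin, increments from different threads can overwrite each other and the total comes out short; with the builtin, the result is always NUM_THREADS * INCREMENTS_PER_THREAD.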

On x86, the fastest way to atomically write an aligned 64-bit value is to use FISTP. For unaligned values, you need a 64-bit compare-and-swap (CAS2, i.e. CMPXCHG8B, as used by _InterlockedExchange64). That operation is quite slow because of the bus lock, though, so it is often faster to check the alignment and use the FISTP version for aligned addresses. Indeed, this is how Intel Threading Building Blocks implements atomic 64-bit writes.
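The alignment-dispatch idea can be sketched with GCC builtins instead of hand-written FISTP and CMPXCHG8B. This is an illustration under several assumptions, not TBB's code: it relies on GCC's __atomic_store_n emitting a single 8-byte store for an aligned address (on 32-bit x86 GCC may use an x87 or SSE move for this; inspect the assembly to confirm), and the unaligned branch relies on x86 hardware accepting a locked compare-and-swap on an unaligned address, which ISO C does not promise. store64 and is_aligned8 are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if p sits on an 8-byte boundary. */
static int is_aligned8(const volatile void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical helper: atomic 64-bit store with an alignment fast path. */
static void store64(volatile uint64_t *p, uint64_t v)
{
    if (is_aligned8(p)) {
        /* Fast path: an aligned 8-byte store is atomic on Pentium and newer,
           so no LOCK-prefixed instruction is needed. */
        __atomic_store_n(p, v, __ATOMIC_RELEASE);
    } else {
        /* Slow path: locked compare-and-swap loop (LOCK CMPXCHG8B on 32-bit x86).
           Relies on x86 behavior; unaligned access is not defined by ISO C. */
        uint64_t old = *p;
        while (!__sync_bool_compare_and_swap(p, old, v))
            old = *p;
    }
}
```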

The latest version of ISO C (C11) defines a set of atomic operations, including atomic_store and atomic_store_explicit. See e.g. this page for more information.
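A minimal C11 sketch of the question's scenario, assuming a toolchain that provides <stdatomic.h> (the Mac OS X 10.4-era compilers in the question predate C11). Declaring the variable _Atomic makes every load and store of it indivisible; the release/acquire orderings are one reasonable choice, and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* The question's variable, declared atomic so loads and stores cannot tear. */
static _Atomic uint64_t y;

static void set_y(uint64_t v)
{
    atomic_store_explicit(&y, v, memory_order_release);
}

static uint64_t get_y(void)
{
    return atomic_load_explicit(&y, memory_order_acquire);
}
```

A reader calling get_y() concurrently with set_y() sees either the old or the new value, never a mixture, regardless of whether the target is 32- or 64-bit.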

The second most portable implementation of atomics is the GCC intrinsics, which have already been mentioned. I find that they are fully supported by the GCC, Clang, Intel, and IBM compilers and, as of the last time I checked, partially supported by the Cray compilers.

One clear advantage of C11 atomics, in addition to their being an ISO standard, is that they let you specify the required memory ordering more precisely. As far as I know, the GCC atomics imply a full memory barrier.
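To illustrate the finer control: a statistics counter that only needs atomicity, not ordering, can request memory_order_relaxed, something the always-full-barrier __sync builtins cannot express. A sketch assuming <stdatomic.h> and POSIX threads; the names and counts are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define WORKERS 4
#define HITS_PER_WORKER 100000

static _Atomic uint64_t hits;

static void *count_hits(void *arg)
{
    (void)arg;
    for (int i = 0; i < HITS_PER_WORKER; i++)
        /* Relaxed: the increment is still indivisible, but it imposes
           no ordering on surrounding memory operations. */
        atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
    return NULL;
}

static uint64_t run_hits_demo(void)
{
    pthread_t tids[WORKERS];
    atomic_store(&hits, 0);                /* sequentially consistent by default */
    for (int t = 0; t < WORKERS; t++) {
        int rc = pthread_create(&tids[t], NULL, count_hits, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < WORKERS; t++)
        pthread_join(tids[t], NULL);       /* join synchronizes, so the load sees every increment */
    return atomic_load(&hits);
}
```

On x86 the increment compiles to a LOCK-prefixed instruction either way; the relaxed ordering pays off on weakly ordered CPUs such as ARM or POWER, and in the compiler reorderings it permits.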

If you want to do something like this for inter-thread or inter-process communication, you need more than just an atomic read/write guarantee. In your example, it appears that you want the values written to indicate that some work is in progress and/or has been completed. You will need to do several things, not all of which are portable, to ensure that the compiler has done things in the order you want (the volatile keyword may help to a certain extent) and that memory is consistent. Modern processors and caches can perform work out of order, unbeknownst to the compiler, so you really need some platform support (i.e., locks or platform-specific interlocked APIs) to do what you appear to want.
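The "work done" signalling the question hints at can be sketched with C11 release/acquire ordering, assuming <stdatomic.h> and POSIX threads. The producer fills in a result and then sets a flag with a release store; a reader that sees the flag through an acquire load is then guaranteed to see the result as well. result, ready, producer and consume_result are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t result;     /* plain data, published through 'ready' */
static atomic_int ready;

static void *producer(void *arg)
{
    (void)arg;
    result = 0x12345678abcdefULL;                            /* 1. do the work */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* 2. publish it */
    return NULL;
}

/* Starts the producer, spins until it publishes, returns the published result. */
static uint64_t consume_result(void)
{
    pthread_t tid;
    atomic_store(&ready, 0);
    int rc = pthread_create(&tid, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                   /* acquire pairs with the producer's release store */
    uint64_t r = result;    /* safe: the producer's write happens-before this read */
    pthread_join(tid, NULL);
    return r;
}
```

Without the release/acquire pair, the compiler or CPU could make the flag visible before the result, which is exactly the ordering problem described above.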

"Memory fence" or "memory barrier" are terms you may want to research. “记忆围栏”或“记忆障碍”是您可能想要研究的术语。

GCC has intrinsics for atomic operations; I suspect you can do similar things with other compilers, too. Never rely on the compiler for atomic operations: unless you explicitly tell it not to, optimization runs a real risk of turning even obviously atomic operations into non-atomic ones.

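Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.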
