Clang doesn't inline std::atomic::load for loading 64-bit structs
Consider the following code, which uses a std::atomic to atomically load a 64-bit object.
#include <atomic>
struct A {
int32_t x, y;
};
A f(std::atomic<A>& a) {
return a.load(std::memory_order_relaxed);
}
With GCC, good things happen, and the following code is generated. ( https://godbolt.org/z/zS53ZF )
f(std::atomic<A>&):
mov rax, QWORD PTR [rdi]
ret
This is exactly what I'd expect, since I see no reason why a 64-bit struct shouldn't be treated like any other 64-bit word in this situation.
With Clang, however, the story is different. Clang generates the following. ( https://godbolt.org/z/d6uqrP )
f(std::atomic<A>&): # @f(std::atomic<A>&)
push rax
mov rsi, rdi
mov rdx, rsp
mov edi, 8
xor ecx, ecx
call __atomic_load
mov rax, qword ptr [rsp]
pop rcx
ret
mov rdi, rax
call __clang_call_terminate
__clang_call_terminate: # @__clang_call_terminate
push rax
call __cxa_begin_catch
call std::terminate()
This is problematic for me for several reasons:
The generated code includes a call to __atomic_load, which means that my binary needs to be linked with libatomic. This means I need different lists of libraries to link depending on whether users of my code use GCC or Clang.
The important question on my mind right now is whether there is a way to get Clang to also convert the load into a single instruction. We are using this as part of a library that we plan to distribute to others, so we cannot rely on a particular compiler being used.
The solution suggested to me so far is to use type punning and store the struct inside a union alongside a 64-bit int, since Clang does correctly load 64-bit ints atomically in one instruction. I am skeptical of this solution, however: although it appears to work on all major compilers, I have read that it is in fact undefined behaviour. Such code is also not particularly friendly for others to read and understand if they are not familiar with the trick.
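To summarize, is there a way to atomically load a 64-bit struct that works with both GCC and Clang, compiles to a single instruction, is not undefined behaviour, and is reasonably readable to others?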
This clang missed optimization only happens with libstdc++; clang on Godbolt inlines as we expect for -stdlib=libc++. https://godbolt.org/z/Tt8XTX
It seems that giving the struct 64-bit alignment is sufficient to hand-hold clang.
libstdc++'s std::atomic template does that for types that are small enough to be atomic when naturally aligned, but perhaps clang++ is only seeing the alignment of the underlying type, not the class member of atomic<T>, in the libstdc++ implementation. I haven't investigated; someone should report this to the clang / LLVM bugzilla.
#include <atomic>
#include <stdint.h> // you forgot this header.
struct A {
alignas(2 * sizeof(int32_t)) int32_t x;
int32_t y; // this one must be separate, otherwise y would also be aligned -> 16-byte object
};
A f(std::atomic<A>& a) {
return a.load(std::memory_order_relaxed);
}
Aligning by the struct size makes it agnostic of alignof(int64_t), which on a 32-bit ABI might only be 4. (And I didn't use alignas(8), to avoid over-alignment on systems where char is 32-bit and sizeof(int64_t) = 2. This may be needlessly complicated, and alignas(int64_t) is easier to read, even though it's not always the same thing as giving this struct natural alignment.)
# clang++ 9.0 -std=gnu++17 -O3; g++ is the same
f(std::atomic<A>&):
mov rax, qword ptr [rdi]
ret
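If you depend on this, it may be worth pinning the layout assumptions down at compile time so a later change to the struct can't silently bring back the library call. This is only a minimal sketch: AlignedA and round_trip are hypothetical names I made up for illustration, and the asserts check layout only, not what code the compiler actually emits.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Same layout as the struct above; renamed (hypothetical name) so it can
// coexist with the original A in one translation unit.
struct AlignedA {
    alignas(2 * sizeof(std::int32_t)) std::int32_t x;
    std::int32_t y;
};

// No padding, and aligned to its own size (natural alignment).
static_assert(sizeof(AlignedA) == 2 * sizeof(std::int32_t),
              "unexpected padding in AlignedA");
static_assert(alignof(AlignedA) == sizeof(AlignedA),
              "AlignedA is not naturally aligned");
// The atomic wrapper is at least as aligned as its payload.
static_assert(alignof(std::atomic<AlignedA>) >= alignof(AlignedA),
              "std::atomic<AlignedA> is under-aligned");

// Hypothetical helper: store a value, then read it back through the atomic.
inline AlignedA round_trip(std::atomic<AlignedA>& a, AlignedA v) {
    a.store(v, std::memory_order_relaxed);
    return a.load(std::memory_order_relaxed);
}

// Small self-check showing the intended use.
inline void demo_aligned() {
    std::atomic<AlignedA> a{AlignedA{1, 2}};
    AlignedA r = round_trip(a, AlignedA{3, 4});
    assert(r.x == 3 && r.y == 4);
}
```

The static_asserts only say something about size and alignment; checking the generated asm (e.g. on Godbolt) is still the way to confirm the load was inlined.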
BTW, no, the libatomic library function won't use a lock; it does know that 8-byte aligned loads are naturally atomic and that other threads will be using plain loads/stores, not locks.
Older clang at least uses call __atomic_load_8 instead of the generic variable-sized one, but that's still a big missed optimization.
Fun fact: clang -m32 will use lock cmpxchg8b to implement an 8-byte atomic load, instead of using SSE or fild like GCC does. :/
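On the union idea from the question: if you'd rather not touch the struct's alignment, a different technique (not the one shown above) is to keep the bits in a std::atomic<uint64_t> and convert with memcpy, which is well-defined for trivially copyable types, unlike reading an inactive union member. A rough sketch, assuming the struct is exactly 64 bits; AtomicA and its members are hypothetical names, and I haven't checked the asm for this on every compiler.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

struct A {
    std::int32_t x, y;
};

// Assumption: A has exactly the size of a 64-bit integer.
static_assert(sizeof(A) == sizeof(std::uint64_t), "A must be exactly 64 bits");

// Hypothetical wrapper: stores the object representation of A in an atomic
// 64-bit integer and converts to/from A with memcpy.
class AtomicA {
public:
    explicit AtomicA(A v) : bits_(to_bits(v)) {}

    A load(std::memory_order mo = std::memory_order_seq_cst) const {
        return from_bits(bits_.load(mo));
    }

    void store(A v, std::memory_order mo = std::memory_order_seq_cst) {
        bits_.store(to_bits(v), mo);
    }

private:
    // Copy the bytes of A into an integer of the same size.
    static std::uint64_t to_bits(A v) {
        std::uint64_t b;
        std::memcpy(&b, &v, sizeof b);
        return b;
    }

    // Copy the bytes back out into an A.
    static A from_bits(std::uint64_t b) {
        A v;
        std::memcpy(&v, &b, sizeof v);
        return v;
    }

    std::atomic<std::uint64_t> bits_;
};

// Small self-check showing the intended use.
inline void demo_atomic_a() {
    AtomicA a(A{1, 2});
    a.store(A{3, 4}, std::memory_order_relaxed);
    A r = a.load(std::memory_order_relaxed);
    assert(r.x == 3 && r.y == 4);
}
```

Compilers normally optimize a fixed-size memcpy like this into plain register moves, so I'd expect the load to stay a single instruction on x86-64, but check the output for your targets.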