How does the inline assembly in this compare-exchange function work? (%H modifier on ARM)
static inline unsigned long long __cmpxchg64(unsigned long long *ptr,
                                             unsigned long long old,
                                             unsigned long long new)
{
    unsigned long long oldval;
    unsigned long res;

    prefetchw(ptr);
    __asm__ __volatile__(
    "1: ldrexd %1, %H1, [%3]\n"
    "   teq %1, %4\n"
    "   teqeq %H1, %H4\n"
    "   bne 2f\n"
    "   strexd %0, %5, %H5, [%3]\n"
    "   teq %0, #0\n"
    "   bne 1b\n"
    "2:"
    : "=&r" (res), "=&r" (oldval), "+Qo" (*ptr)
    : "r" (ptr), "r" (old), "r" (new)
    : "cc");
    return oldval;
}
I found in the GNU manual (Extended Asm) that the 'H' in '%H1' means 'Add 8 bytes to an offsettable memory reference'.
But I think that if I want to load a doubleword into oldval (a long long value), the high 32 bits of oldval should sit 4 bytes past '%1', which holds the low 32 bits of oldval, not 8 bytes. So what is my mistake?
"I found in the GNU manual (Extended Asm) that the 'H' in '%H1' means 'Add 8 bytes to an offsettable memory reference'."
That table of template modifiers is for x86 only. It is not applicable to ARM.
The template modifiers for ARM are unfortunately not documented in the GCC manual, but they are defined in the armclang manual, and GCC conforms to those definitions as far as I can tell. So the correct meaning of the H template modifier here is:
The operand must use the r constraint, and must be a 64-bit integer or floating-point type. The operand is printed as the highest-numbered register holding half of the value.
Now this makes sense. Operand 1 to the inline asm is oldval, which is of type unsigned long long, 64 bits, so the compiler will allocate two consecutive 32-bit general purpose registers for it. Let's say they are r4 and r5, as in this compiled output. Then %1 will expand to r4, and %H1 will expand to r5, which is exactly what the ldrexd instruction needs. Likewise, %4, %H4 expanded to r2, r3, and %5, %H5 expanded to fp, ip, which are alternative names for r11, r12.
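To make the register-pair idea concrete, here is a small portable C sketch that models how one 64-bit operand is carried in two 32-bit registers. It is only an illustration: the names reg_pair, split64, join64 and pairs_equal are hypothetical, and it assumes a little-endian target, where the lower-numbered register holds the low word. No real registers are involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers the compiler picks for one
 * 64-bit operand, e.g. r4 (printed for %1) and r5 (printed for %H1). */
struct reg_pair {
    uint32_t lower_numbered;   /* what %1 would name, e.g. r4 */
    uint32_t higher_numbered;  /* what %H1 would name, e.g. r5 */
};

/* Assumption: little-endian, so the low word goes in the
 * lower-numbered register and the high word in the other one. */
static struct reg_pair split64(uint64_t value)
{
    struct reg_pair pair;
    pair.lower_numbered  = (uint32_t)(value & 0xFFFFFFFFu);
    pair.higher_numbered = (uint32_t)(value >> 32);
    return pair;
}

/* Reassemble the 64-bit value from its two halves. */
static uint64_t join64(struct reg_pair pair)
{
    return ((uint64_t)pair.higher_numbered << 32) | pair.lower_numbered;
}

/* Models "teq %1, %4; teqeq %H1, %H4": the operands count as equal
 * only if both halves match. */
static int pairs_equal(struct reg_pair a, struct reg_pair b)
{
    return a.lower_numbered == b.lower_numbered &&
           a.higher_numbered == b.higher_numbered;
}

/* Usage: splitting and rejoining leaves the value unchanged. */
static void check_round_trip(void)
{
    uint64_t value = 0x1122334455667788ULL;
    assert(join64(split64(value)) == value);
}
```

The point of the sketch is that the H modifier selects the other register of the pair; it has nothing to do with adding a byte offset to a memory address.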
The answer by frant explains what a compare-exchange is supposed to do. (The spelling cmpxchg might come from the mnemonic for the x86 compare-exchange instruction.) And if you read through the code now, you should see that it does exactly that. The teq; teqeq; bne between ldrexd and strexd will abort the store if old and *ptr were unequal. And the teq; bne after strexd will cause a retry if the exclusive store failed, which happens if there was an intervening access to *ptr (by another core, interrupt handler, etc). That is how atomicity is ensured.
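For comparison, the same observable behaviour can be sketched in portable C with the GCC/Clang builtin __atomic_compare_exchange_n. This is not the code from the question, only a rough equivalent: the function name cmpxchg64_sketch is hypothetical, and relaxed memory ordering is an assumption chosen because the asm above contains no barrier instructions.

```c
#include <assert.h>

/* Sketch: returns the value that was in *ptr before the operation,
 * and stores new_val only if that value was equal to old. */
static inline unsigned long long
cmpxchg64_sketch(unsigned long long *ptr,
                 unsigned long long old,
                 unsigned long long new_val)
{
    unsigned long long expected = old;

    /* On failure the builtin writes the current *ptr into expected;
     * on success expected still equals old, which was the prior value.
     * weak = 0 requests a strong compare-exchange, so no retry loop
     * is written out here. */
    __atomic_compare_exchange_n(ptr, &expected, new_val, 0,
                                __ATOMIC_RELAXED, __ATOMIC_RELAXED);
    return expected;
}

/* Usage: the first call succeeds, the second fails and leaves v alone. */
static void cmpxchg64_usage(void)
{
    unsigned long long v = 5;
    assert(cmpxchg64_sketch(&v, 5, 9) == 5 && v == 9);
    assert(cmpxchg64_sketch(&v, 5, 1) == 9 && v == 9);
}
```

As with the asm version, the caller can tell whether the exchange happened by comparing the return value with old.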