How does the inline assembly in this compare-exchange function work? (%H modifier on ARM)

Question

static inline unsigned long long __cmpxchg64(unsigned long long *ptr,unsigned long long old,unsigned long long new)
{
    unsigned long long oldval;
    unsigned long res;
    prefetchw(ptr);
    __asm__ __volatile__(
"1: ldrexd      %1, %H1, [%3]\n"
"   teq     %1, %4\n"
"   teqeq       %H1, %H4\n"
"   bne     2f\n"
"   strexd      %0, %5, %H5, [%3]\n"
"   teq     %0, #0\n"
"   bne     1b\n"
"2:"
    : "=&r" (res), "=&r" (oldval), "+Qo" (*ptr)
    : "r" (ptr), "r" (old), "r" (new)
    : "cc");
    return oldval;
}

I found in the GNU manual (the Extended Asm section) that the 'H' in '%H1' means 'Add 8 bytes to an offsettable memory reference'.

But I think that if I want to load a double-word value into oldval (a long long), the modifier should add 4 bytes to '%1', the low 32 bits of oldval, in order to reach the high 32 bits of oldval. So what is my mistake?

Answer 1

"I find in gnu manual (extend extended-asm) that 'H' in '%H1' means 'Add 8 bytes to an offsettable memory reference'."

That table of template modifiers is for x86 only. It is not applicable to ARM.

The template modifiers for ARM are unfortunately not documented in the GCC manual, but they are defined in the armclang manual, and GCC conforms to those definitions as far as I can tell. So the correct meaning of the H template modifier here is:

"The operand must use the r constraint, and must be a 64-bit integer or floating-point type. The operand is printed as the highest-numbered register holding half of the value."

Now this makes sense. Operand 1 to the inline asm is oldval, which is of type unsigned long long (64 bits), so the compiler will allocate two consecutive 32-bit general-purpose registers for it. Let's say they are r4 and r5, as in this compiled output. Then %1 will expand to r4, and %H1 will expand to r5, which is exactly what the ldrexd instruction needs. Likewise, %4, %H4 expanded to r2, r3, and %5, %H5 expanded to fp, ip, which are alternative names for r11, r12.
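To see the register-pair expansion in isolation, here is a minimal sketch. The helper name split64 is made up for illustration. The inline-asm branch assumes GCC or Clang targeting 32-bit little-endian ARM, where the lower-numbered register of the pair holds the low half; on any other target it falls back to plain shifts, so the fallback only mimics the result and does not exercise %H at all.

```c
#include <stdint.h>

/* Hypothetical helper (not from the kernel): returns the two 32-bit
 * halves of a 64-bit value. */
static void split64(uint64_t v, uint32_t *lo, uint32_t *hi)
{
#if defined(__arm__) && !defined(__ARMEB__)
    /* Assumption: 32-bit little-endian ARM. v lives in a register pair. */
    uint32_t a, b;
    __asm__("mov %0, %2\n\t"   /* %2  prints the lower-numbered register  */
            "mov %1, %H2"      /* %H2 prints the higher-numbered register */
            : "=&r" (a), "=&r" (b)
            : "r" (v));
    *lo = a;
    *hi = b;
#else
    /* Portable fallback: same result, computed with shifts. */
    *lo = (uint32_t)v;
    *hi = (uint32_t)(v >> 32);
#endif
}
```

Calling split64(0x1122334455667788ULL, &lo, &hi) leaves 0x55667788 in lo and 0x11223344 in hi. On an ARM build, compiling with -S and looking at the two mov instructions should show the same kind of pair (for example r4 and r5) that the answer describes.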

The answer by frant explains what a compare-exchange is supposed to do. (The spelling cmpxchg might come from the mnemonic for the x86 compare-exchange instruction.) And if you read through the code now, you should see that it does exactly that. The teq; teqeq; bne between ldrexd and strexd will abort the store if old and *ptr were unequal. And the teq; bne after strexd will cause a retry if the exclusive store failed, which happens if there was an intervening access to *ptr (by another core, interrupt handler, etc). That is how atomicity is ensured.
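For comparison, the same contract can be sketched in portable C with the GCC/Clang builtin __atomic_compare_exchange_n. This is not how the kernel implements it; the function names below are hypothetical, and relaxed memory ordering is chosen only because the asm in the question contains no barriers.

```c
#include <stdint.h>

/* Hypothetical sketch of the __cmpxchg64 contract: if *ptr == old, store
 * new_val; in either case return the value that was observed in *ptr. */
static inline uint64_t cmpxchg64_sketch(uint64_t *ptr, uint64_t old,
                                        uint64_t new_val)
{
    uint64_t expected = old;
    /* On failure the builtin writes the current *ptr into expected;
     * on success expected still equals old. */
    __atomic_compare_exchange_n(ptr, &expected, new_val,
                                0 /* strong */,
                                __ATOMIC_RELAXED, __ATOMIC_RELAXED);
    return expected;
}

/* Typical caller pattern: retry until no one else changed *ptr in between. */
static inline uint64_t atomic_add64_sketch(uint64_t *ptr, uint64_t delta)
{
    uint64_t seen = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    for (;;) {
        uint64_t prev = cmpxchg64_sketch(ptr, seen, seen + delta);
        if (prev == seen)
            return seen + delta;   /* our store went through */
        seen = prev;               /* someone else won; try again */
    }
}
```

A caller detects success by checking whether the returned value equals old, which is the same convention the function in the question follows by returning oldval. Depending on the target, the 64-bit builtin may need linking against libatomic.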

Answer 1 was accepted on 2022-05-22 19:09:08.