ARMv8 floating point output inline assembly

To add two integers, I write:

int sum;
asm volatile("add %0, x3, x4" : "=r"(sum) : :);

How can I do this with two floats? I tried:

float sum;
asm volatile("fadd %0, s3, s4" : "=r"(sum) : :);

But it gives me an error:

Error: operand 1 should be a SIMD vector register -- `fadd x0,s3,s4'

Any ideas?

Because registers can have multiple names in AArch64 (v0, b0, h0, s0, d0 all refer to the same register), it is necessary to add an output modifier to the print string.

On Godbolt, this source:

float foo()
{
    float sum;
    asm volatile("fadd %s0, s3, s4" : "=w"(sum) : :);
    return sum;
}

double dsum()
{
    double sum;
    asm volatile("fadd %d0, d3, d4" : "=w"(sum) : :);
    return sum;
}

Will produce:

foo:
        fadd s0, s3, s4 // sum
        ret     
dsum:
        fadd d0, d3, d4 // sum
        ret  

"=r" is the constraint for GP integer registers.
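For comparison, here is a minimal sketch of the integer case with "r" used for the inputs as well as the output, so the compiler picks the registers. The function name add_ints is hypothetical, and the plain C fallback is only there so the sketch builds on non-AArch64 targets. It assumes GCC or clang, which both accept the %w operand modifier to print the 32-bit wN name of a general-purpose register.

```c
/* Hypothetical helper: adds two ints through inline asm on AArch64. */
int add_ints(int a, int b)
{
    int sum;
#if defined(__aarch64__)
    /* "r" requests general-purpose registers; %w prints the 32-bit wN name. */
    __asm__("add %w0, %w1, %w2" : "=r"(sum) : "r"(a), "r"(b));
#else
    /* Assumption: plain C fallback so the sketch builds on other targets. */
    sum = a + b;
#endif
    return sum;
}
```

Calling add_ints(2, 3) should return 5 on either path.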

The GCC manual claims that "=w" is the constraint for an FP / SIMD register on AArch64. But if you try that with a plain %0 operand, you get v0, not s0, which won't assemble. I don't know a workaround here; you should probably report on the gcc bugzilla that the constraint documented in the manual doesn't work for scalar FP.

On Godbolt I tried this source:

float foo()
{
    float sum;
#ifdef __aarch64__
    asm volatile("fadd %0, s3, s4" : "=w"(sum) : :);   // AArch64
#else
    asm volatile("fadds %0, s3, s4" : "=t"(sum) : :);  // ARM32
#endif
    return sum;
}

double dsum()
{
    double sum;
#ifdef __aarch64__
    asm volatile("fadd %0, d3, d4" : "=w"(sum) : :);   // AArch64
#else
    asm volatile("faddd %0, d3, d4" : "=w"(sum) : :);  // ARM32
#endif
    return sum;
}

clang7.0 (with its built-in assembler) requires the asm to be actually valid. But for gcc we're only compiling to asm, and Godbolt doesn't have a "binary mode" for non-x86.

# AArch64 gcc 8.2  -xc -O3 -fverbose-asm -Wall
# INVALID ASM, errors if you try to actually assemble it.
foo:
    fadd v0, s3, s4 // sum
    ret     
dsum:
    fadd v0, d3, d4 // sum
    ret

clang produces the same asm, and its built-in assembler errors with:

<source>:5:18: error: invalid operand for instruction
    asm volatile("fadd %0, s3, s4" : "=w"(sum) : :);
                 ^
<inline asm>:1:11: note: instantiated into assembly here
        fadd v0, s3, s4
             ^

On 32-bit ARM, "=t" for single works, but "=w" for double (which the manual says you should use for double-precision) also gives you s0 with gcc. It works with clang, though. You have to use -mfloat-abi=hard and a -mcpu= something with an FPU, e.g. -mcpu=cortex-a15.

# clang7.0 -xc -O3 -Wall --target=arm -mcpu=cortex-a15 -mfloat-abi=hard
# valid asm for ARM 32
foo:
        vadd.f32        s0, s3, s4
        bx      lr
dsum:
        vadd.f64        d0, d3, d4
        bx      lr

But gcc fails:

# ARM gcc 8.2  -xc -O3 -fverbose-asm -Wall -mfloat-abi=hard -mcpu=cortex-a15
foo:
        fadds s0, s3, s4        @ sum
        bx      lr  @
dsum:
        faddd s0, d3, d4        @ sum    @@@ INVALID
        bx      lr  @

So you can use "=t" for single just fine with gcc, but for double presumably you need a %something0 modifier to print the register name as d0 instead of s0, with a "=w" output.


Obviously these asm statements would only be useful for anything beyond learning the syntax if you add constraints to specify the input operands as well, instead of reading whatever happened to be sitting in s3 and s4.
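A minimal sketch of what that could look like for the single-precision case, assuming GCC or clang: both inputs get a "w" constraint, and every operand uses the %s modifier so it prints as sN. The function name add_floats is hypothetical, and the plain C fallback exists only so the sketch builds on targets other than AArch64.

```c
/* Hypothetical helper: adds two floats through inline asm on AArch64. */
float add_floats(float a, float b)
{
    float sum;
#if defined(__aarch64__)
    /* "w" requests FP/SIMD registers; %s prints the scalar sN name. */
    __asm__("fadd %s0, %s1, %s2" : "=w"(sum) : "w"(a), "w"(b));
#else
    /* Assumption: plain C fallback so the sketch builds on other targets. */
    sum = a + b;
#endif
    return sum;
}
```

With explicit inputs the statement no longer needs volatile: the compiler can see that the result depends only on a and b, and is free to drop the asm if sum is unused.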

See also https://stackoverflow.com/tags/inline-assembly/info

ARMv7 double: %P modifier

GCC devs informed me of the correct undocumented modifier for ARMv7 doubles at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89482#c4. Maybe I should stop being lazy and grep GCC some day:

main.c

#include <assert.h>

int main(void) {
    double my_double = 1.5;
    __asm__ (
        "vmov.f64 d0, 1.0;"
        "vadd.f64 %P[my_double], %P[my_double], d0;"
        : [my_double] "+w" (my_double)
        :
        : "d0"
    );
    assert(my_double == 2.5);
}

Compile and run:

sudo apt-get install qemu-user gcc-arm-linux-gnueabihf
arm-linux-gnueabihf-gcc -O3 -std=c99 -ggdb3 -march=armv7-a -marm \
  -pedantic -Wall -Wextra -o main.out main.c
qemu-arm -L /usr/arm-linux-gnueabihf main.out

Disassembly contains:

   0x00010320 <+4>:     08 7b b7 ee     vmov.f64        d7, #120        ; 0x3fc00000  1.5
   0x00010324 <+8>:     00 0b b7 ee     vmov.f64        d0, #112        ; 0x3f800000  1.0
   0x00010328 <+12>:    00 7b 37 ee     vadd.f64        d7, d7, d0

Tested in Ubuntu 16.04, GCC 5.4.0, QEMU 2.5.0.
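Putting the modifiers from this page together, a double add that builds for both architectures might be sketched as follows. This was not part of the test above. The function name add_doubles is hypothetical, and the guard on the ACLE macro __ARM_FP (bit 3 set meaning hardware double precision) is an assumption about the toolchain; anything else falls back to plain C.

```c
/* Hypothetical helper: adds two doubles, picking the modifier per target. */
double add_doubles(double a, double b)
{
    double sum;
#if defined(__aarch64__)
    /* AArch64: %d prints the dN name of the FP/SIMD register. */
    __asm__("fadd %d0, %d1, %d2" : "=w"(sum) : "w"(a), "w"(b));
#elif defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 8)
    /* 32-bit ARM with double-precision VFP: %P prints dN instead of sN. */
    __asm__("vadd.f64 %P0, %P1, %P2" : "=w"(sum) : "w"(a), "w"(b));
#else
    /* Assumption: plain C fallback for every other target. */
    sum = a + b;
#endif
    return sum;
}
```

Because the inputs are real operands here, no scratch register such as d0 needs to be listed as a clobber.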

Source code definition point
