Floating-point instructions in ARM assembly

Question

I'm trying to create an ARM benchmark that loops over the following instructions (in assembly), alone and in combination:

Integer additions
Integer multiplications
Floating-point additions
Floating-point multiplications

This is my code for integer operations:

int additions_int(int n) {

    int i, dummyValue = n;

    __asm (
        "MOV R0, #2\n"
        "MOV R1, #6\n"
    );

    for (i = 0; i < n/LOOP_STEP; i++) {

        __asm (
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
            "ADD R0, R0, R1\n"
        );
    }

    return dummyValue;
}


int multiplications_int(int n) {

    int i, dummyValue=n;

    __asm (
        "MOV R0, #2\n"
        "MOV R1, #6\n"
    );

    for (i = 0; i < n/LOOP_STEP; i++) {

        __asm (

            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"
            "MUL R0, R0, R1\n"

        );

    }

    return dummyValue;
}

The problem is with the floating-point operations. I checked this documentation, and I've tried to do something like this:

float multiplications_fp(int n) {
    int i;
    float fn=n, dummyValue = fn;

    for (i = 0; i < n/LOOP_STEP; i++) {
        __asm (
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
            "VMUL.F32 R0, R0, R1\n"
        );
    }

    return dummyValue;
}


float additions_fp(int n) {
    int i;
    float fn=n, dummyValue = fn;

    for (i = 0; i < n/LOOP_STEP; i++) {
        __asm (
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n"
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n" 
            "VADD.F32 R0, R0, R1\n"  
        );
    }

    return dummyValue;
}

Compiling with:

arm-linux-gnueabi-gcc -static -march=armv7-a microbenchmark_arm.c -o microbenchmark_arm

I'm getting this error:

Error: selected processor does not support ARM mode `vmul.f32 R0,R0,R1'
Error: selected processor does not support ARM mode `vadd.f32 R0,R0,R1'

Can anyone tell me what I'm doing wrong?

Can anyone show me an example of floating-point additions or multiplications for the ARM Cortex-A architecture?

Answer 1

Floating-point instructions use a separate register bank. For most instructions, you cannot use the general-purpose registers (R0, R1, ...) as floating-point operands. The floating-point bank is the same one that the NEON SIMD instructions use.

If you want single precision, you can use:

VMUL.F32 s0, s0, s1

If you want double precision, you can use:

VMUL.F64 d0, d0, d1
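
To tie this back to the benchmark in the question, here is a minimal sketch of the floating-point loops using S registers. It assumes GCC-style extended inline assembly, where the "t" constraint asks the compiler for a single-precision VFP register (s0-s31). The function names fp_mul_chain and fp_add_chain, the extra float parameter, the starting accumulator values, and LOOP_STEP = 10 are illustrative assumptions, not taken from the original code. A plain C fallback is included so the file also builds on non-ARM hosts or with a soft-float configuration.

```c
/* Assumed value: number of instructions issued per loop iteration. */
#define LOOP_STEP 10

#if defined(__arm__) && !defined(__SOFTFP__)
/* ARM build with a VFP unit: "t" makes GCC pick S registers for %0 and %1. */
#define FP_MUL(acc, b) \
    __asm__ __volatile__("vmul.f32 %0, %0, %1" : "+t"(acc) : "t"(b))
#define FP_ADD(acc, b) \
    __asm__ __volatile__("vadd.f32 %0, %0, %1" : "+t"(acc) : "t"(b))
#else
/* Fallback for other targets: same arithmetic in plain C. */
#define FP_MUL(acc, b) ((acc) *= (b))
#define FP_ADD(acc, b) ((acc) += (b))
#endif

/* Repeat one statement LOOP_STEP (10) times. */
#define REPEAT10(stmt) \
    do { stmt; stmt; stmt; stmt; stmt; stmt; stmt; stmt; stmt; stmt; } while (0)

/* Hypothetical helper: n / LOOP_STEP iterations of 10 dependent multiplies. */
float fp_mul_chain(int n, float factor)
{
    float acc = 1.0f;
    int i;

    for (i = 0; i < n / LOOP_STEP; i++) {
        REPEAT10(FP_MUL(acc, factor));
    }
    return acc; /* returning the result keeps it live */
}

/* Hypothetical helper: n / LOOP_STEP iterations of 10 dependent additions. */
float fp_add_chain(int n, float term)
{
    float acc = 0.0f;
    int i;

    for (i = 0; i < n / LOOP_STEP; i++) {
        REPEAT10(FP_ADD(acc, term));
    }
    return acc;
}
```

For example, fp_mul_chain(10, 2.0f) runs one iteration of ten multiplications and returns 1024.0f. With the toolchain from the question, the assembler most likely also has to be told that a VFP unit exists, for example by adding -mfpu=vfpv3 -mfloat-abi=softfp to the command line; the right -mfpu value depends on the target core, so treat vfpv3 as an assumption. Because every instruction depends on the previous result, this layout measures a dependent chain rather than peak throughput.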

Note that the floating-point engine may need to be enabled first if this is not done by the OS.
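
On bare metal, or in any environment where nothing has enabled the unit yet, the usual ARMv7-A sequence is to grant access to coprocessors CP10 and CP11 in CPACR and then set the EN bit in FPEXC. The sketch below assumes privileged mode, a GCC-style compiler, and a build with a VFP unit selected; the function name enable_vfp and the constant names are hypothetical. Under Linux the kernel already takes care of this, and running the code from user space would fault, so it is only meant for startup code.

```c
#include <stdint.h>

/* CPACR bits [23:20]: full access to CP10 and CP11 (the VFP/NEON unit). */
#define CPACR_CP10_CP11_FULL (0xFu << 20)
/* FPEXC bit 30: global enable for the floating-point unit. */
#define FPEXC_EN (1u << 30)

/* Hypothetical helper; must run in a privileged mode on ARMv7-A. */
static inline void enable_vfp(void)
{
#if defined(__arm__) && !defined(__SOFTFP__)
    uint32_t cpacr;

    /* Read CPACR, allow CP10/CP11 access, write it back. */
    __asm__ __volatile__("mrc p15, 0, %0, c1, c0, 2" : "=r"(cpacr));
    cpacr |= CPACR_CP10_CP11_FULL;
    __asm__ __volatile__("mcr p15, 0, %0, c1, c0, 2\n\t"
                         "isb"
                         : : "r"(cpacr) : "memory");

    /* Switch the unit on by setting FPEXC.EN. */
    __asm__ __volatile__("vmsr fpexc, %0" : : "r"(FPEXC_EN) : "memory");
#endif
    /* On other targets this function is intentionally a no-op. */
}
```

Calling enable_vfp() once during early boot, before the first VMUL or VADD executes, is the intended use.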

Solution 1
4 2016-07-15 19:29:09
