简体   繁体   中英

Have GAS generate instruction from inline assembly?

I'm trying to assemble a file that uses ARM's CRC instruction. The assembler is producing an error Error: selected processor does not support 'crc32b w1,w0,w0' .

There are runtime checks in place, so we are safe with the instruction. The technique works fine on i686 and x86_64. For example, I can assemble a file that uses Intel CRC intrinsics or SHA Intrinsics without -mcrc or -msha (and on a machine without the features).

Here is the test case:

$ cat test.cxx
#include <arm_neon.h>

#define GCC_INLINE_ATTRIB __attribute__((__gnu_inline__, __always_inline__, __artificial__))

#if defined(__GNUC__) && !defined(__ARM_FEATURE_CRC32)
__inline unsigned int GCC_INLINE_ATTRIB
CRC32B(unsigned int crc, unsigned char v)
{
  unsigned int r;
  asm ("crc32b %w2, %w1, %w0" : "=r"(r) : "r"(crc), "r"((unsigned int)v));
  return r;
}
#else
  // Use the intrinsic
# define CRC32B(a,b) __crc32b(a,b)
#endif

int main(int argc, char* argv[])
{
  return CRC32B(argc, argc);
}

And here is the result:

$ g++ test.cxx -c
/tmp/ccqHBPUf.s: Assembler messages:
/tmp/ccqHBPUf.s:23: Error: selected processor does not support `crc32b w1,w0,w0'

Placing the ASM code in a source file and compiling with different options is not feasible because CRC32B will be used in C++ header files, too.

How do I get GAS to assemble the instruction?


GCC's configuration and options are the reason we are trying to do things this way. User's don't read manuals, so they won't add -march=armv8-a+crc+crypto -mtune=cortex-a53 to CFLAGS and CXXFLAGS .

In addition, distros compile to a "least capable" machine, so we want the hardware acceleration routines available. When the library is provided by a distro like Linaro, both code paths (software CRC and hardware accelerated CRC) will be available.


The machine is a LeMaker HiKey, which is ARMv8/Aarch64. It has an A53 processor with CRC and Crypto (CRC and Crypto is optional under the architecture):

$ cat /proc/cpuinfo
Processor       : AArch64 Processor rev 3 (aarch64)
processor       : 0
...
processor       : 7
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: AArch64

GCC lacks most of the usual defines one expects to be present by default:

$ g++ -dM -E - </dev/null | sort | egrep -i '(arm|neon|aarch|asimd)'
#define __aarch64__ 1
#define __AARCH64_CMODEL_SMALL__ 1
#define __AARCH64EL__ 1

Using GCC's -march=native does not work on ARM:

$ g++ -march=native -dM -E - </dev/null | sort | egrep -i '(arm|neon|aarch|asimd)'
cc1: error: unknown value ‘native’ for -march

And Clang:

$ clang++ -dM -E - </dev/null | sort | egrep -i '(arm|neon|aarch|asimd)'
#define __AARCH64EL__ 1
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xe
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_FP_FENV_ROUNDING 1
#define __ARM_NEON 1
#define __ARM_NEON_FP 0xe
#define __ARM_PCS_AAPCS64 1
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_SIZEOF_WCHAR_T 4
#define __aarch64__ 1

GCC version:

$ gcc -v
...
gcc version 4.9.2 (Debian/Linaro 4.9.2-10)

GAS version:

$ as -v
GNU assembler version 2.24 (aarch64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.24

This answer came from Jiong Wang on the Binutils mailing list . It bypasses GAS's architectural requirements and plays well with GCC:

__inline unsigned int GCC_INLINE_ATTRIB
CRC32W(unsigned int crc, unsigned int val)
{
#if 1
    volatile unsigned int res;
    asm ("\n"
         "\t" ".set reg_x0, 0\n"
         "\t" ".set reg_x1, 1\n"
         "\t" ".set reg_x2, 2\n"
         "\t" ".set reg_x3, 3\n"
         "\t" ".set reg_x4, 4\n"
         "\t" ".set reg_x5, 5\n"
         "\t" ".set reg_x6, 6\n"
         "\t" ".set reg_x7, 7\n"
         "\t" "#crc32w %w0, %w1, %w2\n"
         "\t" ".inst 0x1ac04800 | (reg_%2 << 16) | (reg_%1 << 5) | (reg_%0)\n"
         : "=r"(res) : "r"(crc), "r"(val)
    );
    return res;
#else
    volatile unsigned int res;
    asm (".cpu generic+fp+simd+crc+crypto  \n"
         "crc32w %w0, %w1, %w2             \n"
         : "=r"(res) : "r"(crc), "r"(val));
    return res;
#endif
}

The second one commented out by the preprocessor block was suggested by Nick Clifton on the Binutils mailing list . The idea is GCC generates code using the ISA based on -march=XXX , so it does not matter if we increase capabilities to get past the assembler. We decided to go with Wang's answer because we did not want potential side effects from modifying the .cpu .

And the verification with GCC 4.8 and Binutils 2.24:

$ g++ -O1 test.cxx -c

$ objdump --disassemble test.o

test.o:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <main>:
   0:   12001c01        and     w1, w0, #0xff
   4:   1ac14800        crc32w  w0, w0, w1
   8:   d65f03c0        ret

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM