How can I elegantly take advantage of ARM instructions like REV and RBIT when writing C code?

I am writing C code which may be compiled for the Arm Cortex-M3 microcontroller.

This microcontroller supports several useful instructions for efficiently manipulating bits in registers, including REV*, RBIT, SXT*.

(Image: useful Cortex-M3 instructions)

When writing C code, how can I take advantage of these instructions if I need those specific functions? For example, how can I complete this code?

#define REVERSE_BIT_ORDER(x)    {  /* what to write here? */ }

I would like to do this without using inline assembler, so that this code is both portable and readable.

Added:

In part, I am asking how to express such a function in C elegantly. For example, it's easy to express bit shifting in C, because it's built into the language. Likewise, setting or clearing bits. But bit reversal has no built-in operator in C, and so is very hard to express. For example, this is how I would reverse bits:

unsigned int ReverseBits(unsigned int x)
{
    unsigned int ret = 0;
    for (int i=0; i<32; i++)
    {
        ret <<= 1;
        /* 1u, not 1: shifting a signed 1 into bit 31 is undefined behaviour */
        if (x & (1u<<i))
            ret |= 1;
    }
    return ret;
}

Would the compiler recognise this as bit reversal, and issue the correct instruction?

Reversing the bits in a 32-bit integer is an exotic operation, which might be why the compiler doesn't recognise it. However, I was able to generate code that uses REV (reverse byte order), which is a far more common use case:

#include <stdint.h>

uint32_t endianize (uint32_t input)
{
  return ((input >> 24) & 0x000000FF) |
         ((input >>  8) & 0x0000FF00) |
         ((input <<  8) & 0x00FF0000) |
         ((input << 24) & 0xFF000000) ;
}

With gcc -O3 -mcpu=cortex-m3 -ffreestanding (for ARM32, version 11.2.1 "none"):

endianize:
        rev     r0, r0
        bx      lr

https://godbolt.org/z/odGqzjTGz

It works for clang armv7-a 15.0.0 too, as long as you use -mcpu=cortex-m3.

So this supports the idea of avoiding manual optimizations and letting the compiler worry about such things.

It would be best if you used the CMSIS intrinsics.

__REV, __REV16 etc. Those CMSIS header files contain much, much more.

You can get them from here:

https://github.com/ARM-software/CMSIS_5

and you are looking for the cmsis_gcc.h file (or similar if you use another compiler).
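As a rough sketch of how that might look in practice: the wrapper below uses __REV and __RBIT when a CMSIS-Core header can be found, and falls back to plain C otherwise. The names cmsis_swap_bytes32, cmsis_reverse_bits32 and HAVE_CMSIS_INTRINSICS are made up for this illustration, and it assumes the CMSIS include directory is on the include path as cmsis_compiler.h; in a real project the device header would normally pull the intrinsics in for you.

```c
#include <stdint.h>

/* Assumption: CMSIS-Core headers are reachable as "cmsis_compiler.h".
 * HAVE_CMSIS_INTRINSICS is a hypothetical macro used only in this sketch. */
#if defined(__arm__) && defined(__has_include)
#  if __has_include("cmsis_compiler.h")
#    include "cmsis_compiler.h"
#    define HAVE_CMSIS_INTRINSICS 1
#  endif
#endif

/* Reverse the byte order of a 32-bit word. */
static inline uint32_t cmsis_swap_bytes32(uint32_t x)
{
#ifdef HAVE_CMSIS_INTRINSICS
    return __REV(x);                      /* CMSIS intrinsic for REV */
#else
    return ((x >> 24) & 0x000000FFu) |
           ((x >>  8) & 0x0000FF00u) |
           ((x <<  8) & 0x00FF0000u) |
           ((x << 24) & 0xFF000000u);
#endif
}

/* Reverse the bit order of a 32-bit word. */
static inline uint32_t cmsis_reverse_bits32(uint32_t x)
{
#ifdef HAVE_CMSIS_INTRINSICS
    return __RBIT(x);                     /* CMSIS intrinsic for RBIT */
#else
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {        /* move the LSB of x into r, 32 times */
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
#endif
}
```

Calling code only ever sees the two wrapper functions, so the rest of the program stays portable to non-ARM targets.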

@Lundin's answer shows a pure-C shift/mask bithack that clang recognizes and compiles to a single rev instruction. (Or presumably to x86 bswap if targeting x86, or equivalent instructions on other ISAs that have them.)

In portable ISO C, hoping for pattern-recognition is unfortunately the best you can do, because they haven't added portable ways to expose CPU functionality; even C++ took until C++20 to add the <bit> header for things like std::popcount, and until C++23 for std::byteswap.

Some fairly-portable C libraries / headers have byte-reversal, e.g. as part of networking there's ntohl (net-to-host), which is an endian-swap on little-endian machines. Or there's GCC's (or glibc's?) endian.h, with htobe32 being host to big-endian 32-bit (see its man page). These are usually implemented with intrinsics that compile to a single instruction in good-quality implementations.

Of course, if you definitely want a byte swap regardless of host endianness, you could do htole32(be32toh(x)), because one of them is a no-op and the other is a byte-swap, since ARM is either big or little endian. (It's still a byte-swap even if neither of them is a no-op, even on PDP or other mixed-endian machines, but there might be more efficient ways to do it.)
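A minimal sketch of that htole32(be32toh(x)) trick, assuming a glibc-style <endian.h> (it is not ISO C; BSDs ship it as <sys/endian.h>). The function name swap_bytes32_endian is hypothetical.

```c
#define _DEFAULT_SOURCE   /* glibc: expose htole32/be32toh under strict -std= modes */
#include <endian.h>
#include <stdint.h>

/* Unconditional 32-bit byte swap: on a little-endian host be32toh swaps and
 * htole32 does nothing; on a big-endian host it is the other way round. */
static inline uint32_t swap_bytes32_endian(uint32_t x)
{
    return htole32(be32toh(x));
}
```

Whether this compiles down to a single rev depends on how the C library implements these macros, so it is worth checking the generated assembly.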

There are also some "collections of useful functions" headers with intrinsics for different compilers, with functions like byte swap. These can be of varying quality in terms of efficiency and maybe even correctness.


You can see that no, neither GCC nor clang optimizes your code to rbit for ARM or AArch64: https://godbolt.org/z/Y7noP61dE . Presumably looping over bits in the other direction isn't any better. Perhaps a bithack would be recognized, such as those in "In C/C++ what's the simplest way to reverse the order of bits in a byte?" or "Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C".

GCC and clang recognize the standard bithack for popcount, but I didn't check any of the answers on the bit-reverse questions.
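For reference, the kind of bithack those questions discuss looks roughly like the sketch below: swap adjacent bits, then pairs, nibbles, bytes and half-words. It is plain ISO C; no claim is made here that any compiler turns it into rbit. The function name reverse_bits32_bithack is made up for this example.

```c
#include <stdint.h>

/* Reverse all 32 bits by swapping progressively larger groups. */
uint32_t reverse_bits32_bithack(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* adjacent bits */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* bit pairs     */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);  /* nibbles       */
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);  /* bytes         */
    x = (x >> 16) | (x << 16);                                /* half-words    */
    return x;
}
```

Unlike the 32-iteration loop, this is branch-free and takes a fixed, small number of operations even when no rbit instruction is emitted.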


Some languages, notably Rust, do care more about making it possible to portably express what modern CPUs can do. foo.reverse_bits() (since Rust 1.37) and foo.swap_bytes() just work for any type on any ISA. For u32 specifically, see https://doc.rust-lang.org/std/primitive.u32.html#method.reverse_bits (u32 is Rust's equivalent of C uint32_t).


Most mainstream C implementations have portable (across ISAs) builtins or (target-specific) intrinsics (like __REV() or __REV16()) for stuff like this.

The GNU dialect of C (GCC/clang/ICC and some others) includes __builtin_bswap32(input). See "Does ARM GCC have a builtin function for the assembly 'REV' instruction?". It's named after the x86 bswap instruction, but it's just a byte-reverse that GCC / clang compile to whatever instructions can do it efficiently on the target ISA.

There's also a __builtin_bswap16(uint16_t) for swapping the bytes of a 16-bit integer, like revsh except the C semantics don't include preserving the upper 16 bits of a 32-bit integer. (Because normally you don't care about that part.) See the GCC manual for the available GNU C builtins that aren't target-specific.
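A short sketch of wrapping those two builtins, assuming a GNU C compatible compiler; the wrapper names bswap32 and bswap16 are only illustrative.

```c
#include <stdint.h>

#if !defined(__GNUC__)
#error "This sketch assumes a GNU C compatible compiler (GCC, clang, ...)."
#endif

/* Byte-reverse a 32-bit value, e.g. 0x11223344 -> 0x44332211. */
static inline uint32_t bswap32(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Byte-reverse a 16-bit value, e.g. 0x1234 -> 0x3412. */
static inline uint16_t bswap16(uint16_t x)
{
    return __builtin_bswap16(x);
}
```

Because these are builtins rather than inline asm, the compiler can still constant-fold them and combine them with surrounding code.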

There isn't a GNU C builtin or intrinsic for bitwise reverse that I could find in the manual or GCC arm-none-eabi 12.2 headers.


ARM documents an __rbit() intrinsic for their own compiler, but I think that's Keil's ARMCC, so there might not be any equivalent of that for GCC/clang.

@0___________ suggests https://github.com/ARM-software/CMSIS_5 for headers that define a function for that.


If worst comes to worst, GNU C inline asm is possible for GCC/clang, given appropriate #ifdefs. You might also want if (__builtin_constant_p(x)) to use a pure-C bit-reversal so constant-propagation can happen on compile-time constants, only using inline asm on runtime-variable values.

   uint32_t output, input=...;
#if defined(__aarch64__)
   // AArch64: %w selects the 32-bit W register; plain %0 would name an X register
   asm("rbit %w0,%w1" : "=r"(output) : "r"(input));
#elif defined(__arm__)
   asm("rbit %0,%1" : "=r"(output) : "r"(input));
#else
 ...  // pure C fallback or something
#endif

Note that it doesn't need to be volatile because rbit is a pure function of the input operand. It's a good thing if GCC/clang are able to hoist this out of a loop. And it's a single asm instruction so we don't need an early-clobber.
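Putting the pieces together, a complete helper might look something like the sketch below. The names rbit32 and rbit32_fallback are hypothetical, and the __arm__ check alone does not guarantee the core actually has rbit (Cortex-M0, for instance, does not), so a real header would need a tighter architecture test.

```c
#include <stdint.h>

/* Pure-C bit reversal: used for compile-time constants and non-ARM targets. */
static inline uint32_t rbit32_fallback(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

static inline uint32_t rbit32(uint32_t x)
{
#if defined(__GNUC__) && (defined(__arm__) || defined(__aarch64__))
    /* Let constants fold at compile time instead of hiding them in asm. */
    if (__builtin_constant_p(x))
        return rbit32_fallback(x);

    uint32_t out;
#  if defined(__aarch64__)
    __asm__("rbit %w0, %w1" : "=r"(out) : "r"(x));  /* 32-bit W registers */
#  else
    __asm__("rbit %0, %1" : "=r"(out) : "r"(x));
#  endif
    return out;
#else
    return rbit32_fallback(x);
#endif
}
```

The asm statement is deliberately not volatile, so the optimizer remains free to hoist it out of loops or remove it when the result is unused.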

This has the downside that the compiler can't fold a shift into it, e.g. if you wanted to bit-reverse a single byte, __rbit(x) >> 24 equals __rbit(x<<24), which could be done with rbit r0, r1, lsl #24 (I think).
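The C-level identity can be checked with a small sketch like the following, which uses a plain-C stand-in for the intrinsic (rbit32_ref, a made-up name). It only verifies the arithmetic, not whether the instruction accepts a shifted operand.

```c
#include <stdint.h>

/* Plain-C stand-in for __rbit(), for checking the identity on any host. */
static uint32_t rbit32_ref(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Bit-reverse one byte via a 32-bit reversal followed by a shift. */
static uint8_t reverse_byte(uint8_t b)
{
    return (uint8_t)(rbit32_ref(b) >> 24);
}

/* Returns 1 if rbit(x) >> 24 == rbit(x << 24) for every byte value x. */
static int shift_identity_holds(void)
{
    for (uint32_t x = 0; x < 256; x++) {
        if ((rbit32_ref(x) >> 24) != rbit32_ref(x << 24))
            return 0;
    }
    return 1;
}
```

Both forms land the reversed low byte in bits 7..0; shifting first simply discards the upper 24 bits before the reversal rather than after it.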

With inline asm I don't think there's a way to tell the compiler that r1, lsl #24 is a valid expansion for the %1 input operand. Hmm, unless there's a machine-specific constraint for that? https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html - no, no mention of "shifted" or "flexible" source operand in the ARM section.

"Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C" shows an #ifdef'ed version with a working fallback that uses a bithack to reverse bits within a byte, then __builtin_bswap32 or MSVC _byteswap_ulong to reverse bytes.
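The general shape of that approach, as an illustrative sketch rather than the code from the linked answer, is shown below. The names byteswap32_portable and reverse_bits32_via_bswap are invented here.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#  include <stdlib.h>   /* _byteswap_ulong */
#endif

/* Byte swap using whichever compiler builtin is available. */
static inline uint32_t byteswap32_portable(uint32_t x)
{
#if defined(__GNUC__)
    return __builtin_bswap32(x);
#elif defined(_MSC_VER)
    return _byteswap_ulong(x);
#else
    return ((x >> 24) & 0x000000FFu) | ((x >>  8) & 0x0000FF00u) |
           ((x <<  8) & 0x00FF0000u) | ((x << 24) & 0xFF000000u);
#endif
}

/* Reverse the bits inside each byte with a bithack, then reverse the bytes. */
uint32_t reverse_bits32_via_bswap(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    return byteswap32_portable(x);
}
```

On targets with a byte-swap instruction this replaces the last two shift/mask steps of the full bithack with a single instruction.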

Interestingly, ARM gcc seems to have improved its detection of byte order reversing recently. With version 11, it detects byte reversal whether it is done by bit shifting or by byte swapping through a pointer. In version 10 and earlier, however, the pointer method fails to produce the REV instruction.

    uint32_t endianize1 (uint32_t input)
    {
      return ((input >> 24) & 0x000000FF) |
             ((input >>  8) & 0x0000FF00) |
             ((input <<  8) & 0x00FF0000) |
             ((input << 24) & 0xFF000000) ;
    }

    uint32_t endianize2 (uint32_t input)
    {
        uint32_t output;
        uint8_t *in8  = (uint8_t*)&input;
        uint8_t *out8 = (uint8_t*)&output;

        out8[0] = in8[3];
        out8[1] = in8[2];
        out8[2] = in8[1];
        out8[3] = in8[0];
        return output;
    }

endianize1:
    rev     r0, r0
    bx      lr
endianize2:
    mov     r3, r0
    movs    r0, #0
    lsrs    r2, r3, #24
    bfi     r0, r2, #0, #8
    ubfx    r2, r3, #16, #8
    bfi     r0, r2, #8, #8
    ubfx    r2, r3, #8, #8
    bfi     r0, r2, #16, #8
    bfi     r0, r3, #24, #8
    bx      lr

https://godbolt.org/z/E3xGvG9qq

So, as we wait for optimisers to improve, there are certainly ways you can help the compiler understand your intent and take good advantage of the instruction set (without resorting to micro optimisations or inline assembler). But it's likely that this will involve a good understanding of the architecture by the programmer, and examination of the output assembler.

Take advantage of http://godbolt.org to help examine the compiler output, and see what produces the best output.
