
ARM assembly: can't find a register in class 'GENERAL_REGS' while reloading 'asm'

I am trying to implement a function that multiplies a 32-bit operand by a 256-bit operand in ARM assembly on an ARM Cortex-A8. The problem is that I am running out of registers, and I have no idea how to reduce the number of registers used here. Here is my function:

typedef struct UN_256fe{
uint32_t uint32[8];
}UN_256fe;

typedef struct UN_288bite{
uint32_t uint32[9];
}UN_288bite;
void multiply32x256(uint32_t A, UN_256fe* B, UN_288bite* res){

asm (

        "umull          r3, r4, %9, %10;\n\t"
        "mov            %0, r3;         \n\t"/*res->uint32[0] = r3*/
        "umull          r3, r5, %9, %11;\n\t"
        "adds           r6, r3, r4;     \n\t"/*res->uint32[1] = r3 + r4*/
        "mov            %1, r6;         \n\t"
        "umull          r3, r4, %9, %12;\n\t"
        "adcs           r6, r5, r3;     \n\t"
        "mov            %2, r6;         \n\t"/*res->uint32[2] = r6*/
        "umull          r3, r5, %9, %13;\n\t"
        "adcs           r6, r3, r4;     \n\t"
        "mov            %3, r6;         \n\t"/*res->uint32[3] = r6*/
        "umull          r3, r4, %9, %14;\n\t"
        "adcs           r6, r3, r5;     \n\t"
        "mov            %4, r6;         \n\t"/*res->uint32[4] = r6*/
        "umull          r3, r5, %9, %15;\n\t"
        "adcs           r6, r3, r4;     \n\t"
        "mov            %5, r6;         \n\t"/*res->uint32[5] = r6*/
        "umull          r3, r4, %9, %16;\n\t"
        "adcs           r6, r3, r5;     \n\t"
        "mov            %6, r6;         \n\t"/*res->uint32[6] = r6*/
        "umull          r3, r5, %9, %17;\n\t"
        "adcs           r6, r3, r4;     \n\t"
        "mov            %7, r6;         \n\t"/*res->uint32[7] = r6*/
        "adc            r6, r5, #0 ;    \n\t"
        "mov            %8, r6;         \n\t"/*res->uint32[8] = r6*/

        : "=r"(res->uint32[8]), "=r"(res->uint32[7]), "=r"(res->uint32[6]), "=r"(res->uint32[5]), "=r"(res->uint32[4]),
           "=r"(res->uint32[3]), "=r"(res->uint32[2]), "=r"(res->uint32[1]), "=r"(res->uint32[0])
         : "r"(A), "r"(B->uint32[7]), "r"(B->uint32[6]), "r"(B->uint32[5]),
           "r"(B->uint32[4]), "r"(B->uint32[3]), "r"(B->uint32[2]), "r"(B->uint32[1]), "r"(B->uint32[0]), "r"(temp)
         : "r3", "r4", "r5", "r6", "cc", "memory");

}

EDIT-1: I updated my clobber list based on the first comment, but I still get the same error.

A simple solution is to break this up and not use clobbers. Declare the variables as 'tmp1', etc. Try not to use any mov statements; let the compiler do this if it has to. The compiler will use an algorithm to figure out the best 'flow' of information. If you list registers as clobbers, it cannot reuse them. The way it is now, you make it load all the memory before the assembler executes. This is bad, as you want the memory accesses and the CPU ALU to pipeline.

void multiply32x256(uint32_t A, UN_256fe* B, UN_288bite* res) 
{

  uint32_t mulhi1, mullo1;
  uint32_t mulhi2, mullo2;
  uint32_t tmp;

  asm("umull          %0, %1, %2, %3;\n\t"
       : "=r" (mullo1), "=r" (mulhi1)
       : "r"(A), "r"(B->uint32[7])
  );
  res->uint32[8] = mullo1; /* was 'mov %0, r3; */
  asm volatile("umull          %0, %1, %3, %4;\n\t"
      "adds           %2, %5, %6;     \n\t"/*res->uint32[1] = r3 + r4*/
     /* early clobber (&): %0 and %1 are written before %5 and %6 are read */
     : "=&r" (mullo2), "=&r" (mulhi2), "=r" (tmp)
     : "r"(A), "r"(B->uint32[6]), "r" (mullo1), "r"(mulhi1)
     : "cc"
    );
  res->uint32[7] = tmp; /* was 'mov %1, r6; */
  /* ... etc */
}

The whole purpose of GCC inline assembler is not to code assembler directly in a C file. It is to use the compiler's register allocation logic and do something that cannot easily be done in C: in your case, the use of carry logic.

By not making it one huge 'asm' clause, the compiler can schedule the loads from memory as it needs new registers. It will also pipeline your 'UMULL' ALU activity with the load/store unit.
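As a rough illustration of the split-up approach, here is a minimal sketch that wraps only the multiply in a one-instruction asm statement and lets the compiler allocate every register. The single large asm in the question asks for ten input registers at once on top of four clobbered ones, out of 16 core registers that include sp and pc, which is why allocation fails. One caveat: GCC does not promise that the condition flags survive between separate asm statements, so this sketch keeps the carry in an ordinary C variable instead of chaining adds/adcs across statements. The helper name umull32, the const qualifier on B, the assert, and the portable fallback branch are assumptions added for illustration and are not from the original post. The word order (B->uint32[7] and res->uint32[8] least significant) follows the answer's snippet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct UN_256fe   { uint32_t uint32[8]; } UN_256fe;
typedef struct UN_288bite { uint32_t uint32[9]; } UN_288bite;

/* Hypothetical helper (not in the original post): 32x32 -> 64-bit multiply. */
static inline void umull32(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi)
{
#if defined(__GNUC__) && defined(__arm__)
    uint32_t l, h;
    /* No hard-coded registers and no clobber list: the compiler picks them.
       Assumes ARM or Thumb-2 mode, where umull is available. */
    __asm__("umull %0, %1, %2, %3" : "=&r"(l), "=&r"(h) : "r"(a), "r"(b));
    *lo = l;
    *hi = h;
#else
    /* Portable fallback so the sketch also builds on non-ARM hosts. */
    uint64_t p = (uint64_t)a * b;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
#endif
}

void multiply32x256(uint32_t A, const UN_256fe *B, UN_288bite *res)
{
    uint32_t carry = 0;

    assert(B != NULL && res != NULL);

    /* Assumed word order, as in the answer's snippet:
       B->uint32[7] and res->uint32[8] are the least significant words. */
    for (int i = 7; i >= 0; i--) {
        uint32_t lo, hi;
        umull32(A, B->uint32[i], &lo, &hi);

        /* Add the carry from the previous word in 64 bits. */
        uint64_t sum = (uint64_t)lo + carry;
        res->uint32[i + 1] = (uint32_t)sum;

        /* hi is at most 0xFFFFFFFE, so adding the carry bit cannot overflow. */
        carry = hi + (uint32_t)(sum >> 32);
    }
    res->uint32[0] = carry;   /* most significant word */
}
```

Because each asm statement needs at most four registers, the compiler is free to interleave the loads of B with the multiplies. Whether this matches a hand-scheduled adcs chain in speed would need measuring on the target.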

You should only use a clobber if an instruction implicitly clobbers a specific register. You may also use something like,

register int *p1 asm ("r0");

and use that as an output. However, I don't know of any ARM instructions like this, besides those that might alter the stack, which your code doesn't use, and the carry, of course.

GCC knows that memory changes if it is listed as an input/output, so you don't need a memory clobber. In fact it is detrimental, as the memory clobber is a compiler memory barrier, and this will cause memory to be written when the compiler might be able to schedule that for later.


The moral is to use GCC inline assembler to work with the compiler. If you code in assembler and you have huge routines, the register use can become complex and confusing. Typical assembler coders will keep only one thing in a register per routine, but that is not always the best use of registers. The compiler will shuffle the data around in a fairly smart way that is difficult to beat (and not very satisfying to hand code, IMO) when the code size gets larger.

You might want to look at the GMP library, which has lots of ways to efficiently tackle some of the same issues your code appears to have.
