
RISC-V: build 32-bit constants with LUI and ADDI

LUI (load upper immediate) is used to build 32-bit constants and uses the U-type format. LUI places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros.

I found this in the manual, but if I want to move 0xffffffff into a register, I'd expect all the code I need to be:

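LUI  x2, 0xfffff      # x2 = 0xfffff000 (the 20-bit field goes into bits 31..12)
ADDI x2, x2, 0xfff    # intended: fill the low 12 bits with 0xfff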

But there is a problem: ADDI sign-extends its immediate to a signed number, so 0xfff is extended to 0xffffffff (that is, -1).

This makes x2 equal to 0xffffefff rather than 0xffffffff.
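To make the arithmetic concrete, here is a small Python model of the two instructions acting on a 32-bit register. It is a sketch rather than an emulator: the helper names `sext12`, `lui` and `addi` are hypothetical, and registers are modeled as Python ints masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def sext12(field):
    """Sign-extend a 12-bit immediate field to an int in [-2048, 2047]."""
    field &= 0xFFF
    return field - 0x1000 if field & 0x800 else field

def lui(imm20):
    """LUI: place a 20-bit immediate in bits 31..12 and zero the low 12 bits."""
    return (imm20 & 0xFFFFF) << 12

def addi(rs1, imm12):
    """ADDI: add the sign-extended 12-bit immediate, wrapping modulo 2**32."""
    return (rs1 + sext12(imm12)) & MASK32

x2 = lui(0xFFFFF)       # 0xfffff000
x2 = addi(x2, 0xFFF)    # 0xfff is sign-extended to -1 before the add
print(hex(x2))          # 0xffffefff, not 0xffffffff
```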

What is a good way to move a 32-bit immediate into a register?

The RISC-V assembler supports the pseudo-instruction li x2, 0xFFFFFFFF.

Let N be a signed, two's-complement 32-bit integer.

The common-case implementation of li x2,N is:

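    # Sign-extend the low 12 bits (32-bit arithmetic shifts)
    M=(N << 20) >> 20

    # Upper 20 bits of N-M (computed modulo 2^32; the low 12 bits of N-M are 0)
    # LUI takes this 20-bit field, not the shifted-up value
    K=((N-M) >> 12) & 0xFFFFF

    # Load upper 20 bits: x2 = K << 12
    LUI x2,K

    # Add the sign-extended lower bits
    ADDI x2,x2,M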
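The pseudocode above can be checked with a short Python sketch. It assumes 32-bit wraparound arithmetic (emulated by masking) and returns K as the 20-bit field an assembler expects for LUI; `split_li` and `rebuild` are hypothetical names, not an assembler API. For 0x12345678 it yields K = 0x12345 (74565) and M = 1656, the same pair clang emits further down.

```python
MASK32 = 0xFFFFFFFF

def split_li(n):
    """Split a 32-bit constant into (K, M) for LUI rd, K / ADDI rd, rd, M."""
    n &= MASK32
    low = n & 0xFFF
    m = low - 0x1000 if low & 0x800 else low    # M = (N << 20) >> 20
    k = ((n - m) & MASK32) >> 12                # K = upper 20 bits of N - M
    return k, m

def rebuild(k, m):
    """Value left in the register by LUI followed by ADDI (mod 2**32)."""
    return ((k << 12) + m) & MASK32

# Includes edge cases where bit 11 is set and the LUI field wraps around.
for n in (0x12345678, 0xFFFFFFFF, 0x7FFFF800, 0x00000800, 0):
    k, m = split_li(n)
    assert rebuild(k, m) == n
    print(hex(n), "-> LUI", hex(k), "ADDI", m)
```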

Of course, to load a short immediate (one in [-2048, 2047]), li can use just

   addi x2,x0,imm

So li x2, 0xFFFFFFFF becomes addi x2,x0,-1.

TL;DR: The 32-bit constant you want to load into x2 is 0xffffffff, which corresponds to -1. Since -1 is in the range [-2048, 2047], this constant can be loaded with a single instruction: addi x2, zero, -1. You can also use the li pseudoinstruction, li x2, -1, which the assembler in turn translates to addi x2, zero, -1.

Loading a 32-bit constant with a lui + addi sequence

In general, we need a lui + addi sequence (two instructions) to load a 32-bit constant into a register. The lui instruction encodes a 20-bit immediate, whereas the addi instruction encodes a 12-bit immediate. lui and addi can be used to load the upper 20 bits and the lower 12 bits of a 32-bit constant, respectively.

Let $N$ be a 32-bit constant we want to load into a register: $N \equiv n_{31} \ldots n_0$. Then we can split this constant into its upper 20 bits and lower 12 bits, $N_U$ and $N_L$ respectively: $N_U \equiv n_{31} \ldots n_{12}$; $N_L \equiv n_{11} \ldots n_0$.

In principle, we encode $N_U$ in lui's immediate and $N_L$ in addi's immediate. However, there is a complication if the most significant bit of the 12-bit immediate in addi is 1, because the immediate encoded in the addi instruction is sign-extended to 32 bits. In that case, the addi instruction adds $N_L - 4096$ rather than $N_L$ to the destination register: sign extension fills the upper 20 bits with 1s, and 0xfffff000 (upper 20 bits all 1s, lower 12 bits all 0s) is $-4096$, or $-2^{12}$, in two's complement.

To compensate for the unwanted -4096 term, we can add 1 to lui's immediate. The LSB of lui's immediate corresponds to bit 12 of the register, so adding 1 to this immediate adds 4096 to the destination register, which cancels out the -4096 term.
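The compensation can be written out as a short derivation. Here $N_U$ is the upper 20 bits of $N$, $N_L$ the lower 12 bits read as an unsigned number, $n_{11}$ bit 11 of $N$ (the sign bit of the 12-bit field), and the lui field is taken modulo $2^{20}$ because it only has 20 bits:

```latex
\begin{aligned}
\text{lui imm}  &= (N_U + n_{11}) \bmod 2^{20} \\
\text{addi imm} &= N_L - 4096\, n_{11} \qquad \text{(the value after sign extension)} \\
\text{result}   &\equiv 4096\,(N_U + n_{11}) + (N_L - 4096\, n_{11}) \\
                &= 4096\, N_U + N_L = N \pmod{2^{32}}
\end{aligned}
```

The reduction modulo $2^{20}$ in the first line is harmless because $4096 \cdot 2^{20} = 2^{32}$, so a wrapped lui field still gives the right register value modulo $2^{32}$. For the question's 0xffffffff, $N_U + 1 = 2^{20}$ wraps to 0, which is why the sequence degenerates to a single addi of -1.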

Loading a 32-bit constant with a single addi instruction

The issue explained above is due to the sign extension that the immediate in addi undergoes. The decision to sign-extend addi's immediate was probably made to allow loading small integers (integers between -2048 and 2047, both inclusive) with a single addi instruction. For example, if the immediate in addi were zero-extended instead of sign-extended, it wouldn't be possible to load such a frequent constant as -1 into a register with just a single instruction.
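A quick way to see the trade-off: the same 12-bit field 0xfff decodes to very different 32-bit values depending on how it is extended. The Python below is an illustrative sketch; `sign_extend12`, `zero_extend12` and `fits_addi` are hypothetical helpers, and `zero_extend12` models a design ADDI does not actually use.

```python
MASK32 = 0xFFFFFFFF

def sign_extend12(field):
    """Interpret a 12-bit field as two's complement, as ADDI does."""
    field &= 0xFFF
    return field - 0x1000 if field & 0x800 else field

def zero_extend12(field):
    """Interpret a 12-bit field as unsigned (what ADDI does *not* do)."""
    return field & 0xFFF

def fits_addi(value):
    """True if a constant can be loaded with a single ADDI rd, zero, imm."""
    return -2048 <= value <= 2047

# With sign extension, field 0xfff gives -1, i.e. 0xffffffff in the register.
print(hex(sign_extend12(0xFFF) & MASK32))   # 0xffffffff
# With zero extension the same field could only ever give 0x00000fff.
print(hex(zero_extend12(0xFFF) & MASK32))   # 0xfff
print(fits_addi(-1), fits_addi(2048))       # True False
```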


Loading a 32-bit constant with the li pseudoinstruction

In any case, you can always use the li pseudoinstruction to load a 32-bit constant without having to care what the value of the constant is. This pseudoinstruction can load any 32-bit number into a register, and it is therefore simpler to use and less error-prone than manually writing the lui + addi sequence.

If the number fits in addi's immediate field ([-2048, 2047]), the assembler translates the li pseudoinstruction into just an addi instruction; otherwise, li is translated into a lui + addi sequence, and the complication explained above is handled automatically by the assembler.
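The assembler's choice can be sketched as a small function that returns the instruction text for li rd, value. This is a simplified model assuming RV32: `expand_li` is a hypothetical name, and real assemblers such as GAS or LLVM handle more cases (for example compressed c.li/c.lui encodings, or addiw on RV64). It also drops the addi when the low part is zero, which matches the single lui that clang emits for 1<<17 further down.

```python
def expand_li(rd, value):
    """Return a list of RV32 instruction strings that load `value` into rd."""
    value &= 0xFFFFFFFF
    low = value & 0xFFF
    lo = low - 0x1000 if low & 0x800 else low      # sign-extended low 12 bits
    signed = value - (1 << 32) if value & 0x80000000 else value

    if -2048 <= signed <= 2047:                    # fits addi's immediate
        return [f"addi {rd}, zero, {signed}"]

    hi = ((value - lo) & 0xFFFFFFFF) >> 12         # upper 20 bits, compensated
    seq = [f"lui {rd}, {hi}"]
    if lo != 0:                                    # skip a useless addi of 0
        seq.append(f"addi {rd}, {rd}, {lo}")
    return seq

for v in (0x12345678, 123, -1, 0xFFFFFFFF, 1 << 17):
    print(hex(v & 0xFFFFFFFF), "->", expand_li("t0", v))
```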

I was going to say "use ORI instead of ADDI", but then I read the Instruction Set Manual, and it turns out that doesn't work either, because all of the 12-bit immediate operands get sign-extended, even for logical operations.

AFAICT you have to bias the value you put into the upper 20 bits in a way that anticipates the effect of the instruction you use to set the lower 12 bits. So if you want to end up with a value X in the top 20 bits, you're going to use ADDI to set the lower 12 bits, and those lower 12 bits have a 1 in the leftmost position, you must do LUI (X+1) rather than LUI X. Similarly, if you are going to use XORI to set the lower 12 bits, and those lower 12 bits have a 1 in the leftmost position, you must do LUI (~X) (that is, the bitwise inverse of X) rather than LUI X.
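Both biasing rules can be checked mechanically. The sketch below models LUI, ADDI, ORI and XORI on a 32-bit register in Python (the helpers are hypothetical stand-ins, not a real simulator API) and uses a made-up constant whose low 12 bits have bit 11 set. LUI (X+1) + ADDI and LUI (~X) + XORI both reconstruct it, while plain LUI X + ORI smears 1s into the upper bits.

```python
MASK32 = 0xFFFFFFFF

def sext12(field):
    """Sign-extended 12-bit immediate, returned as a 32-bit bit pattern."""
    field &= 0xFFF
    return (field - 0x1000 if field & 0x800 else field) & MASK32

def lui(imm20):
    return (imm20 & 0xFFFFF) << 12          # 20-bit field; X+1 or ~X wraps here

def addi(rs1, imm12):
    return (rs1 + sext12(imm12)) & MASK32

def ori(rs1, imm12):
    return rs1 | sext12(imm12)

def xori(rs1, imm12):
    return rs1 ^ sext12(imm12)

N = 0x12345ABC                # low part 0xABC has bit 11 set
X, L = N >> 12, N & 0xFFF

print(hex(addi(lui(X + 1), L)))   # 0x12345abc  (upper part biased by +1)
print(hex(xori(lui(~X), L)))      # 0x12345abc  (upper part biased by ~X)
print(hex(ori(lui(X), L)))        # 0xfffffabc  (ORI alone sets the upper 20 bits)
```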

But before you do any of that, I'd check whether your assembler already has some sort of "load immediate" pseudo-op or macro that will take care of this for you. If it doesn't, then see if you can write one :-)

It's not unusual for RISC processors to need this kind of extra effort from the programmer (or, more usually, from the compiler). The idea is "keep the hardware simple so it can go fast; it doesn't matter if that makes it harder to construct the software".

In practice, just use the li pseudo-instruction, which gets the assembler to optimize to one instruction if possible (a single lui or a single addi) and, if not, does the math for you.

   li    t0, 0x12345678
   li    t1, 123
   li    t2, -1
   li    t3, 0xffffffff    # same as -1 in 32-bit 2's complement
   li    t4, 1<<17

I separated each "group" in the disassembly below with blank lines. Only the first one (into t0) needed two instructions.

$ clang -c -target riscv32 rv.s         # on my x86-64 Arch GNU/Linux desktop
$ llvm-objdump -d rv.o
...
00000000 <.text>:
       0: 01 00         nop
       2: 01 00         nop

       4: b7 52 34 12   lui     t0, 74565
       8: 93 82 82 67   addi    t0, t0, 1656

       c: 13 03 b0 07   addi    t1, zero, 123

      10: fd 53         addi    t2, zero, -1

      12: 7d 5e         addi    t3, zero, -1

      14: b7 0e 02 00   lui     t4, 32
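The hex bytes in that listing can be decoded by hand to confirm the immediates. The Python sketch below (`decode` is a hypothetical helper) handles only the two 32-bit formats used here: U-type LUI (opcode 0x37) and I-type ADDI (opcode 0x13, funct3 = 0). It reads the bytes as little-endian, as llvm-objdump prints them. The 2-byte entries (01 00, fd 53, 7d 5e) are compressed RVC encodings (c.nop and c.li) and are deliberately not handled. In the standard ABI names, x5 is t0, x6 is t1 and x29 is t4.

```python
def decode(hex_bytes):
    """Decode a little-endian 32-bit LUI or ADDI into (mnemonic, rd, rs1, imm)."""
    word = int.from_bytes(bytes.fromhex(hex_bytes), "little")
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    if opcode == 0x37:                        # LUI (U-type): imm[31:12]
        return ("lui", rd, None, word >> 12)
    funct3 = (word >> 12) & 0x7
    if opcode == 0x13 and funct3 == 0:        # ADDI (I-type): imm[11:0] in bits 31..20
        rs1 = (word >> 15) & 0x1F
        imm = word >> 20
        if imm & 0x800:                       # sign-extend the 12-bit immediate
            imm -= 0x1000
        return ("addi", rd, rs1, imm)
    raise ValueError(f"unsupported encoding {word:#010x}")

for b in ("b7 52 34 12", "93 82 82 67", "13 03 b0 07", "b7 0e 02 00"):
    print(b, decode(b))
```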

If you do want to do it manually, most assemblers for RISC-V (or at least GAS / clang) have %hi and %lo "macros", so you can write lui dst, %hi(value) / addi dst, dst, %lo(value).

   lui   x9, %hi(0x12345678)
   addi  x9, x9, %lo(0x12345678)

   lui   x10, %hi(0xFFFFFFFF)
   addi  x10, x10, %lo(0xFFFFFFFF)

Assemble with clang and disassemble with llvm-objdump again:

      18: b7 54 34 12   lui     s1, 74565
      1c: 93 84 84 67   addi    s1, s1, 1656

      20: 37 05 00 00   lui     a0, 0
      24: 7d 15         addi    a0, a0, -1

Note that lui a0, 0 is a silly waste of an instruction that results from naively using %hi/%lo on 0xffffffff without realizing that the whole value fits in a sign-extended 12-bit immediate.
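A common way to describe what %hi and %lo compute is a rounding form of the same split: %hi adds 0x800 before taking the upper 20 bits, so that the signed %lo lands back on the original value. The Python below is a sketch of that description (`hi` and `lo` are hypothetical names, not the assembler's internals). For 0x12345678 it gives 74565 and 1656, and for 0xffffffff it gives 0 and -1, which is exactly where the redundant lui a0, 0 comes from.

```python
def hi(value):
    """%hi: upper 20 bits, rounded so that adding the signed %lo gives value back."""
    return ((value + 0x800) >> 12) & 0xFFFFF

def lo(value):
    """%lo: low 12 bits interpreted as a signed 12-bit immediate."""
    low = value & 0xFFF
    return low - 0x1000 if low & 0x800 else low

for v in (0x12345678, 0xFFFFFFFF):
    print(hex(v), "lui", hi(v), "addi", lo(v))
```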


There are good use cases for manual %hi/%lo, especially for addresses, where you have one aligned "anchor" point and then want to load from or store to several labels near it:

   lui   t0, %hi(symbol)           # t0 = anchor shared by the accesses below

   lw    t1, %lo(symbol)(t0)
   lw    t2, %lo(symbol2)(t0)      # only valid if %hi(symbol2) == %hi(symbol)
   addi  t3, t0, %lo(symbol3)      # also put an address in a register
   ...
   sw    t1, %lo(symbol)(t0)

So instead of wasting instructions on a separate lui for each symbol, if you know they're in the same 2 KiB-aligned block, you can reference them all relative to one base with the assembler's help. More precisely, they can be anywhere in a 4 KiB window centered on the 4 KiB-aligned "anchor" address that lui produces, since %lo can be negative.
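The "same %hi" condition can be checked with a few lines of Python. This is a sketch of the arithmetic only: the addresses are made up, `hi`, `can_share_anchor` and `window` are hypothetical helpers, and real symbol addresses are only known at link time, so in practice the grouping has to be guaranteed by how the data is laid out.

```python
def hi(addr):
    """%hi as commonly described: upper 20 bits, rounded for a signed %lo."""
    return ((addr + 0x800) >> 12) & 0xFFFFF

def can_share_anchor(addrs):
    """True if all addresses have the same %hi, so one LUI base reaches them all."""
    return len({hi(a) for a in addrs}) == 1

def window(addr):
    """First and last address reachable from lui %hi(addr) plus a signed %lo."""
    base = hi(addr) << 12
    return base - 0x800, base + 0x7FF

# A 2 KiB-aligned block always shares one %hi ...
print(can_share_anchor([0x10000000, 0x100007FC]))   # True
# ... a 4 KiB-aligned block does not, because %lo is signed ...
print(can_share_anchor([0x10000000, 0x10000FFC]))   # False
# ... but a 4 KiB window centered on a 4 KiB-aligned anchor does.
print([hex(a) for a in window(0x10000900)])        # ['0x10000800', '0x100017ff']
```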

(The PC-relative version of this, with auipc, is just as efficient but looks a little different; see the question "What do %pcrel_hi and %pcrel_lo actually do?". In short, %pcrel_lo actually references a %pcrel_hi relocation to find out the actual target symbol as well as the location of the relative reference.)
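A rough Python model of why %pcrel_lo has to point back at the auipc: both halves come from the single offset target minus the auipc's own address, so the low part must be computed relative to the auipc, not the addi that uses it. The addresses are made up, `pcrel_split` and `auipc_addi` are hypothetical helpers, and the sketch assumes the addi sits 4 bytes after the auipc.

```python
MASK32 = 0xFFFFFFFF

def pcrel_split(target, ref_pc):
    """Split target - ref_pc into (hi20, lo12) for an AUIPC + ADDI pair."""
    offset = (target - ref_pc) & MASK32
    hi20 = ((offset + 0x800) >> 12) & 0xFFFFF
    low = offset & 0xFFF
    lo12 = low - 0x1000 if low & 0x800 else low
    return hi20, lo12

def auipc_addi(auipc_pc, hi20, lo12):
    """Register value after AUIPC rd, hi20 at auipc_pc, then ADDI rd, rd, lo12."""
    return (auipc_pc + (hi20 << 12) + lo12) & MASK32

AUIPC_PC, TARGET = 0x00010000, 0x00012FF8       # made-up addresses

good = pcrel_split(TARGET, AUIPC_PC)            # offset measured from the auipc
bad = pcrel_split(TARGET, AUIPC_PC + 4)         # offset measured from the addi
print(good, hex(auipc_addi(AUIPC_PC, *good)))   # (3, -8) 0x12ff8
print(bad, hex(auipc_addi(AUIPC_PC, *bad)))     # (3, -12) 0x12ff4
```

Measuring from the addi leaves the result 4 bytes short of the target, which is the mistake that tying %pcrel_lo to the auipc's %pcrel_hi avoids.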

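Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM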