Why does RISC-V not have an instruction to calculate carry out?
I need to deal with bignum calculation (addition and subtraction, but I treat subtraction as equivalent to signed addition) on RISC-V, and the situation is a bit complicated. What I gather from half an hour of internet research:

Overflow handling is done with branches such as `bltu` rather than with a carry flag.

As far as I can tell, the branches indeed cover most scenarios rather well, except for one: (signed) bignum addition, because there we hit the slowest check path in a hot loop.
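For reference, the "slowest check path" is presumably the one the RISC-V unprivileged ISA manual describes for general signed addition: `add`, then `slti` on one operand, `slt` comparing the sum against the other operand, then `bne` to the overflow handler. It relies on the observation that the sum is less than one operand if and only if the other operand is negative. Below is a C sketch of the same logic; the function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrapping signed add plus overflow check, mirroring:
 *   add  t0, t1, t2       sum
 *   slti t3, t2, 0        b < 0
 *   slt  t4, t0, t1       sum < a
 *   bne  t3, t4, overflow
 * The add is done in unsigned arithmetic so the wrap is well-defined;
 * converting back to int32_t is implementation-defined in C, but GCC and
 * Clang wrap modulo 2^32. */
static bool add_overflows_i32(int32_t a, int32_t b, int32_t *sum)
{
    int32_t s = (int32_t)((uint32_t)a + (uint32_t)b); /* add  */
    bool b_negative = b < 0;                          /* slti */
    bool sum_below_a = s < a;                         /* slt  */
    *sum = s;
    return b_negative != sum_below_a;                 /* bne  */
}

static void demo_add_overflows_i32(void)
{
    int32_t s;
    assert(!add_overflows_i32(1, 2, &s) && s == 3);
    assert(add_overflows_i32(INT32_MAX, 1, &s) && s == INT32_MIN);
    assert(add_overflows_i32(INT32_MIN, -1, &s));
}
```

In a bignum loop only the most significant limb needs this signed check; the lower limbs are plain unsigned additions with carry.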
I know only a little about ISA design, but why didn't they include an instruction that calculates `(a + b) >> 32` (effectively the carry-out)? A bit like how multiplication is split into `mul` and `mulh`. This would allow the desired calculation to always be done in two instructions. More powerful microarchitectures could then even detect the sequence and perform only one addition.
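To make the proposal concrete, here is a C sketch of what such an instruction would return on RV32. The name `addhu` is hypothetical (no such instruction exists in RISC-V); it is emulated by widening to 64 bits, the same way `mulhu` returns the high half of a widening multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "high half of add" (addhu, NOT a real RISC-V instruction):
 * the bits of a + b above bit 31, i.e. the carry-out. The ordinary add
 * already gives the low half, so add + addhu would be the proposed pair. */
static uint32_t addhu_emulated(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) >> 32);
}

static void demo_addhu(void)
{
    uint32_t lo = 0xFFFFFFFFu + 1u;                  /* plain add wraps to 0 */
    assert(lo == 0u);
    assert(addhu_emulated(0xFFFFFFFFu, 1u) == 1u);   /* carry out */
    assert(addhu_emulated(0x7FFFFFFFu, 1u) == 0u);   /* no carry */
}
```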
Am I missing some tricks that would make this instruction obsolete (or be equivalent to it)? Does it have any major downsides that I overlook? I did not find a lot of good documentation on this general topic.
`add` / `sltu` gives you sum and carry-out: https://godbolt.org/z/Y7f5dzj1P shows GCC using it for unsigned math: `sum = a + b` / `carry = sum < a`.
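A minimal C sketch of that `add` / `sltu` pattern follows; the function name is illustrative, and the exact instructions GCC emits for it are best checked on the Godbolt link above. The carry test works because an unsigned add wraps if and only if the result is smaller than either operand.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned add with carry-out:
 *   add  s, a, b      sum   = a + b (mod 2^32)
 *   sltu c, s, a      carry = sum < a (true exactly when the add wrapped) */
static uint32_t add_carry_out(uint32_t a, uint32_t b, uint32_t *carry)
{
    uint32_t sum = a + b;
    *carry = sum < a;
    return sum;
}

static void demo_add_carry_out(void)
{
    uint32_t c;
    assert(add_carry_out(5u, 7u, &c) == 12u && c == 0u);
    assert(add_carry_out(0xFFFFFFFFu, 2u, &c) == 1u && c == 1u);
}
```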
But the problem with that is lack of ILP: the `sltu` can't start until the `add` result is ready. That could be solved if you could get carry-out directly from the inputs; good point. Of course, fusion of `add`/`sltu` would also solve that problem; perhaps that's what the architects had in mind.
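With existing instructions, the carry-out can in fact be computed from the inputs alone, since a + b ≥ 2^32 exactly when b > (2^32 − 1) − a, i.e. when b > ~a. The sketch below is not from the answer above, and the function name is illustrative. On RISC-V this would be `not` (an alias for `xori rd, rs, -1`) followed by `sltu`. That is still two dependent operations, so it only shortens the critical path when `a` is ready before `b` (or `~a` can be hoisted); otherwise it has the same latency as `add`/`sltu`.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-out of a + b without waiting for the sum:
 *   a + b >= 2^32  <=>  b > 0xFFFFFFFF - a  <=>  b > ~a
 * RISC-V:  xori t, a, -1  /  sltu c, t, b  (independent of the add). */
static uint32_t carry_from_inputs(uint32_t a, uint32_t b)
{
    return (uint32_t)~a < b;
}

static void demo_carry_from_inputs(void)
{
    assert(carry_from_inputs(0xFFFFFFFFu, 1u) == 1u);
    assert(carry_from_inputs(5u, 7u) == 0u);
    assert(carry_from_inputs(0xFFFFFFFFu, 0u) == 0u);
}
```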
The other major problem for bignums wider than 2 registers is doing an add with carry-in (which ISAs with a carry flag provide as an add-with-carry instruction). Even worse is getting the carry-out from that 3-input addition. (Either of the two additions could wrap, so AFAIK it's not possible to combine them into one add and compare. This is a common pitfall of pure-C implementations of `adc`; comments on that linked answer have working C, but it doesn't compile very efficiently.)
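A minimal C sketch of a correct add-with-carry and a bignum addition loop built on it, plus the broken single-compare version for contrast. All function names are illustrative, and the limbs are assumed to be 32-bit and stored least significant first. Both partial sums need their own compare; the two partial carries can never both be set, so OR-ing them is safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add with carry-in and carry-out, without a carry flag. */
static uint32_t adc32(uint32_t a, uint32_t b, uint32_t cin, uint32_t *cout)
{
    uint32_t s1 = a + b;
    uint32_t c1 = s1 < a;        /* carry from a + b */
    uint32_t s2 = s1 + cin;
    uint32_t c2 = s2 < s1;       /* carry from adding cin */
    *cout = c1 | c2;             /* at most one of c1, c2 is set */
    return s2;
}

/* Pitfall: one add and one compare misses b == 0xFFFFFFFF with cin == 1,
 * where the sum wraps all the way back around to a. */
static uint32_t adc32_broken(uint32_t a, uint32_t b, uint32_t cin, uint32_t *cout)
{
    uint32_t s = a + b + cin;
    *cout = s < a;
    return s;
}

/* r = x + y over n limbs (least significant first); returns the final carry. */
static uint32_t bignum_add(uint32_t *r, const uint32_t *x, const uint32_t *y, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++)
        r[i] = adc32(x[i], y[i], carry, &carry);
    return carry;
}

static void demo_adc(void)
{
    uint32_t c;
    assert(adc32(5u, 0xFFFFFFFFu, 1u, &c) == 5u && c == 1u);
    assert(adc32_broken(5u, 0xFFFFFFFFu, 1u, &c) == 5u && c == 0u); /* wrong carry */
    uint32_t x[2] = {0xFFFFFFFFu, 1u}, y[2] = {1u, 0u}, r[2];
    assert(bignum_add(r, x, y, 2) == 0u && r[0] == 0u && r[1] == 2u);
}
```

Note that the carry chain through `s2`/`c2` is two dependent operations per limb (plus the OR), which is exactly the serial dependency a flag-based `adc` avoids.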