Why does RISC-V not have an instruction to calculate carry out?

I need to deal with bignum calculation (addition and subtraction, but I treat subtraction as equivalent to signed addition) on RISC-V and the situation is a bit complicated. What I gather from half an hour of internet research:

  • RISC-V operations do not provide any means to check for carries or overflow
  • This decision is motivated by the fact that flags or other means of handling it add a lot of complexity to out-of-order microarchitectures.
  • Instead, they recommend doing branches afterwards
    • For unsigned addition, overflow handling can be done with a single bltu.
    • Same for signed addition if the sign of one of the operands is known
    • Otherwise, two checks need to be performed (three additional instructions)
  • People On The Internet are furious about this (I won't link it here)
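To make the three branch cases above concrete, here is a rough C sketch of the checks as I understand them. The function names are made up for illustration. The signed versions do the addition in unsigned arithmetic to sidestep C's signed-overflow undefined behaviour, and they assume the conversion back to int32_t wraps in two's complement (true for GCC and Clang, implementation-defined in older C standards).

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned: the sum wrapped iff it is smaller than an operand.
   On RISC-V this is an add followed by a single bltu. */
bool uadd_overflows(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;   /* wraps modulo 2^32, well defined in C */
    return sum < a;
}

/* Hypothetical helper: wrapping signed add without signed-overflow UB.
   Assumes the unsigned-to-signed conversion is two's-complement wraparound. */
int32_t wrapping_add(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Signed, with b known to be non-negative: one comparison is enough,
   because a correct sum can never be smaller than a. */
bool sadd_overflows_b_nonneg(int32_t a, int32_t b) {
    return wrapping_add(a, b) < a;
}

/* Signed, general case: the sum must be smaller than a exactly when b is
   negative. Two comparisons plus a branch on their mismatch. */
bool sadd_overflows(int32_t a, int32_t b) {
    int32_t sum = wrapping_add(a, b);
    return (b < 0) != (sum < a);
}
```

The general case is the one that costs the extra instructions: one compare of b against zero, one compare of the sum against a, and a branch if the two disagree.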

As far as I can tell, the branches indeed cover most scenarios rather well, except for one: (signed) bignum addition. Because there, we hit the slowest check path in a hot loop.

I know only a little about ISA design, but why didn't they include an instruction that calculates (a + b) >> 32 (effectively the carry out)? A bit like how the multiplication instruction is split into mul and mulh as well. This would make it possible to do the desired calculation with always two instructions. More powerful microarchitectures could then even detect the sequence and only do one addition.
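To pin down what I mean, here is a small C model of the proposed instruction for 32-bit registers, next to a model of the existing high-multiply for comparison. The name addh_u32 is invented for this sketch; no such instruction exists in the base ISA. mulhu_u32 models the unsigned variant of mulh from the M extension.

```c
#include <stdint.h>

/* Hypothetical "add high": the upper 32 bits of the full 33-bit sum a + b,
   which is just the carry out (0 or 1). Not a real RISC-V instruction. */
uint32_t addh_u32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a + (uint64_t)b) >> 32);
}

/* Existing analogue: mul returns the low half of the product,
   mulhu returns the upper 32 bits of the unsigned 64-bit product. */
uint32_t mulhu_u32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

With such an instruction, the sum and its carry would each depend only on the two inputs, the same way mul and mulh do.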

Am I missing some tricks that would make this instruction obsolete (or be equivalent to it)? Does it have any major downsides that I am overlooking? I did not find a lot of good documentation on this general topic.

add / sltu gives you sum and carry-out: https://godbolt.org/z/Y7f5dzj1P shows GCC using it for unsigned math: sum = a+b / carry = sum<a.
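As a minimal sketch of that idiom in use, here is a two-limb (64-bit from 32-bit halves) addition in C. The struct and function names are hypothetical, and whether a given compiler turns the comparison into a single sltu depends on the target and optimization level.

```c
#include <stdint.h>

/* Hypothetical two-limb unsigned integer: value = hi * 2^32 + lo. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_limbs;

u64_limbs add_2limb(u64_limbs x, u64_limbs y) {
    u64_limbs r;
    r.lo = x.lo + y.lo;              /* add: low limbs, wraps modulo 2^32 */
    uint32_t carry = r.lo < x.lo;    /* sltu: 1 iff the low add wrapped   */
    r.hi = x.hi + y.hi + carry;      /* carry out of the top limb is dropped */
    return r;
}
```

With only two limbs the carry into the high limb is the whole story, since the carry out of the high limb is discarded.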

But the problem with that is lack of ILP: the sltu can't start until the add result is ready. That could be solved if you could get carry-out directly from the inputs; good point. Of course fusion of add/sltu would also solve that problem; perhaps that's what the architects had in mind.

The other major problem for bignum of more than 2 reg-widths is doing add with carry-in (which ISAs with a carry flag and an add-with-carry instruction provide directly). And even worse, getting carry-out from that 3-input addition. (Either part of which could wrap, so it's not possible AFAIK to combine it into one add and compare. This is a common pitfall of pure-C implementations of adc; comments on that linked answer have working C, but it doesn't compile very efficiently).
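For illustration, a sketch of add-with-carry in portable C that checks both partial additions separately, plus a bignum loop built on it. The names adc_u32 and bignum_add are made up here and this is not the code from the linked answer. The single-compare shortcut (r = a + b + carry_in; carry = r < a) gets it wrong when b is 0xFFFFFFFF and carry_in is 1, because then r equals a.

```c
#include <stddef.h>
#include <stdint.h>

/* Add with carry-in. Returns the sum limb and stores the carry out (0 or 1). */
uint32_t adc_u32(uint32_t a, uint32_t b, uint32_t carry_in, uint32_t *carry_out) {
    uint32_t s = a + b;
    uint32_t c1 = s < a;        /* a + b wrapped */
    uint32_t r = s + carry_in;
    uint32_t c2 = r < s;        /* adding the carry-in wrapped */
    *carry_out = c1 | c2;       /* at most one of the two can be set */
    return r;
}

/* dst = x + y over n limbs, least significant limb first.
   Returns the carry out of the most significant limb. */
uint32_t bignum_add(uint32_t *dst, const uint32_t *x, const uint32_t *y, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = adc_u32(x[i], y[i], carry, &carry);
    }
    return carry;
}
```

Each limb costs two adds, two compares and an OR, and the carry forms a serial dependency through the whole loop, which is the cost being discussed here.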
