How can I safely average two unsigned ints in C++?

Using integer math alone, I'd like to "safely" average two unsigned ints in C++.

What I mean by "safely" is avoiding overflows (and any other pitfall that can be thought of).

For instance, averaging 200 and 5000 is easy:

unsigned int a = 200;
unsigned int b = 5000;
unsigned int average = (a + b) / 2; // Equals: 2600 as intended

But in the case of 4294967295 and 5000, the sum wraps around:

unsigned int a = 4294967295;
unsigned int b = 5000;
unsigned int average = (a + b) / 2; // Equals: 2499 instead of 2147486147

The best I've come up with is:

unsigned int a = 4294967295;
unsigned int b = 5000;
unsigned int average = (a / 2) + (b / 2); // Equals: 2147486147 as expected

Are there better ways?

Your last approach seems promising. You can improve on that by manually considering the lowest bits of a and b:

unsigned int average = (a / 2) + (b / 2) + (a & b & 1);

This gives the correct result when both a and b are odd.
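To see why the correction term matters, it may help to compare the two versions on a pair of odd inputs. The following is only a sketch: the function names are made up for this illustration, and it assumes 32-bit values via std::uint32_t.

```cpp
#include <cassert>
#include <cstdint>

// Halve first, then add: cannot overflow, but loses 1 when both inputs are odd.
constexpr std::uint32_t average_truncating(std::uint32_t a, std::uint32_t b) {
    return (a / 2) + (b / 2);
}

// Same idea plus the low-bit correction: (a & b & 1) is 1 only if both are odd.
constexpr std::uint32_t average_corrected(std::uint32_t a, std::uint32_t b) {
    return (a / 2) + (b / 2) + (a & b & 1);
}

// Hypothetical helper that just exercises the two functions.
inline void average_examples() {
    assert(average_truncating(5, 7) == 5);  // off by one
    assert(average_corrected(5, 7) == 6);   // fixed by the correction term
    assert(average_corrected(4294967295u, 5000u) == 2147486147u);
    assert(average_corrected(4294967295u, 4294967295u) == 4294967295u);
}
```

Both functions round down when exactly one input is odd, which matches what (a + b) / 2 would give if the sum did not wrap.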

If you know ahead of time which one is higher, a very efficient way is possible. Otherwise you're better off using one of the other strategies, instead of conditionally swapping to use this.

unsigned int average = low + ((high - low) / 2);

Here's a related article: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
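The linked article is about this same midpoint overflow in binary search, where the ordering of the two values is known for free. A minimal sketch of that use, with a hypothetical function name and std::vector<int> chosen only for illustration:

```cpp
#include <cstddef>
#include <vector>

// Returns the index of `key` in the sorted vector `v`, or v.size() if absent.
// Searches the half-open range [low, high).
inline std::size_t binary_search_index(const std::vector<int>& v, int key) {
    std::size_t low = 0;
    std::size_t high = v.size();
    while (low < high) {
        // low < high holds here, so high - low cannot wrap around,
        // and low + (high - low) / 2 stays below high.
        const std::size_t mid = low + (high - low) / 2;
        if (v[mid] < key) {
            low = mid + 1;
        } else if (key < v[mid]) {
            high = mid;
        } else {
            return mid;
        }
    }
    return v.size();
}
```

Because the loop invariant already guarantees low <= high, no swap or comparison is needed to pick which operand to subtract.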

Your method is not correct if both numbers are odd. For example, the average of 5 and 7 is 6, but your method #3 returns 5.

Try this:

average = (a>>1) + (b>>1) + (a & b & 1);

With math operators only:

average = a/2 + b/2 + (a%2) * (b%2);

If you don't mind a little x86 inline assembly (GNU C syntax), you can take advantage of supercat's suggestion to use rotate-with-carry after an add to put the high 32 bits of the full 33-bit result into a register.

Of course, you usually should mind using inline-asm, because it defeats some optimizations ( https://gcc.gnu.org/wiki/DontUseInlineAsm ). But here we go anyway:

// works for 64-bit long as well on x86-64, and doesn't depend on calling convention
unsigned average(unsigned x, unsigned y)
{
    unsigned result;
    asm("add   %[x], %[res]\n\t"
        "rcr   %[res]"
        : [res] "=r" (result)   // output
        : [y] "%0"(y),  // input: in the same reg as results output.  Commutative with next operand
          [x] "rme"(x)  // input: reg, mem, or immediate
        :               // no clobbers.  ("cc" is implicit on x86)
    );
    return result;
}

The % modifier, which tells the compiler the args are commutative, doesn't actually help make better asm in the case I tried: calling the function with y being a constant or pointer-deref (memory operand). Probably using a matching constraint for an output operand defeats that, since you can't use it with read-write operands.

As you can see on the Godbolt compiler explorer, this compiles correctly, and so does a version where we change the operands to unsigned long, with the same inline asm. clang 3.9 makes a mess of it, though, and decides to use the "m" option for the "rme" constraint, so it stores to memory and uses a memory operand.

RCR-by-one is not too slow, but it's still 3 uops on Skylake, with 2-cycle latency. It's great on AMD CPUs, where RCR has single-cycle latency. (Source: Agner Fog's instruction tables; see also the x86 tag wiki for performance links.) It's still better than @sellibitze's version, but worse than @Sheldon's order-dependent version. (See code on Godbolt.)

But remember that inline-asm defeats optimizations like constant-propagation, so any pure-C++ version will be better in that case.
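One pure-C++ way to express the same "keep the 33-bit sum" idea is to widen before adding. This is a sketch, not a claim about what any particular compiler emits; the function name is hypothetical, and it assumes a 64-bit integer type is available to hold the intermediate sum.

```cpp
#include <cstdint>

// Assumption: std::uint64_t exists and is wide enough for the 33-bit sum.
// The addition is done in 64 bits, so it cannot wrap for 32-bit inputs.
constexpr std::uint32_t average_widened(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) + b) / 2);
}

// Being constexpr, the function can be evaluated at compile time,
// which inline asm does not allow.
static_assert(average_widened(4294967295u, 5000u) == 2147486147u, "no wraparound");
static_assert(average_widened(5, 7) == 6, "both inputs odd");
```

This does not extend to the widest integer type the platform offers, which is where the other formulas in this thread are still needed.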

And the correct answer is...

(A&B)+((A^B)>>1)
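The reasoning behind this formula is that a + b == 2 * (a & b) + (a ^ b): a & b holds the bit positions set in both inputs (each counted twice in the sum), while a ^ b holds the positions set in exactly one. Halving both terms gives the expression above without ever forming the full sum. A small sketch that checks it against a 64-bit reference (the function names are invented for this example):

```cpp
#include <cstdint>

// Floor of (a + b) / 2 without computing a + b.
constexpr std::uint32_t average_bitwise(std::uint32_t a, std::uint32_t b) {
    return (a & b) + ((a ^ b) >> 1);
}

// Reference computed in 64 bits, used only to check the sketch.
constexpr bool matches_reference(std::uint32_t a, std::uint32_t b) {
    return average_bitwise(a, b) ==
           static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) + b) / 2);
}

static_assert(matches_reference(3, 3), "equal odd inputs");
static_assert(matches_reference(5, 7), "both inputs odd");
static_assert(matches_reference(0, 1), "rounds down");
static_assert(matches_reference(4294967295u, 5000u), "sum would wrap");
static_assert(matches_reference(4294967295u, 4294967295u), "maximum inputs");
```

The result never exceeds the larger of the two inputs, so the final addition cannot overflow.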

What you have is fine, with the minor detail that it will claim that the average of 3 and 3 is 2. I'm guessing that you don't want that; fortunately, there's an easy fix:

unsigned int average = a/2 + b/2 + (a & b & 1);

This just bumps the average back up in the case that both divisions were truncated.

If the code is for an embedded micro, and if speed is critical, assembly language may be helpful. On many microcontrollers, the carry out of the add naturally lands in the carry flag, and instructions exist to shift it back into a register. On an ARM, the average operation (source and dest. in registers) could be done in two instructions; any C-language equivalent would likely yield at least 5, and probably a fair bit more than that.

Incidentally, on machines with shorter word sizes, the differences can be even more substantial. On an 8-bit PIC-18 series, averaging two 32-bit numbers would take twelve instructions. Doing the shifts, add, and correction would take 5 instructions for each shift, eight for the add, and eight for the correction, so 26 (not quite a 2.5x difference, but probably more significant in absolute terms).

In C++20, you can use std::midpoint (declared in <numeric>):

template <class T>
constexpr T midpoint(T a, T b) noexcept;

The paper P0811R3 that introduced std::midpoint recommended this snippet (slightly adapted to work with C++11):

#include <type_traits>

template <typename Integer>
constexpr Integer midpoint(Integer a, Integer b) noexcept {
  // typename is required because make_unsigned<Integer>::type is a dependent name
  using U = typename std::make_unsigned<Integer>::type;
  return a>b ? a-(U(a)-b)/2 : a+(U(b)-a)/2;
}
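A point that is easy to miss is that this snippet rounds toward its first argument rather than always rounding down, so swapping the arguments can change the result by one. The sketch below repeats the C++11 adaptation under the hypothetical name midpoint11 (to avoid clashing with std::midpoint) and checks a few values at compile time; it assumes a 32-bit unsigned int.

```cpp
#include <climits>
#include <type_traits>

// Same logic as the C++11 adaptation above, renamed for this illustration.
template <typename Integer>
constexpr Integer midpoint11(Integer a, Integer b) noexcept {
    using U = typename std::make_unsigned<Integer>::type;
    // The difference is taken in the unsigned type, so it cannot overflow.
    return a > b ? a - (U(a) - b) / 2 : a + (U(b) - a) / 2;
}

// Unsigned inputs from the question: the true mean is 2147486147.5.
static_assert(midpoint11(4294967295u, 5000u) == 2147486148u, "rounds toward a (up)");
static_assert(midpoint11(5000u, 4294967295u) == 2147486147u, "rounds toward a (down)");

// Signed inputs: the true mean of -3 and 4 is 0.5.
static_assert(midpoint11(4, -3) == 1, "rounds toward 4");
static_assert(midpoint11(-3, 4) == 0, "rounds toward -3");
static_assert(midpoint11(INT_MAX, INT_MAX - 2) == INT_MAX - 1, "no signed overflow");
```

If rounding down regardless of argument order is what you need for unsigned values, one of the bitwise formulas in the other answers is a closer fit.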

For completeness, here is the unmodified C++20 implementation from the paper:

constexpr Integer midpoint(Integer a, Integer b) noexcept {
  using U = make_unsigned_t<Integer>;
  return a>b ? a-(U(a)-b)/2 : a+(U(b)-a)/2;
}

// C#: running average that updates avg in place instead of keeping a full sum
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
decimal avg = 0;
for (int i = 0; i < array.Length; i++){
    avg = (array[i] - avg) / (i+1) + avg;
}

The expected result for this test is avg == 5.0.
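Since the question is about C++, here is a rough C++ rendering of the same incremental update. It is a sketch only: the function name is hypothetical, and it uses double, so it steps outside the integer-only constraint in the question.

```cpp
#include <cstddef>
#include <vector>

// Incremental mean: after processing i + 1 elements, avg is their mean.
// The running value stays within the range of the inputs, so no large sum is built.
inline double running_mean(const std::vector<int>& values) {
    double avg = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        avg += (values[i] - avg) / static_cast<double>(i + 1);
    }
    return avg;
}
```

For an empty input this returns 0.0, which is a choice made for the sketch rather than anything the original code specifies.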

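((((a & b) << 1) + (a ^ b)) >> 1) is another way to write it. Note that the parentheses around a & b are needed, because << binds tighter than &, and that ((a & b) << 1) + (a ^ b) is simply a + b, so this form overflows exactly when the naive version does. Shifting before adding, as in (a & b) + ((a ^ b) >> 1), avoids that.

Courtesy: http://www.ragestorm.net/blogs/?p=29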
