简体   繁体   中英

How can I safely average two unsigned ints in C++?

Using integer math alone, I'd like to "safely" average two unsigned ints in C++.

What I mean by "safely" is avoiding overflows (and anything else that can be thought of).

unsigned int a = 200;
unsigned int b = 5000;
unsigned int average = (a + b) / 2; // Equals: 2600 as intended

unsigned int a = 4294967295;
unsigned int b = 5000;
unsigned int average = (a + b) / 2; // Equals: 2499 instead of 2147486147

unsigned int a = 4294967295;
unsigned int b = 5000;
unsigned int average = (a / 2) + (b / 2); // Equals: 2147486147 as expected

Are there better ways?

Your last approach seems promising. You can improve on that by manually considering the lowest bits of a and b:

unsigned int average = (a / 2) + (b / 2) + (a & b & 1);

This gives the correct results in case both a and b are odd.

If you know ahead of time which one is higher , a very efficient way is possible. Otherwise you're better off using one of the other strategies, instead of conditionally swapping to use this.

unsigned int average = low + ((high - low) / 2);

Here's a related article: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

Your method is not correct if both numbers are odd eg 5 and 7, average is 6 but your method #3 returns 5.

Try this:

average = (a>>1) + (b>>1) + (a & b & 1)

with math operators only:

average = a/2 + b/2 + (a%2) * (b%2)

If you don't mind a little x86 inline assembly (GNU C syntax), you can take advantage of supercat's suggestion to use rotate-with-carry after an add to put the high 32 bits of the full 33-bit result into a register.

Of course, you usually should mind using inline-asm, because it defeats some optimizations ( https://gcc.gnu.org/wiki/DontUseInlineAsm ). But here we go anyway:

// works for 64-bit long as well on x86-64, and doesn't depend on calling convention
unsigned average(unsigned x, unsigned y)
{
    unsigned result;
    asm("add   %[x], %[res]\n\t"
        "rcr   %[res]"
        : [res] "=r" (result)   // output
        : [y] "%0"(y),  // input: in the same reg as results output.  Commutative with next operand
          [x] "rme"(x)  // input: reg, mem, or immediate
        :               // no clobbers.  ("cc" is implicit on x86)
    );
    return result;
}

The % modifier to tell the compiler the args are commutative doesn't actually help make better asm in the case I tried, calling the function with y being a constant or pointer-deref (memory operand). Probably using a matching constraint for an output operand defeats that, since you can't use it with read-write operands.

As you can see on the Godbolt compiler explorer , this compiles correctly, and so does a version where we change the operands to unsigned long , with the same inline asm. clang3.9 makes a mess of it, though, and decides to use the "m" option for the "rme" constraint, so it stores to memory and uses a memory operand.


RCR-by-one is not too slow, but it's still 3 uops on Skylake, with 2 cycle latency. It's great on AMD CPUs, where RCR has single-cycle latency. (Source: Agner Fog's instruction tables , see also the tag wiki for x86 performance links). It's still better than @sellibitze's version, but worse than @Sheldon's order-dependent version. (See code on Godbolt)

But remember that inline-asm defeats optimizations like constant-propagation, so any pure-C++ version will be better in that case.

而正确答案是...

(A&B)+((A^B)>>1)

What you have is fine, with the minor detail that it will claim that the average of 3 and 3 is 2. I'm guessing that you don't want that; fortunately, there's an easy fix:

unsigned int average = a/2 + b/2 + (a & b & 1);

This just bumps the average back up in the case that both divisions were truncated.

If the code is for an embedded micro, and if speed is critical, assembly language may be helpful. On many microcontrollers, the result of the add would naturally go into the carry flag, and instructions exist to shift it back into a register. On an ARM, the average operation (source and dest. in registers) could be done in two instructions; any C-language equivalent would likely yield at least 5, and probably a fair bit more than that.

Incidentally, on machines with shorter word sizes, the differences can be even more substantial. On an 8-bit PIC-18 series, averaging two 32-bit numbers would take twelve instructions. Doing the shifts, add, and correction, would take 5 instructions for each shift, eight for the add, and eight for the correction, so 26 (not quite a 2.5x difference, but probably more significant in absolute terms).

In C++20, you can use std::midpoint :

template <class T>
constexpr T midpoint(T a, T b) noexcept;

The paper P0811R3 that introduced std::midpoint recommended this snippet (slightly adopted to work with C++11):

#include <type_traits>

template <typename Integer>
constexpr Integer midpoint(Integer a, Integer b) noexcept {
  using U = std::make_unsigned<Integer>::type;
  return a>b ? a-(U(a)-b)/2 : a+(U(b)-a)/2;
}

For completeness, here is the unmodified C++20 implementation from the paper:

constexpr Integer midpoint(Integer a, Integer b) noexcept {
  using U = make_unsigned_t<Integer>;
  return a>b ? a-(U(a)-b)/2 : a+(U(b)-a)/2;
}
    int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    decimal avg = 0;
    for (int i = 0; i < array.Length; i++){
        avg = (array[i] - avg) / (i+1) + avg;
    }

expects avg == 5.0 for this test

(((a&b << 1) + (a^b)) >> 1) is also a nice way.

Courtesy: http://www.ragestorm.net/blogs/?p=29

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM