
What implementation of average is the most accurate?

Given these two implementations of an average function:

float average(const vector<float>& seq)
{
  float sum = 0.0f;

  for (auto&& value : seq)
  {
    sum += value;
  }

  return sum / seq.size();
}

And:

float average(const vector<float>& seq)
{
  float avg = 0.0f;

  for (auto&& value : seq)
  {
    avg += value / seq.size();
  }

  return avg;
}

To illustrate my question, imagine we have a huge difference in the input data, like so:

1.0f, 0.0f, 0.0f, 0.0f, 1000000.0f

My guess is that in the first implementation, sum can grow "too much" and lose the least significant digits, ending up as 1000000.0f instead of 1000001.0f at the end of the sum loop.

On the other hand, the second implementation seems theoretically less efficient, due to the number of divisions to perform (I haven't profiled anything; this is a blind guess).

So, is one of these implementations preferable to the other? Am I right that the first implementation is less accurate?

I wouldn't count on the second being more accurate. The differences in the size of the elements are divided by the length of the vector, but each division introduces some additional imprecision.

If accuracy is a problem, the first step should be to use double. Even if the vector is float (for memory reasons), the calculations within the function should be done in double.
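For instance, the first implementation from the question with only the accumulator widened to double might look like this (a sketch, keeping the question's signature):

```cpp
#include <vector>

float average(const std::vector<float>& seq)
{
    double sum = 0.0;           // accumulate in double precision

    for (float value : seq)
    {
        sum += value;           // each float is widened to double before adding
    }

    return static_cast<float>(sum / seq.size());
}
```

The loop is unchanged; only the accumulator's type differs, so the 24-bit float mantissa is no longer the bottleneck for the running sum.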

Beyond that, for large numbers of elements, you should probably use the Kahan algorithm, rather than just naïvely adding the elements. Although it adds a number of operations in the loop, it keeps track of the error, and will result in significantly more accuracy.
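Kahan (compensated) summation in the shape of the question's average function could look like this (a sketch; the function name follows the question's code):

```cpp
#include <vector>

float average(const std::vector<float>& seq)
{
    float sum = 0.0f;
    float c = 0.0f;             // running compensation for lost low-order bits

    for (float value : seq)
    {
        float y = value - c;    // re-add the bits lost on previous iterations
        float t = sum + y;      // low-order bits of y may be lost here...
        c = (t - sum) - y;      // ...but are recovered algebraically
        sum = t;
    }

    return sum / seq.size();
}
```

Note that this relies on the compiler not reassociating floating-point operations (e.g. it can be broken by -ffast-math, which may optimize the compensation away).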

EDIT:

Just for the fun of it, I wrote a small program which used the following code to generate the vector:

std::vector<float> v;
v.push_back( 10000000.0f );
for ( int count = 10000000; count > 0; -- count ) {
    v.push_back( 0.1f );
}

The result of the average should be 1.0999999 (practically speaking, 1.1). Using either of the algorithms in the original posting, the result is 0.999999881: an error of about 10%. Just changing sum to have type double in the first algorithm, however, results in 1.0999999, about as accurate as you can get. Using the Kahan algorithm (with float everywhere) gives the same result.
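A small self-contained sketch of that experiment (the helper names here are illustrative, not from the original post), comparing the float accumulator against the double accumulator on the same data:

```cpp
#include <vector>

// Test data from the post: one large element, then ten million copies of 0.1f.
std::vector<float> make_test_vector()
{
    std::vector<float> v;
    v.push_back( 10000000.0f );
    for ( int count = 10000000; count > 0; -- count ) {
        v.push_back( 0.1f );
    }
    return v;
}

float float_sum_average(const std::vector<float>& v)
{
    float sum = 0.0f;
    for (float x : v) sum += x;   // 0.1f is absorbed entirely once sum reaches 1e7
    return sum / v.size();
}

double double_sum_average(const std::vector<float>& v)
{
    double sum = 0.0;
    for (float x : v) sum += x;   // double keeps the low-order bits
    return sum / v.size();
}
```

With this data, float_sum_average stays near 1.0 (the 0.1f contributions are lost), while double_sum_average lands near the true average of about 1.1.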

If your sum is not too large for the float type, the first may be more accurate, because the individual rounding errors produced by the divisions can accumulate.
