C++ variance and standard deviation

Question

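I created a program that prompts the user to enter a data set. The program stores the data and sorts it, then calculates the variance and standard deviation of the array. However, I am not getting the correct variance and standard deviation (the answers are slightly off). Does anyone know what the problem might be?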

#include <iostream>
#include <iomanip>
#include <array>

using namespace std;

//function declarations
void GetData(double vals[], int& valCount);
void Sort(double vals[], int& valCount);
void printSort(double vals[], int& valCount);
double Variance(double vals[], int valCount);
double StandardDev(double vals[], int valCount);
double SqRoot(double value); //use for StandardDev function

//function definitions
int main ()
{
    double vals = 0;

    int valCount = 0;        //number of values to be processed

    //ask user how many values
    cout << "Enter the number of values (0 - 100) to be processed: ";
    cin >> valCount;

    //process and store input values
    GetData(&vals, valCount);

    //sort values
    Sort(&vals, valCount);

    //print sort
    cout << "\nValues in Sorted Order: " << endl;
    printSort(&vals, valCount);

    //print variance
    cout << "\nThe variance for the input value list is: " << Variance(&vals, valCount);

    //print standard deviation
    cout << "\nThe standard deviation for the input list is: " <<StandardDev(&vals, valCount)<< endl;

    return 0;
}

//prompt user to get data
void GetData(double vals[], int& valCount)
{
    for(int i = 0; i < valCount; i++)
    {
        cout << "Enter a value: ";
        cin >> vals[i];
    }
}

//bubble sort values
void Sort(double vals[], int& valCount)
{
    for (int i=(valCount-1); i>0; i--)
        for (int j=0; j<i; j++)
    if (vals[j] > vals[j+1])
           swap (vals[j], vals[j+1]);
}

//print sorted values
void printSort(double vals[], int& valCount)
{
    for (int i=0; i < valCount; i++)
        cout << vals[i] << "\n";
}

//compute variance
double Variance(double vals[], int valCount)
{
    //mean
    int sum = 0;
    double mean = 0;
    for (int i = 0; i < valCount; i++)
        sum += vals[i];
        mean = sum / valCount;

    //variance
    double squaredDifference = 0;
    for (int i = 0; i < valCount; i++)
        squaredDifference += (vals[i] - mean) * (vals[i] - mean);
    return squaredDifference / valCount;
}

//compute standard deviation
double StandardDev(double vals[], int valCount)
{
    double stDev;
    stDev = SqRoot(Variance(vals, valCount));
    return stDev;
}

//compute square root
double SqRoot(double value)
{
    double n = 0.00001;
    double s = value;
    while ((s - value / s) > n)
    {
        s = (s + value / s) / 2;
    }

    return s;
}
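For reference, the Variance function above divides the sum of squared deviations by valCount, so it is aiming at the population variance, and StandardDev takes its square root. Many tools also offer a sample variance that divides by N − 1 instead (in Excel, VAR.P and STDEV.P are the population versions, VAR.S and STDEV.S the sample versions), so comparing against the sample version will give slightly different numbers even when the code is correct.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2, \qquad
\sigma = \sqrt{\sigma^2}

% Sample variance, for comparison with tools that report it:
s^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2
```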

Answer 1

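There are quite a few errors in your code. There are type mismatches, but more importantly, you never created an array to store the values. You are treating a plain double as an array, and you are lucky your program never crashed on you.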
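A minimal sketch of the smallest change that gives the question's pointer-based functions real storage, assuming you want to keep the double vals[] signatures. FillWith is a hypothetical stand-in for GetData (which reads from std::cin), so the example runs without input; AllocateAndFill is likewise made up for illustration.

```cpp
#include <vector>

// Hypothetical stand-in for the question's GetData, in the same
// pointer-plus-count style: vals must point at valCount doubles
// that the caller actually owns.
void FillWith(double vals[], int valCount, double value) {
    for (int i = 0; i < valCount; i++)
        vals[i] = value;
}

// The question's main() declared `double vals = 0;` (storage for exactly one
// double) and passed &vals. Allocating valCount doubles in a std::vector and
// passing vals.data() gives the same functions real memory to write into.
std::vector<double> AllocateAndFill(int valCount, double value) {
    std::vector<double> vals(valCount);  // valCount zero-initialized doubles
    FillWith(vals.data(), valCount, value);
    return vals;
}
```

In the question's main() the equivalent change would be to include <vector>, declare std::vector<double> vals(valCount); after reading valCount, and call GetData(vals.data(), valCount); (and likewise for the other functions).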

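Below is a working version of your code, verified against a made-up data set in Excel. I kept as much of your code as I could and only commented things out where appropriate. Anything I commented out I did not modify, so it may still contain errors.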

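Prefer vectors over arrays in this case. You don't know the size ahead of time (at compile time), and vectors are easier to work with than dynamic arrays. You also never actually had an array. Vectors also know how big they are, so you don't need to pass the size around.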
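To illustrate the point about sizes: a function that takes the vector by const reference can ask it for its length, so no separate valCount parameter is needed. A minimal sketch; the name mean and the choice to return 0.0 for an empty vector are assumptions of this example.

```cpp
#include <vector>

// No separate count parameter: the vector carries its own size.
// Returns 0.0 for an empty vector to avoid dividing by zero
// (a choice made for this sketch).
double mean(const std::vector<double>& vals) {
    if (vals.empty()) return 0.0;
    double sum = 0.0;  // accumulate in double so fractions are kept
    for (double v : vals)
        sum += v;
    return sum / static_cast<double>(vals.size());
}
```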

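Type mismatches. Your functions all expect an array of doubles, but your sum is an int, and there are plenty of other mismatches. You were also passing a plain double as if it were an array, so writes such as vals[1] land in memory you don't own (undefined behavior).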

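Start on best practices now. Stop writing using namespace std;. Just qualify your names where needed, or, to be more specific, put lines such as using std::cout; at the top of a function. Your naming is all over the place. Pick a naming scheme and stick with it. Names starting with a capital letter are usually reserved for classes or types.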

#include <iomanip>
#include <iostream>
// #include <array>  // You never actually declared a std::array
#include <vector>  // You don't know the size ahead of time, vectors are the
                   // right tool for that job.

// Use what's available
#include <algorithm>   // std::sort()
#include <cmath>       // std::sqrt(), std::pow()
#include <functional>  // std::less
#include <numeric>     // std::accumulate()

// function declarations
// Commented out redundant functions, and changed arguments to match
void get_data(std::vector<double>& vals);
// void Sort(double vals[], int& valCount);
void print(const std::vector<double>& vals);
double variance(const std::vector<double>& vals);
double standard_dev(const std::vector<double>& vals);
// double SqRoot(double value); //use for StandardDev function

// function definitions
int main() {
  int valCount = 0;  // number of values to be processed

  // ask user how many values
  std::cout << "Enter the number of values (0 - 100) to be processed: ";
  std::cin >> valCount;
  std::vector<double> vals(valCount, 0);
  // Was just a double, but you pass it around like it's an array. That's
  // really bad. Either allocate the array on the heap, or use a vector.
  // Moved to after getting the count so I could declare the vector with
  // that size up front instead of reserving later; personal preference.

  // process and store input values
  get_data(vals);

  // sort values
  // Sort(&vals, valCount);
  std::sort(vals.begin(), vals.end(), std::less<double>());
  // The third argument can be omitted as it's the default behavior, but
  // I prefer being explicit. If compiling with C++17, the <double> can
  // also be omitted due to a feature called CTAD

  // print sort
  std::cout << "\nValues in Sorted Order: " << '\n';
  print(vals);

  // print variance
  std::cout << "\nThe variance for the input value list is: " << variance(vals);

  // print standard deviation
  std::cout << "\nThe standard deviation for the input list is: "
            << standard_dev(vals) << '\n';

  return 0;
}

// prompt user to get data
void get_data(std::vector<double>& vals) {
  for (unsigned int i = 0; i < vals.size(); i++) {
    std::cout << "Enter a value: ";
    std::cin >> vals[i];
  }
}

// //bubble sort values
// void Sort(double vals[], int& valCount)
// {
//     for (int i=(valCount-1); i>0; i--)
//         for (int j=0; j<i; j++)
//     if (vals[j] > vals[j+1])
//            swap (vals[j], vals[j+1]);
// }

// print sorted values
void print(const std::vector<double>& vals) {
  for (auto i : vals) {
    std::cout << i << ' ';
  }
  std::cout << '\n';
}

// compute variance
double variance(const std::vector<double>& vals) {
  // was int, but your vector is now of type double; the 0.0 initial value
  // makes std::accumulate sum in double instead of int
  double sum = std::accumulate(vals.begin(), vals.end(), 0.0);
  double mean = sum / static_cast<double>(vals.size());

  // variance
  double squaredDifference = 0;
  for (unsigned int i = 0; i < vals.size(); i++)
    squaredDifference += std::pow(vals[i] - mean, 2);
  // Might be possible to get this with std::accumulate, but my first go didn't
  // work.

  return squaredDifference / static_cast<double>(vals.size());
}

// compute standard deviation
double standard_dev(const std::vector<double>& vals) {
  return std::sqrt(variance(vals));
}

// //compute square root
// double SqRoot(double value)
// {
//     double n = 0.00001;
//     double s = value;
//     while ((s - value / s) > n)
//     {
//         s = (s + value / s) / 2;
//     }

//     return s;
// }
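One detail worth checking in any std::accumulate call, including the one in variance above: the running total has the type of the third argument. An initial value of 0 (an int) converts every partial sum back to int and silently drops the fractional parts, while 0.0 keeps the total a double. A small sketch showing the difference (the function names are made up for this example):

```cpp
#include <numeric>
#include <vector>

// The running total has the type of the initial value. With 0 (an int),
// each partial sum is converted back to int, dropping the fraction.
double sum_with_int_init(const std::vector<double>& vals) {
    return std::accumulate(vals.begin(), vals.end(), 0);
}

// With 0.0 (a double), the running total stays a double.
double sum_with_double_init(const std::vector<double>& vals) {
    return std::accumulate(vals.begin(), vals.end(), 0.0);
}
```

For example, summing {1.5, 2.5} gives 3 with the int initial value and 4 with 0.0.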

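Edit: I did work out the variance with std::accumulate. It does require an understanding of lambdas (anonymous functions, functors). I compiled against the C++14 standard, which has been the default for the major compilers for a while now; init-captures such as [valSize = vals.size()] need at least C++14.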

double variance(const std::vector<double>& vals) {
  auto meanOp = [valSize = vals.size()](double accumulator, double val) {
    return accumulator += (val / static_cast<double>(valSize));
  };
  double mean = std::accumulate(vals.begin(), vals.end(), 0.0, meanOp);

  auto varianceOp = [mean, valSize = vals.size()](double accumulator,
                                                  double val) {
    return accumulator +=
           (std::pow(val - mean, 2) / static_cast<double>(valSize));
  };

  return std::accumulate(vals.begin(), vals.end(), 0.0, varianceOp);
}
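For comparison, a sketch of the same two-pass std::accumulate approach that divides by N once at the end instead of inside each lambda. It uses a plain capture, so it compiles as C++11; it assumes vals is non-empty, and the function names are hypothetical.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Population variance (divide by N) with two std::accumulate passes.
// Sketch assumes vals is non-empty.
double variance_accumulate(const std::vector<double>& vals) {
    const double n = static_cast<double>(vals.size());
    // First pass: plain sum; the 0.0 initial value keeps the total a double.
    const double mean = std::accumulate(vals.begin(), vals.end(), 0.0) / n;
    // Second pass: sum of squared deviations, divided by N once at the end.
    const double squared = std::accumulate(
        vals.begin(), vals.end(), 0.0,
        [mean](double acc, double v) { return acc + (v - mean) * (v - mean); });
    return squared / n;
}

double standard_dev_accumulate(const std::vector<double>& vals) {
    return std::sqrt(variance_accumulate(vals));
}
```

With the data set {2, 4, 4, 4, 5, 5, 7, 9} the mean is 5 and the sum of squared deviations is 32, so this returns a variance of 4 and a standard deviation of 2.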

Answer 2

mean = sum / valCount; in Variance is computed with integer math and only then converted to double. You need to convert to double first:

mean = double(sum) / valCount;
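The cast fixes the division, but in the question's Variance, sum is itself declared int, so sum += vals[i] already truncates each addition before the division happens (the type mismatch Answer 1 mentions). A sketch comparing the three variants on the same data; the function names are hypothetical.

```cpp
// As in the question: int sum, and int / int division.
double mean_question(const double vals[], int valCount) {
    int sum = 0;
    for (int i = 0; i < valCount; i++)
        sum += vals[i];     // each addition is truncated back to int
    return sum / valCount;  // integer division, then converted to double
}

// With the cast: the division is floating-point, but sum is still int.
double mean_cast_only(const double vals[], int valCount) {
    int sum = 0;
    for (int i = 0; i < valCount; i++)
        sum += vals[i];
    return double(sum) / valCount;
}

// Accumulating in a double as well keeps the fractional parts.
double mean_double_sum(const double vals[], int valCount) {
    double sum = 0.0;
    for (int i = 0; i < valCount; i++)
        sum += vals[i];
    return sum / valCount;
}
```

With the values 1.5 and 2.5, the three functions return 1.0, 1.5 and 2.0 respectively; only the last is the true mean.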

Your SqRoot function only computes an approximation. You should use std::sqrt instead, which is both faster and more accurate.
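Besides being an approximation, the question's SqRoot has a more serious problem for inputs between 0 and 1: with s = value, the loop condition s - value / s equals value - 1, which is negative, so the loop body never runs and the function returns its input unchanged (SqRoot(0.25) gives 0.25 rather than 0.5). For data whose variance is below 1, this alone makes the standard deviation wrong. A sketch of a corrected Newton iteration, assuming you want to keep a hand-written version; SqRootFixed and its relative tolerance are choices made for this example, and std::sqrt remains the simpler option.

```cpp
#include <cmath>

// The question's SqRoot, unchanged: s starts at value, so for 0 < value < 1
// the condition s - value / s (= value - 1) is negative and the loop never runs.
double SqRoot(double value) {
    double n = 0.00001;
    double s = value;
    while ((s - value / s) > n) {
        s = (s + value / s) / 2;
    }
    return s;
}

// Hypothetical corrected Newton iteration. Starting from max(value, 1.0),
// which is always >= sqrt(value) for value >= 0, each step stays at or above
// the root and moves down toward it, so the loop condition works for any input.
double SqRootFixed(double value) {
    if (value < 0.0) return std::nan("");  // no real square root
    if (value == 0.0) return 0.0;          // avoid 0 / 0
    double s = value > 1.0 ? value : 1.0;
    const double tol = 1e-12;              // relative tolerance (arbitrary choice)
    while (s - value / s > tol * s) {
        s = (s + value / s) / 2;
    }
    return s;
}
```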

2 answers

Answer 1: score 1, 2020-04-15 18:14:25

Answer 2: score 0, 2020-04-15 01:04:16
