
Difference between pandas statistical functions and boost::accumulators

I am getting different results for statistics calculations using pandas and boost::accumulators, and I am unsure why.

Below is a simple example that uses pandas to calculate the mean and variance of some returns:

import pandas

vals = [ 1, 1, 2, 1, 3, 2, 3, 4, 6, 3, 2, 1 ]
rets = pandas.Series(vals).pct_change()

print(f'count:    {len(rets)}')
print(f'mean:     {rets.mean()}')
print(f'variance: {rets.var()}')

The output of this is:

count:    12
mean:     0.19696969696969696
variance: 0.6156565656565657

I am doing the equivalent in C++, using boost::accumulators for the statistics calculations:

#include <iostream>
#include <iomanip>
#include <cmath>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/count.hpp>
#include <boost/accumulators/statistics/mean.hpp>
#include <boost/accumulators/statistics/variance.hpp>

namespace acc = boost::accumulators;

int main()
{
    acc::accumulator_set<double, acc::stats<acc::tag::count,
                                            acc::tag::mean,
                                            acc::tag::variance>> stats;

    double prev = NAN;
    for (double val : { 1, 1, 2, 1, 3, 2, 3, 4, 6, 3, 2, 1 })
    {
        const double ret = (val - prev) / prev;

        stats(std::isnan(ret) ? 0 : ret);

        prev = val;
    }

    std::cout << std::setprecision(16)
              << "count:    " << acc::count(stats)    << '\n'
              << "mean:     " << acc::mean(stats)     << '\n'
              << "variance: " << acc::variance(stats) << '\n';

    return 0;
}

The output of this is:

count:    12
mean:     0.1805555555555556
variance: 0.5160108024691359
  • Why are the mean and variance between pandas and boost::accumulators different?
  • What do I need to do to get the pandas result from boost::accumulators?

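By default, pandas skips NaN values when computing the mean (skipna=True). Because pct_change has no previous value to compare the first element with, the first item of rets is NaN, so pandas averages only the remaining 11 returns, while your C++ code replaces that NaN with 0 and averages all 12. (len(rets) still reports 12 because it counts the NaN; rets.count() would report 11.) If we fill the NaN with 0, the output is the same as boost::accumulators: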

rets.mean()
Out[67]: 0.19696969696969696

rets.fillna(0).mean()
Out[69]: 0.18055555555555555
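To see where both means come from, here is the arithmetic worked by hand from vals (my own calculation, not output from either library). With x_i the i-th value and r_i its percentage change:

```latex
\begin{aligned}
r_i &= \frac{x_i - x_{i-1}}{x_{i-1}}, \quad i = 2,\dots,12, \qquad r_1 = \text{NaN} \\
(r_2,\dots,r_{12}) &= \left(0,\ 1,\ -\tfrac{1}{2},\ 2,\ -\tfrac{1}{3},\ \tfrac{1}{2},\ \tfrac{1}{3},\ \tfrac{1}{2},\ -\tfrac{1}{2},\ -\tfrac{1}{3},\ -\tfrac{1}{2}\right),
\qquad \sum_{i=2}^{12} r_i = \frac{13}{6} \\
\text{pandas (NaN skipped):} &\quad \frac{13/6}{11} = \frac{13}{66} \approx 0.196970 \\
\text{boost (NaN replaced by 0):} &\quad \frac{13/6}{12} = \frac{13}{72} \approx 0.180556
\end{aligned}
```

The numerator is identical in both cases; only the divisor changes, because the substituted 0 adds nothing to the sum but still counts as an observation.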

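For the variance, also set the delta degrees of freedom to 0. pandas' var() defaults to ddof=1 (the sample variance, dividing by N - 1), while boost::accumulators' tag::variance computes the population variance (dividing by N):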

rets.fillna(0).var(ddof=0)
Out[86]: 0.5160108024691358
