Why is my C++ code so much slower than R?

Question

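I wrote the following code in R and in C++; both implement the same algorithm:

a) Simulate the random variable X 500 times. (X takes the value 0.9 with probability 0.5 and the value 1.1 with probability 0.5.)

b) Multiply these 500 simulated values together to get a single value, and store that value in a container.

c) Repeat 10000000 times, so that the container holds 10000000 values.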
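Steps a) and b) do not have to be simulated draw by draw: the product depends only on how many of the 500 draws came out as 0.9, and that count follows a binomial distribution. The sketch below checks this equivalence numerically. The function names `simulate_product_naive` and `product_from_count` are illustrative and do not appear in the original post.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Step-by-step version: draw X `steps` times and multiply the draws.
// Each draw is 0.9 or 1.1, each with probability 0.5.
// The number of 0.9 draws is reported through `down_moves`.
double simulate_product_naive(std::mt19937& gen, int steps, int& down_moves) {
    assert(steps >= 0);
    std::bernoulli_distribution coin(0.5);
    double product = 1.0;
    down_moves = 0;
    for (int i = 0; i < steps; ++i) {
        if (coin(gen)) {
            product *= 0.9;
            ++down_moves;
        } else {
            product *= 1.1;
        }
    }
    return product;
}

// Closed form: only the count of 0.9 draws matters, not their order.
double product_from_count(int steps, int down_moves) {
    assert(down_moves >= 0 && down_moves <= steps);
    return std::pow(0.9, down_moves) * std::pow(1.1, steps - down_moves);
}
```

Calling `simulate_product_naive` with 500 steps and then passing the reported count to `product_from_count` should give the same value up to floating-point rounding, which is why a single binomial draw per value is enough.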

R：

ptm <- proc.time()
steps <- 500
MCsize <- 10000000
a <- rbinom(MCsize,steps,0.5)
b <- rep(500,times=MCsize) - a
result <- rep(1.1,times=MCsize)^a*rep(0.9,times=MCsize)^b
proc.time()-ptm

C++:

#include <numeric>
#include <vector>
#include <iostream>
#include <random>
#include <thread>
#include <mutex>
#include <cmath>
#include <algorithm>
#include <chrono>

const size_t MCsize = 10000000;
std::mutex mutex1;
std::mutex mutex2;
unsigned seed_;
std::vector<double> cache;

void generatereturns(size_t steps, int RUNS){
    mutex2.lock();
    // setting seed
    try{    
        std::mt19937 tmpgenerator(seed_);
        seed_ = tmpgenerator();
        std::cout << "SEED : " << seed_ << std::endl;
    }catch(int exception){
        mutex2.unlock();
    }
    mutex2.unlock();

    // Creating generator
    std::binomial_distribution<int> distribution(steps,0.5);
    std::mt19937 generator(seed_);

    for(int i = 0; i!= RUNS; ++i){
        double power;
        double returns;
        power = distribution(generator);
        returns = pow(0.9,power) * pow(1.1,(double)steps - power);
        std::lock_guard<std::mutex> guard(mutex1);
        cache.push_back(returns);
    }
}    


int main(){
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    size_t steps = 500;
    seed_ = 777;    

    unsigned concurentThreadsSupported = std::max(std::thread::hardware_concurrency(),(unsigned)1);
    int remainder = MCsize % concurentThreadsSupported;

    std::vector<std::thread> threads;
    // starting sub-thread simulations
    if(concurentThreadsSupported != 1){
        for(int i = 0 ; i != concurentThreadsSupported - 1; ++i){
            if(remainder != 0){
                threads.push_back(std::thread(generatereturns,steps,MCsize /     concurentThreadsSupported + 1));
                remainder--;
            }else{
                threads.push_back(std::thread(generatereturns,steps,MCsize /     concurentThreadsSupported));
            }
        }
    }

    //starting main thread simulation
    if(remainder != 0){
        generatereturns(steps, MCsize / concurentThreadsSupported + 1);
        remainder--;
    }else{
        generatereturns(steps, MCsize / concurentThreadsSupported);
    }

    for (auto& th : threads) th.join();

    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now() ;
    typedef std::chrono::duration<int,std::milli> millisecs_t ;
    millisecs_t duration( std::chrono::duration_cast<millisecs_t>(end-start) ) ;
    std::cout << "Time elapsed : " << duration.count() << " milliseconds.\n" ;

    return 0;
}

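Even though my C++ code uses four threads, I can't understand why my R code is so much faster than the C++ code (3.29 s vs. 12 s). Can anyone enlighten me? How can I improve my C++ code to make it run faster?

Edit:

Thanks for all the suggestions! I reserved capacity for the vector and reduced the amount of locking in my code. The key changes in the generatereturns() function are: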

std::vector<double> cache(MCsize);
std::vector<double>::iterator currit = cache.begin();
//.....

// Creating generator
std::binomial_distribution<int> distribution(steps,0.5);
std::mt19937 generator(seed_);
std::vector<double> tmpvec(RUNS);
for(int i = 0; i!= RUNS; ++i){
    double power;
    double returns;
    power = distribution(generator);
    returns = pow(0.9,power) * pow(1.1,(double)steps - power);
    tmpvec[i] = returns;
}
std::lock_guard<std::mutex> guard(mutex1);
std::move(tmpvec.begin(),tmpvec.end(),currit);
currit += RUNS;

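Instead of locking on every iteration, I now fill a temporary vector and then use std::move to move the elements of tmpvec into cache. The elapsed time is now down to 1.9 seconds.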
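A possible further step in the same direction, sketched here under stated assumptions, is to let every thread write straight into its own disjoint slice of a vector that is allocated once, so that neither a mutex nor a temporary vector is needed. The names `fill_slice` and `simulate_parallel` are hypothetical, and seeding each worker with `base_seed + t` is a simplification for illustration, not a recommendation for statistically independent streams.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Fill results[begin, end) with simulated values. Each worker owns its
// slice and its own generator, so no locking is required.
void fill_slice(std::vector<double>& results, std::size_t begin,
                std::size_t end, int steps, unsigned seed) {
    std::mt19937 generator(seed);
    std::binomial_distribution<int> distribution(steps, 0.5);
    for (std::size_t i = begin; i < end; ++i) {
        const int power = distribution(generator);
        results[i] = std::pow(0.9, power) * std::pow(1.1, steps - power);
    }
}

// Hypothetical driver: split `total` values across `num_threads` workers.
std::vector<double> simulate_parallel(std::size_t total, int steps,
                                      unsigned num_threads, unsigned base_seed) {
    assert(num_threads > 0);
    std::vector<double> results(total);  // allocated once, up front
    std::vector<std::thread> workers;
    const std::size_t chunk = total / num_threads;
    const std::size_t remainder = total % num_threads;
    std::size_t begin = 0;
    for (unsigned t = 0; t < num_threads; ++t) {
        // The first `remainder` workers take one extra element.
        const std::size_t end = begin + chunk + (t < remainder ? 1 : 0);
        workers.emplace_back(fill_slice, std::ref(results), begin, end,
                             steps, base_seed + t);
        begin = end;
    }
    for (auto& w : workers) w.join();
    return results;
}
```

A call such as `simulate_parallel(10000000, 500, 4, 777u)` would correspond to the setup in the question. Because the slices never overlap, the threads share no mutable state while they run.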

Answer 1

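First of all, are you running it in release mode? Switching from a debug build to a release build cut the running time from ~15 s to ~4.5 s on my laptop (Windows 7, i5 3210M).

Also, reducing the number of threads from 4 to 2 (I have only 2 cores, but with hyperthreading) further reduced the running time to ~2.4 s in my case.

Changing the variable power to int (as jimifiki also suggested) gave a slight additional boost, bringing the time down to ~2.3 s.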

Answer 2

This probably won't help you much, but as a first step, use pow(double, int) when your exponent is an int.

int power;
returns = pow(0.9,power) * pow(1.1,(int)steps - power);

Do you see any improvement?
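Since C++11, std::pow promotes an integer exponent to double, so whether an int exponent takes a faster path depends on the standard library in use. One way to make the integer case explicit is exponentiation by repeated squaring. The helper `ipow` below is a hypothetical sketch, not something from the original answer, and it may or may not beat std::pow on a given platform.

```cpp
#include <cassert>

// Hypothetical helper: raise `base` to a non-negative integer power by
// repeated squaring, using O(log exponent) multiplications.
double ipow(double base, int exponent) {
    assert(exponent >= 0);
    double result = 1.0;
    while (exponent > 0) {
        if (exponent & 1) {
            result *= base;  // the current lowest bit of the exponent is set
        }
        base *= base;        // move on to the next power-of-two factor
        exponent >>= 1;
    }
    return result;
}
```

With this helper, the line in the loop would read `returns = ipow(0.9, power) * ipow(1.1, (int)steps - power);`, assuming `power` is declared as int as suggested above.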

Answer 3

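I really enjoyed your question, and I tried the code at home. I tried changing the random number generator; my implementation of std::binomial_distribution requires about 9.6 calls to generator() on average.

I know the question is mostly about comparing the performance of R and C++, but since you ask "How should I improve my C++ code to make it run faster?", I'll insist on optimizing pow. You can easily avoid half of the calls by precomputing either 0.9^steps or 1.1^steps before the for loop. This makes your code run a bit faster: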
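A figure like this can be measured by wrapping the engine in a small adapter that counts its invocations. The sketch below is one way to do that; `CountingEngine` and `average_calls_per_draw` are hypothetical names, and the resulting average depends on the standard library implementation, so no particular value should be expected.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical wrapper: forwards to std::mt19937 and counts every call.
// It exposes what a distribution needs from a generator:
// result_type, min(), max() and operator().
struct CountingEngine {
    using result_type = std::mt19937::result_type;
    std::mt19937 engine;
    std::size_t calls = 0;

    explicit CountingEngine(result_type seed) : engine(seed) {}
    static constexpr result_type min() { return std::mt19937::min(); }
    static constexpr result_type max() { return std::mt19937::max(); }
    result_type operator()() {
        ++calls;
        return engine();
    }
};

// Average number of engine calls consumed per binomial draw.
double average_calls_per_draw(int steps, std::size_t draws, unsigned seed) {
    assert(draws > 0);
    CountingEngine counter(seed);
    std::binomial_distribution<int> distribution(steps, 0.5);
    for (std::size_t i = 0; i < draws; ++i) {
        distribution(counter);  // the drawn value itself is not needed here
    }
    return static_cast<double>(counter.calls) / static_cast<double>(draws);
}
```

For the setup in the question, `average_calls_per_draw(500, 1000000, 777u)` would report how many raw engine outputs each binomial draw costs on the library at hand.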

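// Both factors are computed once, before the loop.
double power1 = pow(0.9,steps);
double ratio = 1.1/0.9;
for(int i = 0; i!= RUNS; ++i){
  ... 
  // One pow call per iteration: 0.9^steps * (1.1/0.9)^power
  // equals 1.1^power * 0.9^(steps - power), as in the R code.
  returns = power1 * pow(ratio, (double)power);
}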

Similarly, you can improve the R code:

...
ratio <- 1.1/0.9
pow1 <- 0.9^steps
result <- pow1 * ratio^a
...
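The pow optimization described in this answer can be written out as a complete C++ sketch. It uses the identity 0.9^power * 1.1^(steps - power) = 1.1^steps * (0.9/1.1)^power, which reproduces the expression from the question's C++ loop with a single pow call per value. The function names and the choice of which constant to precompute are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Original form: two pow calls per simulated value.
double value_two_pows(int steps, int power) {
    return std::pow(0.9, power) * std::pow(1.1, steps - power);
}

// Rewritten form: `base_value` = 1.1^steps and `ratio` = 0.9/1.1 are
// computed once by the caller, leaving one pow call per value.
double value_one_pow(double base_value, double ratio, int power) {
    return base_value * std::pow(ratio, power);
}

// Hypothetical driver: generate `runs` values with the one-pow form.
std::vector<double> simulate_one_pow(std::size_t runs, int steps, unsigned seed) {
    assert(steps >= 0);
    std::mt19937 generator(seed);
    std::binomial_distribution<int> distribution(steps, 0.5);
    const double base_value = std::pow(1.1, steps);  // hoisted out of the loop
    const double ratio = 0.9 / 1.1;                  // hoisted out of the loop
    std::vector<double> results(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const int power = distribution(generator);
        results[i] = value_one_pow(base_value, ratio, power);
    }
    return results;
}
```

The two forms agree only up to floating-point rounding, so results may differ in the last few bits from the two-pow version.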


3 solutions

Solution 1
2 Accepted 2014-05-02 09:28:44

Solution 2
1 2014-05-02 09:23:15

Solution 3
1 2014-05-04 05:58:54
