简体   繁体   中英

Parallel Addition of Vectors using RcppParallel

I am trying to parallelise the addition of (large) vectors using RcppParallel. That's what I've come up with.

// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
#include <Rcpp.h>
#include <assert.h> 
using namespace RcppParallel;
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector directVectorAddition(NumericVector first, NumericVector second) {
    assert (first.length() == second.length());
    NumericVector results(first.length());
    results = first + second;
    return results;
}

// [[Rcpp::export]]
NumericVector loopVectorAddition(NumericVector first, NumericVector second) {
    assert (first.length() == second.length());
    NumericVector results(first.length());
    for(unsigned i = 0; i != first.length(); i++)
        results[i] = first[i] + second[i];
    return results;
}

struct VectorAddition : public Worker
{
    const RVector<double> first, second;
    RVector<double> results;
    VectorAddition(const NumericVector one, const NumericVector two, NumericVector three) : first(one), second(two), results(three) {}

    void operator()(std::size_t a1, std::size_t a2) {
        std::transform(first.begin() + a1, first.begin() + a2,
                       second.begin() + a1,
                       results.begin() + a1,
                       [](double i, double j) {return i + j;});
    }
};


// [[Rcpp::export]]
NumericVector parallelVectorAddition(NumericVector first, NumericVector second) {
    assert (first.length() == second.length());
    NumericVector results(first.length());
    VectorAddition myVectorAddition(first, second, results);
    parallelFor(0, first.length(), myVectorAddition);
    return results;
}

It seems to work, but doesn't speed up things (at least not on a 4-core machine).

> v1 <- 1:1000000
> v2 <- 1000000:1
> all(directVectorAddition(v1, v2) == loopVectorAddition(v1, v2))
[1] TRUE
> all(directVectorAddition(v1, v2) == parallelVectorAddition(v1, v2))
[1] TRUE
> result <- benchmark(v1 + v2, directVectorAddition(v1, v2), loopVectorAddition(v1, v2), parallelVectorAddition(v1, v2), order="relative")
 > result[,1:4]
                            test replications elapsed relative
    1                        v1 + v2          100   0.206    1.000
    4 parallelVectorAddition(v1, v2)          100   0.993    4.820
    2   directVectorAddition(v1, v2)          100   1.015    4.927
    3     loopVectorAddition(v1, v2)          100   1.056    5.126

Can this be implemented more efficiently?

Thanks a lot in advance,

mce

Rookie mistake :) You define this as Rcpp::NumericVector but create data that is created via the sequence operator. And that creates integer values so you are forcing a copy onto all your functions!

Make it

v1 <- as.double(1:1000000)
v2 <- as.double(1000000:1)

instead, and on a machine with lots of cores (at work) I then see

R> result[,1:4]
                            test replications elapsed relative
4 parallelVectorAddition(v1, v2)          100   0.301    1.000
2   directVectorAddition(v1, v2)          100   0.424    1.409
1                        v1 + v2          100   0.436    1.449
3     loopVectorAddition(v1, v2)          100   0.736    2.445

The example is still not that impressive because the relevant operation is "cheap" whereas the parallel approach needs to allocate memory, copy data to workers, collect again etc pp.

But the good news is that you wrote your parallel code correctly. Not a small task.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM