Passing a dataset from R to C++ (using .Call)

I need to speed up data processing in R through C++. I already have my C++ code, and it basically reads from a txt file what R should pass. Since I need R for my analysis, I want to integrate my C++ code into R.

What the C++ code needs is a (large) data frame (for which I use a nested std::vector) and a set of parameters, so I am thinking about passing the parameters through the .Call interface and then dealing with the data in the following way:

  • R: write the data to a txt file with a given encoding

  • C++: read from the txt file, do what I need to do, and write the result to a txt file (which is still a dataset -> std::vector)

  • R: read the result from the txt file

This would save me from rewriting part of the code. The possible problem/bottleneck is the reading and writing; do you believe it is a real problem?

Otherwise, as an alternative, is it reasonable to copy all my data into C++ structures through the .Call interface?

Thank you.

You could start with the very simple DataFrame example in the RcppExamples package:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List DataFrameExample(const DataFrame & DF) {

    // access each column by name
    IntegerVector a = DF["a"];
    CharacterVector b = DF["b"];
    DateVector c = DF["c"];

    // do something
    a[2] = 42;
    b[1] = "foo";
    c[0] = c[0] + 7; // move up a week

    // create a new data frame
    DataFrame NDF = DataFrame::create(Named("a")=a,
                                      Named("b")=b,
                                      Named("c")=c);

    // and return old and new in list
    return List::create(Named("origDataFrame") = DF,
                        Named("newDataFrame") = NDF);
}

You can assign vectors (from either Rcpp or the STL) and matrices (again, either from Rcpp or, if you prefer, nested STL vectors). You also have Eigen and Armadillo via RcppEigen and RcppArmadillo. And on and on: there are over 1350 packages on CRAN you could study. A large set of ready-to-run examples is at the Rcpp Gallery.
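If you go the nested STL vector route, keep in mind that an R data frame is stored as a list of equal-length columns, so the natural nested layout is one inner vector per column. If the existing code expects one inner vector per row instead, a small transpose bridges the two. Here is a minimal sketch in plain C++, assuming all columns are numeric; `Column`, `Table`, and `columns_to_rows` are hypothetical names for illustration, not part of Rcpp:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical aliases: a table is a vector of numeric vectors.
using Column = std::vector<double>;
using Table  = std::vector<Column>;

// Convert column-major data (one inner vector per column, as in a data frame)
// into row-major data (one inner vector per row).
Table columns_to_rows(const Table& columns) {
    if (columns.empty()) return {};
    const std::size_t n_rows = columns.front().size();
    Table rows(n_rows, Column(columns.size()));
    for (std::size_t j = 0; j < columns.size(); ++j) {
        // Every column must have the same length, as in a data frame.
        assert(columns[j].size() == n_rows);
        for (std::size_t i = 0; i < n_rows; ++i) {
            rows[i][j] = columns[j][i];
        }
    }
    return rows;
}
```

The transpose copies the data once, so for a very large table it may be cheaper to adapt the processing code to work column by column.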

Reading and writing large datasets back and forth is not an optimal way to pass data between R and your C++ code. Depending on how long your C++ code runs, this might or might not be the worst bottleneck, but this approach should be avoided.
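One way to drop the file round trip without rewriting much is to separate the computation from the text parsing, so that the same core function can be fed either by the existing file reader or by data handed over from R in memory. The following is only a sketch under assumptions: the data is numeric and whitespace-separated, and `read_table` and `process` (here a trivial scaling) are hypothetical stand-ins for the existing code:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Table = std::vector<std::vector<double>>;

// I/O layer (hypothetical): parse whitespace-separated numbers, one row per line.
Table read_table(std::istream& in) {
    Table rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<double> row;
        double value;
        while (fields >> value) row.push_back(value);
        if (!row.empty()) rows.push_back(row);  // skip blank lines
    }
    return rows;
}

// Core computation (hypothetical stand-in): knows nothing about files or R.
// It takes the data and a numeric parameter and returns a new table.
Table process(const Table& data, double factor) {
    Table out = data;  // work on a copy, leave the input untouched
    for (auto& row : out) {
        for (auto& v : row) v *= factor;
    }
    return out;
}
```

With this split, the standalone program keeps calling `process(read_table(file), factor)`, while a wrapper exported to R would only need to build the same `Table` from the data frame and pass the parameters through as ordinary arguments.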

You can look at the following solution for passing a data.frame (or data.table) object: Passing a `data.table` to c++ functions using `Rcpp` and/or `RcppArmadillo`

As for passing additional parameters, the solution depends on what kind of parameters we are talking about. If they are just numeric values, you can pass them directly to C++ (see High performance functions with Rcpp: http://adv-r.had.co.nz/Rcpp.html).
