Rcpp - extracting rows from list of matrices / dataframes
As a follow-up to this question, I've decided to go down the route of Rcpp instead of convoluted syntax in R. I think this will provide better readability (and possibly also be faster).
Let's say I have a list of data.frames (which I can easily convert to matrices via as). Given prior answers, this seems the best approach.
# input data
my_list <- vector("list", length= 10)
set.seed(65L)
for (i in 1:10) {
my_list[[i]] <- data.frame(matrix(rnorm(10000),ncol=10))
# alternatively
# my_list[[i]] <- matrix(rnorm(10000),ncol=10)
}
What's the appropriate way to extract rows from the matrices? The goal is to create a list in which each element is itself a list holding the nr-th row of each of the original list's data.frames. I've tried several different syntaxes and keep getting errors:
#include <Rcpp.h>
using namespace Rcpp;
using namespace std;
List foo(const List& my_list, const int& n_geo) {
int n_list = my_list.size();
std::vector<std::vector<double> > list2(n_geo);
// needed code....
return wrap(list2);
}
Options:
for (int i = 0; i < n_list; i++) {
for (int nr = 0; nr < n_geo; nr++) {
list2[nr][i] = my_list[i].row(nr);
// or list2[nr].push_back(my_list[i].row(nr));
// or list2[nr].push_back(as<double>(my_list[i].row(nr)));
// or list2[nr].push_back(as<double>(my_list[i](nr, _)));
}
}
// or:
NumericMatrix a = my_list[1]
...
NumericMatrix j = my_list[10]
for (int nr = 0; nr < n_geo; nr++) {
list2[nr][1] = // as above
}
None of these are working for me. What am I doing wrong? Here are the errors I receive from my above syntax choices.
error: no matching function for call to 'as(Rcpp::Matrix<14>::Row)'
or
error: cannot convert 'Rcpp::Matrix<14>::Row {aka Rcpp::MatrixRow<14>}' to 'double' in assignment
Here is one way to do it:
#include <Rcpp.h>
// x[[nx]][ny,] -> y[[ny]][[nx]]
// [[Rcpp::export]]
Rcpp::List Transform(Rcpp::List x) {
R_xlen_t nx = x.size(), ny = Rcpp::as<Rcpp::NumericMatrix>(x[0]).nrow();
Rcpp::List y(ny);
for (R_xlen_t iy = 0; iy < ny; iy++) {
Rcpp::List tmp(nx);
for (R_xlen_t ix = 0; ix < nx; ix++) {
Rcpp::NumericMatrix mtmp = Rcpp::as<Rcpp::NumericMatrix>(x[ix]);
tmp[ix] = mtmp.row(iy);
}
y[iy] = tmp;
}
return y;
}
/*** R
L1 <- lapply(1:10, function(x) {
matrix(rnorm(20), ncol = 5)
})
L2 <- lapply(1:nrow(L1[[1]]), function(x) {
lapply(L1, function(y) unlist(y[x,]))
})
all.equal(L2, Transform(L1))
#[1] TRUE
microbenchmark::microbenchmark(
"R" = lapply(1:nrow(L1[[1]]), function(x) {
lapply(L1, function(y) unlist(y[x,]))
}),
"Cpp" = Transform(L1),
times = 200L)
#Unit: microseconds
#expr min lq mean median uq max neval
# R 254.660 316.627 383.92739 347.547 392.7705 1909.097 200
#Cpp 18.314 26.007 71.58795 30.230 38.8650 945.167 200
*/
I'm not sure how this will scale; I think it is just an inherently inefficient transformation. As per my comment at the top of the source, it seems like you are just doing a sort of coordinate swap: the ny-th row of the nx-th element of the input list becomes the nx-th element of the ny-th element of the output list:
x[[nx]][ny,] -> y[[ny]][[nx]]
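To make the index swap concrete without depending on Rcpp, here is a minimal plain C++ sketch. It is not the Rcpp implementation: ColMajorMatrix, extract_row and swap_coordinates are hypothetical names invented for illustration, and the struct only imitates the column-major layout R uses for matrices. Because rows are not contiguous in that layout, each row extraction is a strided copy, which is part of why the transformation is not cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an R numeric matrix: values are stored
// column-major, as in R, so element (i, j) lives at data[i + j * nrow].
struct ColMajorMatrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> data;
};

// Copy row i out of a column-major matrix (a strided read).
std::vector<double> extract_row(const ColMajorMatrix& m, std::size_t i) {
    assert(i < m.nrow);
    std::vector<double> row(m.ncol);
    for (std::size_t j = 0; j < m.ncol; ++j) {
        row[j] = m.data[i + j * m.nrow];
    }
    return row;
}

// Row iy of x[ix] becomes y[iy][ix].
// Assumes every matrix in x has the same number of rows.
std::vector<std::vector<std::vector<double>>>
swap_coordinates(const std::vector<ColMajorMatrix>& x) {
    std::vector<std::vector<std::vector<double>>> y;
    if (x.empty()) return y;
    const std::size_t ny = x[0].nrow;
    y.resize(ny);
    for (std::size_t iy = 0; iy < ny; ++iy) {
        y[iy].reserve(x.size());
        for (std::size_t ix = 0; ix < x.size(); ++ix) {
            assert(x[ix].nrow == ny);
            y[iy].push_back(extract_row(x[ix], iy));
        }
    }
    return y;
}
```

The outer loop runs over rows and the inner loop over list elements, the same loop order as Transform above, so the output is built one y[iy] at a time.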
To address the errors you were getting: Rcpp::List is a generic object (technically an Rcpp::Vector<VECSXP>), so when you try to do, e.g.,
my_list[i].row(nr)
the compiler doesn't know that my_list[i] is a NumericMatrix. Therefore, you have to make an explicit cast with Rcpp::as<>:
Rcpp::NumericMatrix mtmp = Rcpp::as<Rcpp::NumericMatrix>(x[ix]);
tmp[ix] = mtmp.row(iy);
I just used matrix elements in the example data to simplify things. In practice you are probably better off coercing data.frames to matrix objects directly in R than trying to do it in C++; it will be much simpler, and most likely the coercion is just calling underlying C code, so there isn't really anything to be gained by doing it otherwise.
I should also point out that if you are using an Rcpp::List of homogeneous types, you can squeeze out a little more performance with Rcpp::ListOf<type>. This will allow you to skip the Rcpp::as<type> conversions done above:
typedef Rcpp::ListOf<Rcpp::NumericMatrix> MatList;
// [[Rcpp::export]]
Rcpp::List Transform2(MatList x) {
R_xlen_t nx = x.size(), ny = x[0].nrow();
Rcpp::List y(ny);
for (R_xlen_t iy = 0; iy < ny; iy++) {
Rcpp::List tmp(nx);
for (R_xlen_t ix = 0; ix < nx; ix++) {
tmp[ix] = x[ix].row(iy);
}
y[iy] = tmp;
}
return y;
}
/*** R
L1 <- lapply(1:10, function(x) {
matrix(rnorm(20000), ncol = 100)
})
L2 <- lapply(1:nrow(L1[[1]]), function(x) {
lapply(L1, function(y) unlist(y[x,]))
})
microbenchmark::microbenchmark(
"R" = lapply(1:nrow(L1[[1]]), function(x) {
lapply(L1, function(y) unlist(y[x,]))
}),
"Transform" = Transform(L1),
"Transform2" = Transform2(L1),
times = 200L)
#Unit: microseconds
# expr min lq mean median uq max neval
# R 6049.594 6318.822 7604.871 6707.242 8592.510 64005.190 200
# Transform 928.468 1041.936 3130.959 1166.819 1659.745 71552.284 200
#Transform2 850.912 957.918 1694.329 1061.183 2856.724 4502.065 200
*/