Rcpp: Recommended code structure when using data frames with Rcpp (inline)

[I had this sketched out as a comment elsewhere but decided to create a proper question...]

What is currently considered "best practice" in terms of code structuring when using data frames in Rcpp? The ease with which one can "beam over" an input data frame from R to the C++ code is remarkable, but if the data frame has n columns, is the current thinking that this data should be split up into n separate (C++) vectors before being used?

The response to my previous question on making use of a string (character vector) column in a data frame suggests to me that yes, this is the right thing to do. In particular, there doesn't seem to be support for a notation such as df.name[i] to refer to the data frame information directly (as one might have in a C structure), unless I'm mistaken.

However, this leads us into a situation where subsetting the data is much more cumbersome - instead of being able to subset a data frame in one line, each variable must be dealt with separately. So, is the thinking that subsetting in Rcpp is best done implicitly, via boolean vectors, say?

To summarise, I wanted to check my current understanding: although a data frame can be beamed over to the C++ code, there is no way to refer directly to the individual elements of its columns in a "df.name[i]" fashion, and no simple method of generating a sub-dataframe of the input df by selecting rows satisfying simple criteria (e.g. df.date being in a given range).

Because data frames are in fact internally represented as a list of vectors, access by column vector really is the best you can do. There simply is no way to subset by row in a single step at the C or C++ level; a row subset has to be applied to each column in turn.
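To make the column-wise layout concrete, here is a minimal sketch in plain standard C++ (deliberately not using the Rcpp API, so it compiles without R). The `Frame` struct, its column names, and the `subset_column` and `subset_by_date` helpers are all hypothetical, invented for illustration. In real Rcpp code the columns would instead be pulled out of the `Rcpp::DataFrame` by name into typed vectors. The point is only the pattern: build one boolean mask from the row criterion, then apply it to every column separately.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a data frame: one vector per column,
// all of the same length (this mirrors the "list of vectors" layout).
struct Frame {
    std::vector<std::string> name;
    std::vector<int> date;      // assumed encoding: days since some epoch
    std::vector<double> value;
};

// Keep the elements of a single column whose mask entry is true.
template <typename T>
std::vector<T> subset_column(const std::vector<T>& col,
                             const std::vector<bool>& keep) {
    assert(col.size() == keep.size());
    std::vector<T> out;
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (keep[i]) out.push_back(col[i]);
    }
    return out;
}

// Build the row mask once from the criterion (date in [from, to]),
// then apply the same mask to each column in turn.
Frame subset_by_date(const Frame& df, int from, int to) {
    std::vector<bool> keep(df.date.size());
    for (std::size_t i = 0; i < df.date.size(); ++i) {
        keep[i] = df.date[i] >= from && df.date[i] <= to;
    }
    return Frame{subset_column(df.name, keep),
                 subset_column(df.date, keep),
                 subset_column(df.value, keep)};
}
```

Calling `subset_by_date(df, 15, 35)` on a frame whose `date` column is `{10, 20, 30, 40}` keeps the second and third rows in every column. The mask is computed once, but the copying work is still done per column, which is why there is no cheap one-line row subset.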

There was a good discussion about that on r-devel a few weeks ago in the context of a transpose of a data.frame (which you cannot do 'cheaply' for the same reason).
