Transform factor columns to date columns in a R data.frame using Rcpp
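Based on this question:
How can I access the factor levels of a data.frame in Rcpp when it is passed as an argument from R?
I want to use Rcpp to convert the resulting character columns to dates. Here is my initial code, which converts the factor levels to character columns.
Sample data: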
df <- data.frame(
col1 = c(1, 2, 3),
col2 = c("a", "b", "c"),
col3 = factor(
x = c("01/01/2017 00:00:00", "01/06/2017 00:00:00", "05/01/2017 00:00:00"),
levels = c("01/01/2017 00:00:00", "01/06/2017 00:00:00", "05/01/2017 00:00:00")
),
col4 = factor(
x = c("01/01/2018 00:00:00", "01/06/2018 00:00:00", "05/01/2018 00:00:00"),
levels = c("01/01/2018 00:00:00", "01/06/2018 00:00:00", "05/01/2018 00:00:00")
),
stringsAsFactors = FALSE
)
Rcpp code:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
void GetDateFromFactorLevels(DataFrame df1) {
CharacterVector varNames = df1.names();
for(int i = 0; i < df1.length(); i++) {
if(Rf_isFactor(df1[i]) == 1) {
IntegerVector tempVec=df1[i];
df1[i] = tempVec.attr("levels");
}
}
}
> GetDateFromFactorLevels(df)
> sapply(df, class)
col1 col2 col3 col4
"numeric" "character" "character" "character"
> df
col1 col2 col3 col4
1 1 a 01/01/2017 00:00:00 01/01/2018 00:00:00
2 2 b 01/06/2017 00:00:00 01/06/2018 00:00:00
3 3 c 05/01/2017 00:00:00 05/01/2018 00:00:00
Is it possible to do this and get something like the following?
> sapply(df, class)
col1 col2 col3 col4
"numeric" "character" "Date" "Date"
> df
col1 col2 col3 col4
1 1 a 2017-01-01 2018-01-01
2 2 b 2017-06-01 2018-06-01
3 3 c 2017-01-05 2018-01-05
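Yes. As Dirk Eddelbuettel's answer notes, it would be easier if you could work with a DatetimeVector and convert it to a DateVector. If you really do have to deal with factors, then as a quick and dirty solution (others may well come up with something more elegant), you can do this: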
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
void GetDateFromFactorLevels(DataFrame df1) {
int n = df1.nrows();
for ( int i = 0; i < df1.length(); i++ ) {
if ( Rf_isFactor(df1[i]) == 1 ) {
IntegerVector values = df1[i]; // Get the integer values
CharacterVector levels = values.attr("levels"); // and the levels
DateVector result(n); // Make an empty DateVector
for ( int j = 0; j < n; ++j ) {
// And for every element of the factor, look up the level
// value corresponding to its integer value,
// and construct a Date by turning it into an std::string
// (and specifying the applicable date format)
result[j] = Date(std::string(levels[values[j] - 1]), "%d/%m/%Y");
}
// Then just replace the df1 column with the DateVector
df1[i] = result;
}
}
}
When called from R:
Rcpp::sourceCpp("date-stuff.cpp")
df <- data.frame(
col1 = c(1, 2, 3),
col2 = c("a", "b", "c"),
col3 = factor(
x = c("01/01/2017 00:00:00", "01/06/2017 00:00:00", "05/01/2017 00:00:00"),
levels = c("01/01/2017 00:00:00", "01/06/2017 00:00:00", "05/01/2017 00:00:00")
),
col4 = factor(
x = c("01/01/2018 00:00:00", "01/06/2018 00:00:00", "05/01/2018 00:00:00"),
levels = c("01/01/2018 00:00:00", "01/06/2018 00:00:00", "05/01/2018 00:00:00")
),
stringsAsFactors = FALSE
)
sapply(df, class)
#> col1 col2 col3 col4
#> "numeric" "character" "factor" "factor"
df
#> col1 col2 col3 col4
#> 1 1 a 01/01/2017 00:00:00 01/01/2018 00:00:00
#> 2 2 b 01/06/2017 00:00:00 01/06/2018 00:00:00
#> 3 3 c 05/01/2017 00:00:00 05/01/2018 00:00:00
GetDateFromFactorLevels(df)
sapply(df, class)
#> col1 col2 col3 col4
#> "numeric" "character" "Date" "Date"
df
#> col1 col2 col3 col4
#> 1 1 a 2017-01-01 2018-01-01
#> 2 2 b 2017-06-01 2018-06-01
#> 3 3 c 2017-01-05 2018-01-05
Created on 2018-10-11 by the reprex package (v0.2.1)
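The inner loop of the answer's GetDateFromFactorLevels works because a factor stores 1-based integer codes that index into its levels, so `levels[values[j] - 1]` recovers the original string. Assigning the levels attribute directly, as the question's code does, only gives the right rows when every level occurs exactly once and in level order, as in the sample data. The following is a minimal sketch of that lookup in plain C++ without Rcpp; `expand_factor` is a hypothetical helper name, and mapping out-of-range codes (such as a missing value) to an empty string is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp): expand 1-based factor codes
// into the level strings they refer to.
std::vector<std::string> expand_factor(const std::vector<int>& codes,
                                       const std::vector<std::string>& levels) {
    std::vector<std::string> out;
    out.reserve(codes.size());
    for (int code : codes) {
        // R factor codes start at 1, C++ indices start at 0.
        // Assumption for this sketch: any code outside 1..levels.size()
        // (for example a missing value) becomes an empty string.
        if (code >= 1 && static_cast<std::size_t>(code) <= levels.size()) {
            out.push_back(levels[code - 1]);
        } else {
            out.push_back("");
        }
    }
    return out;
}
```

For example, a factor built from c("b", "a", "b") has levels "a", "b" and codes 2, 1, 2, so the result has three elements even though there are only two levels. The Rcpp version in the answer does not guard against missing values, so a column containing NA would probably need a similar range check before the lookup.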
Sure. Have a look at the RcppExamples package and its source repository, which contains the following:
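You can actually do this by changing the class attribute with the vector member function attr(). But the simple approach of constructing a new vector should work too.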
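The attr() route works because an R Date is just a numeric vector counting days since 1970-01-01 with its class attribute set to "Date". The sketch below, in plain C++ without Rcpp, shows one way to compute that day count from the "dd/mm/YYYY" strings used in the question. `days_from_civil` and `parse_dmy` are hypothetical helper names, the calendar arithmetic is a commonly used civil-date algorithm rather than anything Rcpp provides, and the parser does not validate ranges (it would accept month 13, for instance).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Days from 1970-01-01 to the given proleptic Gregorian date
// (negative for dates before the epoch).
long days_from_civil(int y, int m, int d) {
    y -= (m <= 2) ? 1 : 0;  // count Jan/Feb as months of the previous year
    const long era = (y >= 0 ? y : y - 399) / 400;  // 400-year cycle
    const long yoe = y - era * 400;                 // year of era, 0..399
    const long doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // day of March-based year
    const long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // day of era
    return era * 146097 + doe - 719468;
}

// Hypothetical helper: parse a leading "dd/mm/YYYY" (anything after it,
// such as " 00:00:00", is ignored) and store the day count in `days`.
// Returns false if three integers could not be read.
bool parse_dmy(const std::string& s, long& days) {
    int d = 0, m = 0, y = 0;
    if (std::sscanf(s.c_str(), "%d/%d/%d", &d, &m, &y) != 3) {
        return false;
    }
    days = days_from_civil(y, m, d);
    return true;
}
```

In an Rcpp function, values like these would go into a NumericVector whose "class" attribute is then set to "Date" via attr(), which is the alternative this answer describes. Constructing a DateVector as in the other answer avoids doing the arithmetic by hand.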