
Date iteration with Rcpp loop

To speed things up, I'm trying to convert a simple 'for loop' in R into an Rcpp one.

I have a date vector named "date_vector" that consists of X identical dates. On each iteration i, I set date_vector[i] to date_vector[i-1] plus 1 minute. The R 'for loop' below works properly, but it is too slow for my very large dataset (2 years, about 1 million rows).

I've read that Rcpp could be a way to speed up the loop. However, I'm an Rcpp noob and I'm struggling to convert my loop.

Can someone help me and explain the solution? Thank you very much. Best wishes for 2023.

The original R loop:

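library(lubridate)  # provides minutes()

for(i in 2:nrow(klines)){
  # each element becomes the previous element plus one minute
  date_vector[i] <- date_vector[i-1]+minutes(1)
}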

My Rcpp loop attempt:

cpp_update_date_vector <- cppFunction('DateVector fonction_test(DateVector zz),
  
  int n = zz.size();
  DateVector = date_vector;
  
    for (int i = 0; i < n; i++) {
    
    date_vector[i] = date_vector[i-1] + 60; 
  
  }
')

You can likely achieve your goal without a loop at all. It sounds like you're trying to turn a vector of identical date-times into a sequence spaced one minute apart, right? If so, you could do:

library(lubridate) 

date_vector <- rep(ymd_hms("2020-01-01 12:00:00"), 10)

date_vector + minutes(seq_along(date_vector) - 1)
 [1] "2020-01-01 12:00:00 UTC" "2020-01-01 12:01:00 UTC"
 [3] "2020-01-01 12:02:00 UTC" "2020-01-01 12:03:00 UTC"
 [5] "2020-01-01 12:04:00 UTC" "2020-01-01 12:05:00 UTC"
 [7] "2020-01-01 12:06:00 UTC" "2020-01-01 12:07:00 UTC"
 [9] "2020-01-01 12:08:00 UTC" "2020-01-01 12:09:00 UTC"

For completeness, here is how you would write the code in Rcpp:

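cpp_update_date_vector <- Rcpp::cppFunction('
DatetimeVector fonction_test(DatetimeVector zz) {
    // start at i = 1 so that zz[i-1] is always a valid index
    for (int i = 1; i < zz.size(); i++) {
      // date-times are stored as seconds since the epoch, so +60 adds one minute
      zz[i] = zz[i-1] + 60; 
    }
  return zz;
}
')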
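The `+ 60` above relies on how R stores date-times. POSIXct, the R type that Rcpp's DatetimeVector maps to, counts seconds since 1970-01-01 00:00:00 UTC. Date, the type behind the DateVector used in the question, counts whole days. Here is a quick check in plain R, with an arbitrary start time chosen only for illustration:

```r
# POSIXct counts seconds since 1970-01-01 00:00:00 UTC,
# so adding 60 advances the time by exactly one minute.
# The start time below is arbitrary, chosen only for illustration.
t0 <- as.POSIXct("2020-01-01 12:00:00", tz = "UTC")
as.numeric(t0 + 60) - as.numeric(t0)  # 60
format(t0 + 60, "%H:%M:%S")           # "12:01:00"

# Date counts whole days, so adding 1 moves to the next
# calendar day rather than the next minute.
d0 <- as.Date("2020-01-01")
d0 + 1                                 # "2020-01-02"
```

This is why the loop has to work on date-times rather than dates if the steps are meant to be minutes.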

But it is no faster than base R's seq() function, which can easily create a sequence of date-times 1 minute apart. Here is a comparison on a date-time vector of length 1,000,000, with the lubridate approach included for reference. The Rcpp and base R versions perform comparably, and both are considerably faster than lubridate.

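# assumed setup (not shown in the original): 1,000,000 identical date-times
big_vec <- rep(as.POSIXct("2020-01-01 12:00:00", tz = "UTC"), 1000000)

microbenchmark::microbenchmark(
  lubridate = big_vec + lubridate::minutes(seq_along(big_vec) - 1),
  Rcpp = cpp_update_date_vector(big_vec),
  base_R = seq(big_vec[1], by = "1 min", length.out = length(big_vec))
)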

#> Unit: milliseconds
#> expr           min       lq    mean    median       uq      max neval cld
#> lubridate 1168.921 1203.845 1318.950 1215.465 1570.376 1691.765   100   b
#>      Rcpp    3.733    3.770    8.742    3.799    3.909  467.236   100  a 
#>    base_R    2.172    2.338    3.167    2.407    2.484   40.222   100  a 
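If you would rather avoid both the loop and lubridate, base R alone covers this. Below is a minimal sketch. The UTC start time of 2020-01-01 12:00:00 and the 10-element length are assumptions for illustration, and `start`, `via_seq` and `via_arith` are hypothetical names.

```r
# Sketch: two base-R ways to turn a vector of identical date-times into
# a sequence spaced one minute apart, with no loop and no lubridate.
# Start time and length are illustrative assumptions.
start <- as.POSIXct("2020-01-01 12:00:00", tz = "UTC")
date_vector <- rep(start, 10)

# 1) seq() with a "1 min" step, starting from the first element
via_seq <- seq(date_vector[1], by = "1 min", length.out = length(date_vector))

# 2) vectorised arithmetic: POSIXct values are seconds since the epoch,
#    so element i is shifted by 60 * (i - 1) seconds
via_arith <- date_vector + 60 * (seq_along(date_vector) - 1)

via_seq
via_arith
```

Both give the same ten values, from 12:00:00 to 12:09:00. The seq() form is the one timed as base_R in the benchmark. The arithmetic form mirrors the earlier lubridate expression but adds plain seconds instead of a lubridate period.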
