
Speed this up without Rcpp?

I'm looking to speed up the following algorithm. I give the function an xts time series, and then, for each time point, I want to perform a principal components analysis on the previous X points (I'm using 500 at the moment) and use the results of that PCA (5 principal components in the following code) to compute some value. Something like this:

lookback <- 500
ans <- numeric(nrow(x))              # preallocate the result
for (i in (lookback + 1):nrow(x)) {
    x.now <- x[(i - lookback):i]     # window of the previous `lookback` points
    x.prcomp <- prcomp(x.now)
    ans[i] <- (some R code on x.prcomp)
}

I assume this would require me to replicate the lookback rows as columns, so that x would be something like cbind(x, lag(x), lag(x, k = 2), lag(x, k = 3), ..., lag(x, k = lookback)), and then run prcomp on each row? That seems expensive, though. Perhaps some variant of apply? I'm willing to look into Rcpp, but wanted to run this by you guys first.
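Replicating lagged columns isn't actually needed; a windowed sapply/vapply over the row indices does the same job without blowing up the data. A minimal sketch of that apply-style version, using toy data and a made-up per-window statistic (variance explained by the first 5 PCs) in place of the elided "some R code on x.prcomp":

```r
set.seed(1)
x <- matrix(rnorm(2000 * 24), nrow = 2000)  # stand-in for the 2000x24 xts series
lookback <- 500

# One prcomp() per window; vapply() preallocates the result vector for us.
ans <- vapply((lookback + 1):nrow(x), function(i) {
  p <- prcomp(x[(i - lookback):i, ])
  sum(p$sdev[1:5]^2) / sum(p$sdev^2)  # illustrative stand-in statistic
}, numeric(1))
```

This is tidier than the explicit for loop but does the same amount of work per window, so by itself it won't change the ~70 s runtime much.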

Edit: Wow, thanks for all the responses. Info on my dataset/algorithm:

  1. dim(x.xts) is currently 2000 x 24. But eventually, if this shows promise, it will have to run fast (I'll give it multiple datasets).
  2. func(x.xts) takes ~70 seconds. That's 2000 - 500 = 1500 prcomp calls, with 1500 creations of a 500x24 data frame.

I attempted to use Rprof to see what the most expensive part of the algorithm was, but it's my first time using Rprof, so I'll need more experience with the tool to get intelligible results (thanks for the suggestion).
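For reference, a minimal Rprof session looks something like this (the file name and the stand-in workload are arbitrary; in practice the profiled call would be func(x.xts)):

```r
Rprof("prof.out")                 # start sampling the call stack
for (i in 1:500) prcomp(matrix(rnorm(500 * 24), ncol = 24))  # stand-in workload
Rprof(NULL)                       # stop profiling
head(summaryRprof("prof.out")$by.self)  # self-time per function, biggest first
```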

I think I will first attempt to roll this into an _apply-type loop, and then look at parallelizing.

On my 4-core desktop, if this wouldn't complete in a reasonable time frame, I would run the chunk using something along the lines of (not tested):

library(snowfall)
sfInit(parallel = TRUE, cpus = 4, type = "SOCK")
lookback <- 500
sfExport(list = c("lookback", "x"))
sfLibrary(xts)

output.object <- sfSapply(x = (lookback + 1):nrow(x),
    fun = function(i, my.object = x, lb = lookback) {
        x.now <- my.object[(i - lb):i]
        x.prcomp <- prcomp(x.now)
        ans <- ("some R code on x.prcomp")

        return(ans)
    }, simplify = FALSE) # or maybe it's TRUE? depends on what ans is

sfStop() # shut the worker processes down when done
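Beyond parallelizing, most of the per-window work can be removed: prcomp() recomputes the covariance of all 501 rows at every step, but consecutive windows differ by only one row, so the cross-product matrix can be updated incrementally and eigen-decomposed instead. This is an alternative the post above does not use; a sketch under the same toy setup, with the 5-PC statistic again an illustrative stand-in for the elided code:

```r
set.seed(1)
x <- matrix(rnorm(2000 * 24), nrow = 2000)  # stand-in for the xts series
lookback <- 500
n <- lookback + 1                  # rows per window, matching x[(i - lookback):i]

S <- colSums(x[1:n, ])             # running column sums of the window
C <- crossprod(x[1:n, ])           # running cross-product matrix t(w) %*% w
cov_from_sums <- function(C, S, n) (C - tcrossprod(S) / n) / (n - 1)

ans <- numeric(nrow(x))
ev <- eigen(cov_from_sums(C, S, n), symmetric = TRUE)$values
ans[n] <- sum(ev[1:5]) / sum(ev)
for (i in (n + 1):nrow(x)) {
  xin <- x[i, ]; xout <- x[i - n, ]           # row entering / leaving the window
  S <- S + xin - xout
  C <- C + tcrossprod(xin) - tcrossprod(xout)
  ev <- eigen(cov_from_sums(C, S, n), symmetric = TRUE)$values  # = prcomp()$sdev^2
  ans[i] <- sum(ev[1:5]) / sum(ev)
}
```

Each step is now O(p^2) in the number of columns rather than O(n * p^2) over the window length, and the eigenvalues agree with prcomp(window)$sdev^2 up to floating-point error.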
