R: How can I speed up this function?

Question

I have a large data frame (named z) that looks like this:

    RPos    M1
    1   -0.00020
    2   0.00010
    3   -0.00012
    4   -0.00035
    5   -0.00038 
...etc (about 300,000 observations)

It is essentially a time series (although it is actually a data frame, not a ts or zoo object). RPos is the index number (explicitly stored), and M1 is an arbitrary metric.

I have another data frame (named actionlist) with about 30,000 non-consecutive observations. Each value in actionlist's RPos column represents the last of 34 consecutive points.

My final piece of data is a single data frame (named x) of only 34 consecutive observations.

My goal is to calculate the correlation coefficient between x and the 34-point segment that ends at each observation in actionlist.

To do this I must generate these 34-point consecutive time series segments from z (the large data frame).

Currently, I am doing it like this:

n1<-33:0
for(i in 1:nrow(actionlist))
{
    crs[i,2]<-cor(z[actionlist$RPos[i]+n1,2],x[,2])  
}

When I look at the Rprof output, this is what I get:

$by.self
              self.time self.pct total.time total.pct
[.data.frame       0.68    25.37       0.98     36.57
.Call              0.22     8.21       0.22      8.21
cor                0.16     5.97       2.30     85.82
...etc

It looks as though [.data.frame is taking the longest. Specifically, I am pretty sure that it is this part: z[actionlist$RPos[i]+n1,2]

How can I speed up (or eliminate the need for) this part of the function?

I asked a similar question before, except that instead of looking within a restricted list (actionlist) I was looking through every possible run of 34 consecutive observations within z. The answer was posted here, but I cannot figure out how to adapt it to a restricted list.

Any help would be much appreciated!

Answer 1

The most straightforward approach is probably to build a matrix containing the data you want to compute the correlation with, and eschew the loop altogether.

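# Sample data
n <- 3e5   # observations in z
m <- 3e4   # entries in actionlist
k <- 34    # window length (34 points, as in the question)
z <- data.frame(
  RPos = 1:n,
  M1   = rnorm(n)
)
# Here actionlist is a plain vector of end positions (actionlist$RPos in the question)
actionlist <- sample( k:n, m )
x <- rnorm(k)

system.time( for (j in 1:10) {
  # Index of the observations we want: one row per action, columns in time order
  i <- sapply( (k-1):0, function(u) actionlist - u )
  # Data we want to compute the correlation with (m rows, k columns)
  y <- matrix( z$M1[i], nrow=nrow(i) )
  # Computations: correlate each row of y with x
  result <- cor(t(y),x)
} ) # 150ms per iteration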
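
To adapt this to the data frames described in the question, something along these lines might work. It is only a sketch on small synthetic data: the sizes, the seed, and the names zv, xv, ends, idx, loop_cor and mat_cor are made up for illustration, and it assumes that z$RPos equals the row number and that each window ends at the RPos stored in actionlist. It pulls the columns out as plain vectors once (so [.data.frame is never called inside the loop) and compares a loop with the matrix approach.

```r
# Synthetic data shaped like the question's objects (sizes are assumptions)
set.seed(1)
n <- 5000
m <- 200
k <- 34
z <- data.frame(RPos = 1:n, M1 = rnorm(n))
actionlist <- data.frame(RPos = sample(k:n, m))
x <- data.frame(RPos = 1:k, M1 = rnorm(k))

# Extract plain vectors once; indexing a vector is much cheaper than
# indexing a data frame on every iteration
zv   <- z$M1
xv   <- x$M1
ends <- actionlist$RPos

# Loop version: the k points ending at each RPos, in time order
loop_cor <- numeric(m)
for (i in seq_len(m)) {
  loop_cor[i] <- cor(zv[ends[i] - (k - 1):0], xv)
}

# Matrix version: one row of indices per action, columns in time order
idx <- outer(ends, (k - 1):0, "-")
y   <- matrix(zv[idx], nrow = nrow(idx))
mat_cor <- as.vector(cor(t(y), xv))

# Largest absolute difference between the two versions
print(max(abs(loop_cor - mat_cor)))
```

If RPos does not match the row number in your real data, you would first need to map it to row positions, for example with match(actionlist$RPos, z$RPos).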

Answer 1: score 4, accepted, 2012-02-15 23:47:42
