
Optimizing Markov chain transition matrix calculations?

As an intermediate R user, I know that for loops can very often be optimized with functions like apply. However, I am not aware of any function that would speed up my current code for generating a Markov chain transition matrix, which runs quite slowly. Have I maxed out on speed, or is there something I am overlooking? I am trying to find the transition matrix for a Markov chain by counting the number of occurrences in the 24-hour period before each given alert. The vector ids contains all possible IDs (about 1700).

The original matrix looks like this, as an example:

>matrix
      id      time
       1     1376084071
       1     1376084937
       1     1376023439
       2     1376084320
       2     1372983476
       3     1374789234
       3     1370234809

And here is my code to try to handle this:

matrixtimesort <- matrix[order(-matrix$time), ]
frequency <- 86400  # number of seconds in 1 day

# Initialize matrix that will contain probabilities
transprobs <- matrix(data = 0, nrow = length(ids), ncol = length(ids))

# Loop through each type of event
for (i in seq_along(ids)) {
  localmatrix <- matrix[matrix$id == ids[i], ]

  # Loop through each row of the event
  for (j in seq_len(nrow(localmatrix))) {
    localtime <- localmatrix$time[j]
    # Find the rows that fall inside the 1-day window before this event
    indices <- which(matrixtimesort$time < localtime &
                     matrixtimesort$time >= (localtime - frequency))
    # Find IDs that occur within the 1-day window
    # (indexing with indices directly also copes with an empty window)
    positiveids <- unique(matrixtimesort$id[indices])
    # Add one to each cell in the matrix that corresponds to the occurrence of an event
    for (l in seq_along(positiveids)) {
      k <- which(ids == positiveids[l])
      transprobs[i, k] <- transprobs[i, k] + 1
    }
  }

  # Divide each row by total number of occurrences to determine probabilities
  transprobs[i, ] <- transprobs[i, ] / nrow(localmatrix)
}

# Normalize rows so that row sums are equal to 1
normalized <- transprobs / rowSums(transprobs)

Can anyone suggest ways to optimize this for speed?

Using nested loops seems like a bad idea. Your code can be vectorized to speed it up.
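As a small illustration of what vectorizing can mean here, the innermost loop of the question (one which() call per ID) could probably be replaced by a single match() call. This is only a sketch on made-up toy values: ids, transprobs and positiveids reuse the question's names, and it assumes that every value in positiveids appears in ids and that positiveids has no duplicates (which unique() guarantees in the question's code).

```r
# Toy values for illustration only; the names mirror the question's code.
ids <- c(1, 2, 3)
transprobs <- matrix(0, nrow = length(ids), ncol = length(ids))

i <- 1                  # row of the event type currently being processed
positiveids <- c(3, 1)  # distinct IDs seen in one event's 1-day window

# match() returns the column position of every ID at once, so the
# inner `for (l in ...)` loop and its repeated which() calls disappear.
# Assumption: every element of positiveids is present in ids.
k <- match(positiveids, ids)
transprobs[i, k] <- transprobs[i, k] + 1

print(transprobs)
```

If positiveids is empty, k is a zero-length index and the assignment does nothing, so no special case is needed for empty windows.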

For example, why find the top and bottom row numbers? You can simply compare the time values with "time_0 + frequency": it is a vectorized operation.
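One possible way to push this idea further is sketched below. It is not the answerer's code, just an illustration under assumptions: the names events, ev, lo, hi, pairs and counts are hypothetical, the toy data copies the sample from the question, and the window is taken to be the 86400 seconds strictly before each event, as in the question's loop. The data is sorted once, findInterval() locates every window's first and last row in one call, and table() does the counting.

```r
# Toy data in the shape shown in the question (hypothetical name: events).
events <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3),
  time = c(1376084071, 1376084937, 1376023439, 1376084320,
           1372983476, 1374789234, 1370234809)
)
frequency <- 86400                 # number of seconds in 1 day
ids <- sort(unique(events$id))

# Sort once by ascending time so that each window is a contiguous run of rows.
ev <- events[order(events$time), ]
n <- nrow(ev)

# With left.open = TRUE, findInterval(x, vec) counts the elements of vec
# that are strictly less than x. That gives, for every event at once,
# the first and last row of the window [time - frequency, time).
lo <- findInterval(ev$time - frequency, ev$time, left.open = TRUE) + 1
hi <- findInterval(ev$time, ev$time, left.open = TRUE)
len <- pmax(hi - lo + 1, 0)        # number of rows in each window (0 if empty)

# Expand to one (event row, window row) pair per entry.
from <- rep(seq_len(n), len)
to <- sequence(len) + rep(lo, len) - 1

# Keep each ID at most once per window, like unique() in the original loop.
pairs <- unique(data.frame(event = from,
                           row_id = ev$id[from],
                           col_id = ev$id[to]))

# counts[i, k]: number of events of ids[i] whose window contains ids[k].
counts <- table(factor(pairs$row_id, levels = ids),
                factor(pairs$col_id, levels = ids))
counts <- matrix(counts, nrow = length(ids),
                 dimnames = list(as.character(ids), as.character(ids)))

# Normalize rows to sum to 1; rows with no observations become NaN.
normalized <- counts / rowSums(counts)
print(normalized)
```

Dividing each row by its number of events before normalizing, as the question's code does, should not change the result, because the row normalization cancels any per-row constant. The expanded pairs can get large when many events fall inside each window, so memory use is worth checking on the real data.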

HTH.
