
Optimizing Markov chain transition matrix calculations?

As an intermediate R user, I know that for loops can often be sped up by using functions like apply or by vectorizing. However, I don't know of any functions that could speed up my current code, which builds a Markov chain transition matrix and runs quite slowly. Have I maxed out on speed, or is there something I am overlooking? I am trying to find the transition matrix for a Markov chain by counting how often each id occurs in the 24-hour period before each alert. The vector ids contains all possible ids (about 1700).

Here is an example of the original data (a data frame named matrix):

> matrix
  id       time
   1 1376084071
   1 1376084937
   1 1376023439
   2 1376084320
   2 1372983476
   3 1374789234
   3 1370234809
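A minimal way to reproduce the sample above in R, assuming it is a data frame (the question's code indexes it with matrix$id and matrix$time, which would not work on a base matrix). Here ids is simply derived from the sample; in the real data it holds about 1700 values. Naming the data frame matrix shadows base::matrix, but R still resolves calls like matrix(data = 0, ...) to the function, because function lookup skips non-function objects.

```r
# Sample data from the question: one row per alert, time in Unix seconds
matrix <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3),
  time = c(1376084071, 1376084937, 1376023439,
           1376084320, 1372983476, 1374789234, 1370234809)
)
# All possible ids (about 1700 in the real data; 3 in this sample)
ids <- sort(unique(matrix$id))
```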

And here is my current code:

# `matrix` is a data frame with columns `id` and `time` (Unix seconds);
# `ids` is the vector of all possible ids (about 1700)
matrixtimesort <- matrix[order(-matrix$time), ]
frequency <- 86400  # number of seconds in 1 day

# Initialize matrix that will contain probabilities
transprobs <- matrix(data = 0, nrow = length(ids), ncol = length(ids))

# Loop through each type of event
for (i in seq_along(ids)) {
  localmatrix <- matrix[matrix$id == ids[i], ]

  # Loop through each row (occurrence) of the event
  for (j in seq_len(nrow(localmatrix))) {
    localtime <- localmatrix$time[j]
    # Find the rows that fall within the 1-day window before this occurrence
    indices <- which(matrixtimesort$time < localtime &
                     matrixtimesort$time >= (localtime - frequency))
    if (length(indices) == 0) next  # empty window: nothing to count
    # Find IDs that occur within the 1-day window
    positiveids <- unique(matrixtimesort$id[indices])
    # Add one to each cell in the matrix that corresponds to the occurrence of an event
    for (l in seq_along(positiveids)) {
      k <- which(ids == positiveids[l])
      transprobs[i, k] <- transprobs[i, k] + 1
    }
  }

  # Divide each row by total number of occurrences to determine probabilities
  if (nrow(localmatrix) > 0) {
    transprobs[i, ] <- transprobs[i, ] / nrow(localmatrix)
  }
}
# Normalize rows so that row sums are equal to 1
# (rows with no events in any preceding window become 0 / 0 = NaN)
normalized <- transprobs / rowSums(transprobs)
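For reference, the quantity the loop computes can be written out as follows. The notation is introduced here and is not from the original: E_i is the set of alert times for ids[i], n_i = |E_i|, and w = 86400 seconds.

$$
T_{ik} = \frac{1}{n_i} \sum_{t \in E_i} \mathbf{1}\!\left[\exists\, s \in E_k : t - w \le s < t\right],
\qquad
P_{ik} = \frac{T_{ik}}{\sum_{k'} T_{ik'}}
$$

Here T corresponds to transprobs after the per-row division and P to normalized. Because 1/n_i is constant within row i, it cancels in P, so the division inside the loop is unnecessary if only normalized is needed.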

Can anyone make any suggestions to optimize this for speed?

Nested loops are a bad idea here. Your code can be vectorized to make it much faster.

For example, why find the top and bottom row numbers at all? You can simply compare the time values with "time_0 + frequency" directly: that comparison is a vectorized operation over the whole column.
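A sketch of this suggestion, assuming a data frame with columns id and time as in the question: for each alert time t0, one vectorized comparison over the whole time column selects the preceding 24-hour window, and match() replaces the innermost loop. The helper name count_row and the sample variable events are hypothetical, not from the original.

```r
# Hypothetical helper: raw counts for row i of the transition matrix.
# For each occurrence t0 of ids[i], a vectorized comparison selects the
# events in [t0 - frequency, t0); match() replaces the innermost loop.
count_row <- function(i, df, ids, frequency = 86400) {
  counts <- numeric(length(ids))
  for (t0 in df$time[df$id == ids[i]]) {
    in_window <- df$time >= t0 - frequency & df$time < t0
    k <- match(unique(df$id[in_window]), ids)  # unique positions, so += 1 is safe
    counts[k] <- counts[k] + 1
  }
  counts
}

# Usage on the sample data from the question
events <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3),
  time = c(1376084071, 1376084937, 1376023439,
           1376084320, 1372983476, 1374789234, 1370234809)
)
ids <- sort(unique(events$id))
# vapply returns one column per id; t() turns them into rows
counts <- t(vapply(seq_along(ids), count_row, numeric(length(ids)),
                   df = events, ids = ids))
normalized <- counts / rowSums(counts)  # row 3 has no predecessors: NaN
```

The loop over occurrences remains, but each iteration is now a few vectorized calls, with no sorting or which()/min()/max() bookkeeping. Dividing by the number of occurrences is skipped because it cancels when each row is normalized.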

HTH.
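Taking the same idea further, the remaining loops can be removed entirely. This is a different technique from the one described in the answer, sketched under the assumption that every id in the data appears in ids: sort the times once, use findInterval() to get each alert's window bounds, expand all (alert, event) pairs with rep() and sequence(), drop repeated ids per alert, and tally with tabulate(). transition_counts and events are hypothetical names.

```r
# Hypothetical helper: counts[i, k] = number of occurrences of ids[i] that have at
# least one occurrence of ids[k] in the preceding `frequency` seconds.
# Assumes every value of df$id appears in `ids`.
transition_counts <- function(df, ids, frequency = 86400) {
  n   <- length(ids)
  o   <- order(df$time)
  tm  <- df$time[o]              # alert times, sorted ascending
  idx <- match(df$id[o], ids)    # position of each alert's id in `ids`

  # Window for alert j is [tm[j] - frequency, tm[j]). With left.open = TRUE,
  # findInterval(x, tm) returns how many elements of tm are strictly below x.
  hi  <- findInterval(tm, tm, left.open = TRUE)
  lo  <- findInterval(tm - frequency, tm, left.open = TRUE) + 1L
  len <- pmax(hi - lo + 1L, 0L)  # number of events in each window

  # Expand every (alert, event-in-window) pair without an explicit loop
  alert  <- rep(seq_along(tm), len)
  window <- rep(lo, len) + sequence(len) - 1L
  win_id <- idx[window]

  # Count each id at most once per alert, like unique() in the original loop
  keep <- !duplicated((as.numeric(alert) - 1) * n + win_id)
  from <- idx[alert][keep]
  to   <- win_id[keep]

  # Tally (from, to) pairs into an n x n matrix using column-major indices
  matrix(tabulate((to - 1L) * n + from, nbins = n * n), nrow = n, ncol = n)
}

# Usage on the sample data from the question
events <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3),
  time = c(1376084071, 1376084937, 1376023439,
           1376084320, 1372983476, 1374789234, 1370234809)
)
ids <- sort(unique(events$id))
counts <- transition_counts(events, ids)
normalized <- counts / rowSums(counts)
```

The trade-off is memory: the number of expanded pairs equals the total number of events across all windows, which can be large if the 24-hour windows are dense.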

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM