Draw markov chain given transition matrix in R

Let trans_m be an n-by-n transition matrix of a first-order Markov chain. In my problem n is large, say 10,000, so trans_m is stored as a sparse matrix built with the Matrix package; a dense matrix of that size would be huge. My goal is to simulate a Markov chain sequence given a vector of initial states s1 and the transition matrix trans_m. Consider the following concrete example.

    library(Matrix)

    n <- 5000 # there are 5,000 states in this case.
    trans_m <- Matrix(0, nrow = n, ncol = n, sparse = TRUE)
    K <- 5 # the maximal number of states that could be reached.
    for (i in 1:n) {
        states_reachable <- sample(1:n, size = K) # randomly pick K states that can be reached with equal probability.
        trans_m[i, states_reachable] <- 1/K
    }
    s1 <- sample(1:n, size = 1000, replace = TRUE) # generate 1000 initial states
    draw_next <- function(s) {
        # given the current state s, draw the next state
        sample(1:n, size = 1, prob = trans_m[s, ])
    }
    sapply(s1, draw_next)

Given the vector of initial states s1 as above, I used sapply(s1, draw_next) to draw the next state for each of them. As n grows, sapply becomes slow. Is there a faster way?

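Repeatedly indexing by row can be slow: the Matrix package stores sparse matrices in compressed-column form by default, so extracting a row is more expensive than extracting a column. It's faster to transpose the transition matrix once and use column indexing, and to move the indexing out of the inner function so that all needed columns are extracted in a single call: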

R>    trans_m_t <- t(trans_m)
R>
R>    require(microbenchmark)
R>    microbenchmark(
+       apply(trans_m_t[,s1], 2,sample, x=n, size=1, replace=F)
+     ,
+       sapply(s1, draw_next)
+     )
Unit: milliseconds
                                                            expr        min          lq        mean      median          uq        max neval
 apply(trans_m_t[, s1], 2, sample, x = n, size = 1, replace = F) 111.828814 193.1139810 190.4379185 194.6563380 196.4273105 270.418189   100
                                           sapply(s1, draw_next) 499.255402 503.7398805 512.0849013 506.9467125 516.6082480 586.762573   100
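
To go from a single step to a whole path, the column-indexing call can be wrapped in a loop over time steps. The following is a minimal sketch, not part of the original answer: `simulate_chain` and `n_steps` are hypothetical names, the matrix is built with `sparseMatrix()` in one call instead of the question's row-by-row loop, and `n` is kept small so the example runs quickly. It relies on `apply` passing each column as the first unmatched argument of `sample`, which is `prob` once `x`, `size` and `replace` are given by name.

```r
library(Matrix)

set.seed(42)
n <- 100  # number of states (kept small here; the question uses thousands)
K <- 5    # number of reachable states per row

# Assumption: build the matrix in one call from (row, column, value) entries,
# with K equiprobable successors per state as in the question.
rows <- rep(seq_len(n), each = K)
cols <- as.vector(replicate(n, sample.int(n, K)))
trans_m <- sparseMatrix(i = rows, j = cols, x = 1 / K, dims = c(n, n))
trans_m_t <- t(trans_m)  # transpose once, outside the simulation loop

# Hypothetical helper: returns an (n_steps + 1) x length(s1) matrix of states,
# one column per chain, with the initial states in the first row.
simulate_chain <- function(s1, m_t, n_steps) {
  path <- matrix(NA_integer_, nrow = n_steps + 1L, ncol = length(s1))
  path[1L, ] <- as.integer(s1)
  for (step in seq_len(n_steps)) {
    # Columns of m_t selected by the current states hold the next-state
    # probabilities; apply() hands each column to sample() as `prob`.
    probs <- as.matrix(m_t[, path[step, ], drop = FALSE])
    path[step + 1L, ] <- apply(probs, 2, sample,
                               x = nrow(m_t), size = 1, replace = FALSE)
  }
  path
}

s1 <- sample.int(n, 20, replace = TRUE)
path <- simulate_chain(s1, trans_m_t, n_steps = 10)
dim(path)
```

Each step materializes a dense n-by-length(s1) block, so for very large n or many chains it may be worth processing the chains in batches.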

Since you're already working with a sparse matrix, you might be able to get even better performance by working directly on its stored triplets (row index, column index, value) rather than going through the higher-level matrix operators, which can trigger recompression.
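
One possible reading of this suggestion, sketched here under assumptions rather than taken from the answer: keep the transposed matrix in compressed-column form and read its `p`, `i` and `x` slots directly, so that no dense row or column is ever created. Note that this uses the compressed-column slots of a `dgCMatrix` rather than a literal triplet (`dgTMatrix`) representation. `draw_next_sparse` is a hypothetical helper name, and the code assumes every state has at least one outgoing transition.

```r
library(Matrix)

set.seed(1)
n <- 200  # number of states (kept small so the example runs quickly)
K <- 5    # number of reachable states per row

# Assumption: same structure as the question's matrix, built in one call.
rows <- rep(seq_len(n), each = K)
cols <- as.vector(replicate(n, sample.int(n, K)))
trans_m <- sparseMatrix(i = rows, j = cols, x = 1 / K, dims = c(n, n))

# Column s of the transpose holds the outgoing probabilities of state s.
trans_m_t <- as(t(trans_m), "CsparseMatrix")

# Hypothetical helper: draw the next state for each current state in s,
# touching only the nonzero entries of the relevant column.
draw_next_sparse <- function(s, m_t) {
  vapply(s, function(cur) {
    # Slot p holds 0-based column pointers: the nonzeros of column `cur`
    # sit at positions p[cur] + 1 ... p[cur + 1] of slots i and x.
    idx <- (m_t@p[cur] + 1L):m_t@p[cur + 1L]
    cand <- m_t@i[idx] + 1L  # reachable states (slot i is 0-based)
    cand[sample.int(length(cand), 1L, prob = m_t@x[idx])]
  }, integer(1))
}

s1 <- sample.int(n, 50, replace = TRUE)
s2 <- draw_next_sparse(s1, trans_m_t)
head(cbind(current = s1, nxt = s2))
```

The work per draw here depends on the number of nonzeros in a column rather than on n, which is the point of avoiding the dense intermediate; whether it beats the `apply` version in practice would need to be benchmarked on the real matrix.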
