Let trans_m be an n by n transition matrix of a first-order Markov chain. In my problem, n is large, say 10,000, and trans_m is a sparse matrix built with the Matrix package; otherwise, trans_m would be huge. My goal is to simulate a sequence of the Markov chain given a vector of initial states s1 and this transition matrix trans_m. Consider the following concrete example.
library(Matrix)
n <- 5000 # there are 5,000 states in this case.
trans_m <- Matrix(0, nrow = n, ncol = n, sparse = TRUE)
K <- 5 # the maximal number of states that could be reached.
for (i in 1:n) {
  states_reachable <- sample(1:n, size = K) # randomly pick K states that can be reached with equal probability.
  trans_m[i, states_reachable] <- 1/K
}
s1 <- sample(1:n, size = 1000, replace = TRUE) # generate 1000 initial states
draw_next <- function(s) {
  .s <- sample(1:n, size = 1, prob = trans_m[s, ]) # given the current state s, draw the next state .s
  .s
}
sapply(s1, draw_next)
Given the vector of initial states s1 as above, I used sapply(s1, draw_next) to draw the next state. When n is larger, sapply becomes slow. Is there a better way?
Repeatedly indexing a sparse matrix by row can be slow, because the Matrix package stores it in compressed column form by default. It's faster to work on the transpose of the transition matrix and use column indexing, and to factor the indexing out of the inner function:
R> trans_m_t <- t(trans_m)
R>
R> require(microbenchmark)
R> microbenchmark(
+ apply(trans_m_t[,s1], 2,sample, x=n, size=1, replace=F)
+ ,
+ sapply(s1, draw_next)
+ )
Unit: milliseconds
                                                            expr        min          lq        mean      median          uq        max neval
 apply(trans_m_t[, s1], 2, sample, x = n, size = 1, replace = F) 111.828814 193.1139810 190.4379185 194.6563380 196.4273105 270.418189   100
                                           sapply(s1, draw_next) 499.255402 503.7398805 512.0849013 506.9467125 516.6082480 586.762573   100
Since you're already working with a sparse matrix, you might be able to get even better performance by working directly on the triplets, since the higher-level matrix operators can trigger recompression.
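Here is a rough sketch of that idea, untested against your real data. Rather than a literal triplet (dgTMatrix) representation, it reads the compressed-column slots `p`, `i` and `x` of the transposed matrix, so each draw only touches the few nonzero entries of one row. The helper name `draw_next_sparse` is made up for illustration, and the code assumes every state has at least one reachable successor. It also builds the matrix in one `sparseMatrix()` call instead of assigning row by row.

```r
library(Matrix)

set.seed(1)
n <- 5000
K <- 5

# Build the matrix in one call from (row, column, value) triplets.
rows <- rep(seq_len(n), each = K)
cols <- as.vector(replicate(n, sample.int(n, K)))  # K distinct targets per row
trans_m <- sparseMatrix(i = rows, j = cols, x = rep(1 / K, n * K),
                        dims = c(n, n))

# Column s of the transpose holds row s of trans_m.
P_t <- as(t(trans_m), "CsparseMatrix")

# Hypothetical helper: draw the next state using only the nonzero entries.
# Assumes state s has at least one nonzero transition probability.
draw_next_sparse <- function(s, P_t) {
  idx  <- seq.int(P_t@p[s] + 1L, P_t@p[s + 1L])  # positions of the nonzeros
  cand <- P_t@i[idx] + 1L                        # slot i is 0-based
  cand[sample.int(length(cand), 1L, prob = P_t@x[idx])]
}

s1 <- sample.int(n, 1000, replace = TRUE)
s2 <- vapply(s1, draw_next_sparse, integer(1), P_t = P_t)
```

Indexing with `sample.int(length(cand), ...)` instead of `sample(cand, ...)` avoids the surprise where `sample()` treats a single number as the range `1:x`. I haven't benchmarked this, so whether it beats the `apply` version on your machine is something you'd need to measure.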