I need help with a topic related to Markov chains and data preprocessing. Suppose I have the following matrix relating individuals to states over time:
     ID Time1 Time2
1 14021     A     A
2 15031     B     A
3 16452     A     C
I would like to obtain the state transition matrix for this data, i.e. the number of transitions from each state (rows) to each state (columns):
  A B C
A 1 0 1
B 1 0 0
C 0 0 0
and the same thing, but now weighted by the total number of transitions from that state, i.e.,
    A B   C
A 0.5 0 0.5
B   1 0   0
C   0 0   0
(as there are two transitions leaving state A). I know that the markovchain package can do this if one has a sequence, say AAABBAAABBCC, but not when the data is laid out like mine. Ideally a direct procedure would be great, but if there is some way of turning the data into a set of sequences, that would work as well.
Any ideas?
Thanks in advance
Here is another base R solution.
df <- data.frame(Time1 = c("A","B","A"), Time2 = c("A","A","C"), stringsAsFactors = FALSE)
myStates <- sort(unique(c(df$Time1, df$Time2)))
lenSt <- length(myStates)
currState <- match(df$Time1, myStates)
nextState <- match(df$Time2, myStates)
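# Count every transition; plain index assignment would record a repeated (from, to) pair only once
idx <- (nextState - 1L) * lenSt + currState  # column-major position of each (from, to) pair
transMat <- matrix(tabulate(idx, nbins = lenSt^2), lenSt, lenSt)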
transMat <- transMat/rowSums(transMat)
transMat[is.na(transMat)] <- 0
transMat
[,1] [,2] [,3]
[1,] 0.5 0 0.5
[2,] 1.0 0 0.0
[3,] 0.0 0 0.0
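A shorter variant of the same idea, as a minimal sketch: let table() do the counting on factors that share one set of levels, then row-normalise with prop.table(). The names states, counts and probs are my own choices, not from the answer above. This version keeps the state labels as row and column names, and a transition that occurs several times is counted several times.

```r
# Same toy data as above
df <- data.frame(Time1 = c("A", "B", "A"),
                 Time2 = c("A", "A", "C"),
                 stringsAsFactors = FALSE)

# Shared levels keep the table square, even for states that never
# occur as a source (here C) or as a target (here B)
states <- sort(unique(c(df$Time1, df$Time2)))
counts <- table(from = factor(df$Time1, levels = states),
                to   = factor(df$Time2, levels = states))

# Row-normalise; a row with no outgoing transitions gives NaN (0/0),
# which is set to 0 to match the output requested in the question
probs <- prop.table(counts, margin = 1)
probs[is.nan(probs)] <- 0

counts
probs
```

Whether an all-zero row for C is acceptable depends on what you do next: it is not a valid probability row, so a package that checks that each row sums to 1 may reject it.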
An igraph approach, using df from Joseph's answer:
library(igraph)
g <- graph_from_data_frame(df)
E(g)$weight <- 1/degree(g, mode="out")[df$Time1] # each edge gets 1 / out-degree of its source state
as_adjacency_matrix(g, attr = "weight", sparse=FALSE) # weighted adjacency matrix, i.e. the transition probabilities
A B C
A 0.5 0 0.5
B 1.0 0 0.0
C 0.0 0 0.0
Definitely there's a better way. This is me doodling with loops on a lame Friday afternoon.
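# Assumes df has the question's layout: ID in column 1, Time1 and Time2 in columns 2 and 3
lvls <- sort(unique(unlist(df[,-1])))
dat <- matrix(0, nrow= length(lvls), ncol= length(lvls))
colnames(dat) <- lvls
rownames(dat) <- lvls
concat <- paste0(df[,2], df[,3])
# Fill each cell with its "from-to" label, e.g. "AC" for A -> C
for (i in seq_along(lvls)) {
  for (j in seq_along(lvls)) {
    dat[i,j] <- paste0(rownames(dat)[i], colnames(dat)[j])
  }
}
# Count exact matches of each label (grep would treat the label as a regex and match substrings)
dat <- matrix(sapply(dat, function(x) sum(concat == x)),
              nrow= length(lvls), ncol= length(lvls))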
colnames(dat) <- lvls
rownames(dat) <- lvls
dat
## A B C
## A 1 0 1
## B 1 0 0
## C 0 0 0
dat <- dat / rowSums(dat)
dat[is.na(dat)] <- 0
dat
##     A B   C
## A 0.5 0 0.5
## B 1.0 0 0.0
## C 0.0 0 0.0
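If the real data has more than two time columns, the same counting can be applied to every pair of consecutive columns. The following is only a sketch under that assumption: the helper name transition_counts and the Time3 column are made up for illustration and do not appear in the question.

```r
# Hypothetical helper: counts transitions between every pair of
# consecutive time columns of a wide data frame
transition_counts <- function(wide, time_cols) {
  m <- as.matrix(wide[time_cols])       # character matrix, one row per individual
  k <- ncol(m)
  states <- sort(unique(as.vector(m)))
  from <- as.vector(m[, -k, drop = FALSE])  # columns 1 .. k-1, stacked
  to   <- as.vector(m[, -1, drop = FALSE])  # columns 2 .. k, stacked in the same order
  table(from = factor(from, levels = states),
        to   = factor(to,   levels = states))
}

# Question's data plus an invented Time3 column (illustration only)
wide <- data.frame(ID    = c(14021, 15031, 16452),
                   Time1 = c("A", "B", "A"),
                   Time2 = c("A", "A", "C"),
                   Time3 = c("B", "A", "C"),
                   stringsAsFactors = FALSE)

counts <- transition_counts(wide, c("Time1", "Time2", "Time3"))
probs <- prop.table(counts, margin = 1)
probs[is.nan(probs)] <- 0   # states with no outgoing transitions
counts
probs
```

Pooling all consecutive pairs like this assumes the transition probabilities do not change over time; if they might, count each pair of columns separately instead.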