
How to identify state transition probabilities after getting Markov Chain using markovchainFit in R?

I have a sequence of events:

library(markovchain)
sequence<-c("LHR - BA","BOS - BA","BOS - ZE","IAD - ZE","BOS - BA","LHR - BA",
            "LGW - BA","TPA - BA","TPA - BA","LGW - BA","LHR - BA","BOS - BA",
            "BOS - ZE","BOS - ZE","BOS - BA","LHR - BA","LHR - BA","BOS - BA",
            "BOS - ZE","BOS - ZE","BOS - DL","ATL - DL","LHR - BA","BRU - BA")

Using this sequence, I get a Markov chain using the following function:

sequenceMatr <- createSequenceMatrix(sequence, sanitize=FALSE)
mcFit        <- markovchainFit(data=sequence, method="mle")

Consider that the next state is "LHR - BA"; how then do I identify the probability distribution across states in the following format:

           "LHR - BA", "BOS - ZE", "IAD - ZE", "BOS - BA", "LHR - BA", "LGW - BA", "TPA - BA"
"LHR - BA"     0.1   ,     0.2   ,    0.2    ,     0.3   ,     0.1   ,     0.1   ,     0.1

I find your question a bit hard to understand, but here is my interpretation.

Consider that the next state is "LHR - BA"; how then do I identify the probability distribution across states

So the way I read this, you know that at some time t+1 your system is in state "LHR - BA" and you want to know the probability distribution over states at time t. In other words, you want the conditional probability

P(S(t)=x | S(t+1)="LHR - BA")

According to Bayes' law this probability is equal to

P(S(t)=x) * P(S(t+1)="LHR - BA" | S(t)=x) / P(S(t+1)="LHR - BA")

For this to work, you need some estimate of the unconditional distribution P(S(t)=x). For large t and a well-behaved Markov chain (the usual term is ergodic, i.e. irreducible and aperiodic) you can simply take the (hopefully unique) steady state. With that interpretation, the formula above translates into R fairly directly:

mcEst <- mcFit$estimate
mcSteady <- steadyStates(mcEst)
sapply(states(mcEst), function(x) transitionProbability(mcEst, x, "LHR - BA")*mcSteady[1,x]/mcSteady[1,"LHR - BA"])
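
To see the formula at work without the package, here is a minimal sketch in base R. The two-state chain, its probabilities, and the names P, s and prevGivenA are made up for illustration; they are not part of the question's data. It computes the steady state as the left eigenvector of the transition matrix and then applies Bayes' law exactly as written above.

```r
# Hypothetical two-state chain: rows are the current state, columns the next state.
P <- matrix(c(0.9, 0.1,
              0.5, 0.5),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("A", "B")))

# Steady state: left eigenvector of P for eigenvalue 1, rescaled to sum to 1.
ev <- eigen(t(P))
s <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
s <- s / sum(s)
names(s) <- rownames(P)

# Bayes' law:
# P(S(t)=x | S(t+1)="A") = P(S(t)=x) * P(S(t+1)="A" | S(t)=x) / P(S(t+1)="A")
prevGivenA <- s * P[, "A"] / s["A"]

print(prevGivenA)       # approximately 0.9 for A and 0.1 for B
print(sum(prevGivenA))  # a proper distribution, so this should be 1
```

The steady state here works out to (5/6, 1/6), so the result can be checked by hand.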

But perhaps you want to write this in a nicer way, and have rows for every possible S(t+1), not just "LHR - BA". For that you'll have to deal with the transition matrix more directly, instead of calling transitionProbability, since that method doesn't work well for vectorized arguments.

tm <- mcEst@transitionMatrix
if (mcEst@byrow) tm <- t(tm)
res <- tm * t(outer(mcSteady[1,], mcSteady[1,], "/"))

The first two lines obtain the transition matrix and make sure that the row (first index) is the target state and the column (second index) is the source state. See getMethods("transitionProbability") for how the package's own functions handle this kind of thing. So at that point you have

tm[i,j] = P(S(t+1)=i | S(t)=j)

You then take the steady state, and do all possible combinations. You get

outer(mcSteady[1,], mcSteady[1,], "/")[i,j] = mcSteady[1,i]/mcSteady[1,j]

which is the wrong way round. So you transpose it and get

t(outer(mcSteady[1,], mcSteady[1,], "/"))[i,j] = mcSteady[1,j]/mcSteady[1,i]

which you multiply by tm to obtain the final result:

res[i,j] = P(S(t+1)=i | S(t)=j) * P(S(t)=j) / P(S(t+1)=i)

Each row of the resulting table is the distribution over predecessor states for a given successor state, including the one you asked for:

> res["LHR - BA",]
  ATL - DL   BOS - BA   BOS - DL   BOS - ZE   BRU - BA   IAD - ZE   LGW - BA 
0.23379630 0.37037037 0.00000000 0.00000000 0.02083333 0.00000000 0.20833333 
  LHR - BA   TPA - BA 
0.16666667 0.00000000 
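
A quick sanity check on a table built this way is that every row sums to 1, which only holds if the weights really are a steady state of the chain. Here is a small sketch in base R on a hypothetical three-state chain (the matrix and the state names X, Y, Z are assumptions for illustration, not the fitted chain), using the same tm and outer construction as above:

```r
# Hypothetical three-state chain: rows are the current state, columns the next state.
P <- matrix(c(0.5, 0.4, 0.1,
              0.3, 0.4, 0.3,
              0.2, 0.3, 0.5),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("X", "Y", "Z"), c("X", "Y", "Z")))

# Steady state: left eigenvector of P for eigenvalue 1, rescaled to sum to 1.
ev <- eigen(t(P))
s <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
s <- s / sum(s)
names(s) <- rownames(P)

# Same construction as above: target state in rows, source state in columns.
tm <- t(P)
res <- tm * t(outer(s, s, "/"))

# Each row is a distribution over predecessor states, so it must sum to 1.
print(rowSums(res))
stopifnot(all(abs(rowSums(res) - 1) < 1e-8))
```

If the row sums come out different from 1 on your own data, the vector you divided by is probably not a steady state of the fitted chain, or steadyStates returned more than one row.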
