简体   繁体   中英

Predicting future emissions from fitted HMM model

I've fitted a HMM model to my data using hmm.discnp package in R as follows:

library(hmm.discnp)
zs <- hmm(y=lis,K=5)

Now I want to predict the future K observations (emissions) from this model. But I am only able to get most probable state sequence for the observations that I already have through Viterbi algorithm.

I have t emissions already , ie (y(1),...,y(t)) .
I want the most probable future K emissions from the fitted HMM object ie (y(t+1),...y(t+k)) .

Is there a function to calculate this? if not then how do I calculate it manually?

Generating emissions from an HMM is pretty straightforward to do manually. I'm am not really familiar with R but I explain here the steps to generate data as you ask.

First thing to keep in mind is that, by its Markovian nature, the HMM has no memory. At any time, only the current state is known, what happened before is "forgotten". This means that the generation of the sample at time t+1 only depends of the sample at time t .

If you have a sequence, the first thing you can do is to fit the most probable state sequence (with the Viterbi algorithm) as you did. Now, you know the state that generated the last observation that you have (the one that you denote y(t) ).

Now, from this state, you know the probabilities to transit to each other state of the model thanks to the transition matrix. This is a probability mass function (pmf) and you can draw a state number from this pmf (not by hand! R should have a built-in function to draw a sample from a pmf). The state number you draw is the state in which your system is at time t+1 .

With this information, you can now draw a sample observation from the probability function that is assigned to this new state (same here, if it is a Gaussian distribution, use a Gaussian random generator that should exist in R).

From this state t+1 , you can now apply the same procedure to reach a state at time t+2 and so on.

Keep in mind that if you do this full procedure several times (to generate data samples from time t+1 to t+k ), you will end up with different results. This is due to the probabilistic nature of the model. I am not sure of what you mean by most probable future emissions and I am not sure whether there are some routines or not to do so. You can compute the likelihood of the full sequence you obtain at the end (from 1 to t+k ). It will in general be greater that the likelihood of the sequence up to t as the last part has been truly generated from the model itself and thus "perfectly" fits in some regards.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM