
Gamma in the Baum Welch algorithm and float precision

I am currently trying to implement the Baum-Welch algorithm in C, but I have run into the following problem with the gamma function:

gamma(i, t) = alpha(i, t) * beta(i, t) / sum over i of (alpha(i, t) * beta(i, t))
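
For reference, a direct C translation of this formula might look like the sketch below; the array layout and N_STATES are assumptions for illustration, not code from my project. The denominator is exactly the term that collapses to zero for long observation sequences.

/* Minimal sketch of the direct gamma formula (array layout and
 * N_STATES are assumed for illustration). */
#include <stddef.h>

#define N_STATES 4  /* assumed number of hidden states */

void compute_gamma(size_t T, const double alpha[][N_STATES],
                   const double beta[][N_STATES], double gamma[][N_STATES])
{
    for (size_t t = 0; t < T; ++t) {
        double denom = 0.0;
        for (size_t i = 0; i < N_STATES; ++i)
            denom += alpha[t][i] * beta[t][i];  /* underflows to 0 for large T */
        for (size_t i = 0; i < N_STATES; ++i)
            gamma[t][i] = alpha[t][i] * beta[t][i] / denom;  /* 0/0 once underflow hits */
    }
}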

Unfortunately, for large enough observation sets, alpha drops rapidly to 0 as t increases, and beta drops rapidly to 0 as t decreases, meaning that, due to floating-point underflow, there is no time step at which both alpha and beta are non-zero, which makes things rather problematic.

Is there a way around this problem, or should I just try to increase the precision of the values? I fear the problem may just pop up again if I try this approach, as alpha and beta drop off by about one order of magnitude per observation.

You should do these computations, and generally all computations for probability models, in log-space:

lg_gamma(i, t) = (lg_alpha(i, t) + lg_beta(i, t)
                  - logsumexp over i of (lg_alpha(i, t) + lg_beta(i, t)))

where lg_gamma(i, t) represents the logarithm of gamma(i, t), etc., and logsumexp is the log-sum-exp function described here. At the end of the computation, you can convert to probabilities using exp, if needed (that's typically only needed for displaying probabilities, but even there logs may be preferable).
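
For illustration, here is a minimal C sketch of logsumexp and of the log-space gamma computation; the function names, array layout, and N_STATES are assumptions, not part of the original answer.

/* Sketch of logsumexp and the log-space gamma update (names and array
 * layout are assumed for illustration). */
#include <math.h>
#include <stddef.h>

/* logsumexp: log(sum_i exp(x[i])), computed without underflow by
 * factoring out the maximum element. */
static double logsumexp(const double *x, size_t n)
{
    double m = -INFINITY;
    for (size_t i = 0; i < n; ++i)
        if (x[i] > m) m = x[i];
    if (isinf(m)) return m;          /* all terms are log(0) */
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += exp(x[i] - m);
    return m + log(s);
}

#define N_STATES 4  /* assumed number of hidden states */

void compute_lg_gamma(size_t T, const double lg_alpha[][N_STATES],
                      const double lg_beta[][N_STATES],
                      double lg_gamma[][N_STATES])
{
    double work[N_STATES];
    for (size_t t = 0; t < T; ++t) {
        for (size_t i = 0; i < N_STATES; ++i)
            work[i] = lg_alpha[t][i] + lg_beta[t][i];
        double lg_denom = logsumexp(work, N_STATES);
        for (size_t i = 0; i < N_STATES; ++i)
            lg_gamma[t][i] = work[i] - lg_denom;  /* logarithm of gamma(i, t) */
    }
}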

The base of the logarithm is not important, as long as you use the same base everywhere. I prefer the natural logarithm, because log saves typing compared to log2 :)

You should apply the scaling procedure at each observation (time step), since each step drives alpha and beta significantly below one. The procedure multiplies alpha and beta by a coefficient that keeps them within a comparable range. Multiply each alpha by a coefficient c that keeps it bounded; this c should be of the form:

c(t) = 1 / sum over i of alpha(t, i),   i = 1 .. number of states,   t = time step (observation)

Note that at each time step you compute one c(t) and multiply it into all the alphas for all states at that time step. Then apply the same procedure to the betas.
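
As a rough C sketch (array layout and N_STATES are assumptions, not from the original answer), scaling one time step's alphas could look like this; in Rabiner's tutorial the same c(t) is reused to scale the betas at that time step, and the log-likelihood can be recovered afterwards as minus the sum of the log(c(t)) values.

/* Sketch of scaling one time step's alphas (array layout and N_STATES
 * are assumed for illustration). Returns c(t). */
#include <stddef.h>

#define N_STATES 4  /* assumed number of hidden states */

double scale_alpha(double alpha_t[N_STATES])
{
    double sum = 0.0;
    for (size_t i = 0; i < N_STATES; ++i)
        sum += alpha_t[i];
    double c = 1.0 / sum;                /* c(t) = 1 / sum_i alpha(t, i) */
    for (size_t i = 0; i < N_STATES; ++i)
        alpha_t[i] *= c;                 /* rescale every alpha at time t */
    return c;                            /* keep c(t); reuse it to scale beta(t, i) */
}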

There is a good tutorial on HMMs that explains this procedure well: "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition" (Rabiner, 1989).
