
Gamma in the Baum Welch algorithm and float precision

I am currently trying to implement a Baum Welch algorithm in C, but I run into the following problem with the gamma function:

gamma(i,t) = alpha(i,t) * beta(i,t) / (sum over i of alpha(i,t) * beta(i,t))

Unfortunately, for large enough observation sets, alpha drops rapidly to 0 as t increases, and beta drops rapidly to 0 as t decreases, meaning that, due to rounding down, there is never a spot where both alpha and beta are non-zero, which makes things rather problematic.

Is there a way around this problem, or should I just try to increase precision for the values? I fear the problem may just pop up again if I try this approach, as alpha and beta drop by about one order of magnitude per observation.

You should do these computations, and generally all computations for probability models, in log-space:

lg_gamma(i, t) = (lg_alpha(i, t) + lg_beta(i, t)
                  - logsumexp over i of (lg_alpha(i, t) + lg_beta(i, t)))

where lg_gamma(i, t) represents the logarithm of gamma(i, t), etc., and logsumexp is the function described here. At the end of the computation, you can convert back to probabilities using exp, if needed (that's typically only required for displaying probabilities, and even there logs may be preferable).

The base of the logarithm is not important, as long as you use the same base everywhere. I prefer the natural logarithm, because log saves typing compared to log2 :)

I think you should apply the scaling procedure at each observation, since it is the observations that make alpha and beta significantly less than one. The procedure multiplies alpha and beta by a coefficient that keeps them within a comparable range. You should multiply each alpha variable by a coefficient, say c, that keeps it within a comparable bound; this c should be of the form:

c(t) = 1 / sum(alpha(t,i)) , i=1... number of states , t=time step ( observation)

Note that at each time step you compute one c(t) and multiply it into the alpha values for all states at that time step. Then do the same procedure for the beta values.

There is a good tutorial about HMMs that explains this procedure well: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (Rabiner, 1989).
