

Baum Welch (EM Algorithm) likelihood (P(X)) is not monotonically converging

I am something of an amateur when it comes to machine learning, and I am trying to program the Baum Welch algorithm, which is an application of the EM algorithm to Hidden Markov Models. Inside my program I test for convergence using the probability of each observation sequence under the new model, terminating once the new model's probability is less than or equal to the old model's. However, when I run the algorithm it seems to converge somewhat and gives results that are far better than random, but on the last iteration before it stops the probability goes down. Is this a sign of a bug, or am I doing something wrong?

It seems to me that I should have been comparing the summation of the log of each observation's probability instead, since that appears to be the function I am maximizing. However, the paper I read says to use the log of the sum of the observations' probabilities (which I am pretty sure orders models the same way as the sum of the probabilities itself): https://www.cs.utah.edu/~piyush/teaching/EM_algorithm.pdf
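For concreteness, this is roughly what I mean by the sum of the logs. It is only a sketch, and `sequence_probability` is a placeholder name for whatever forward-algorithm routine computes P(X | model) in my code, not an actual function from nlp.py:

    import math

    def total_log_likelihood(model, sequences):
        # The log of the product of the per-sequence likelihoods equals the sum
        # of their logs, so this sum orders models the same way the product does.
        # model.sequence_probability is an assumed placeholder, not real API.
        return sum(math.log(model.sequence_probability(seq)) for seq in sequences)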

I fixed this on another project, where I implemented backpropagation with feed-forward neural nets, by using a for loop with a pre-set number of epochs instead of a while loop whose condition required the new iteration to be strictly greater than the old one, but I am wondering whether that is bad practice.
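Here is the kind of loop I have in mind instead: cap the number of iterations, but also stop once the improvement in total log-likelihood falls below a small tolerance. Again only a sketch; `baum_welch_step` is a placeholder for a single E-step/M-step update, and `total_log_likelihood` is the helper sketched above:

    def train(model, sequences, max_iters=100, tol=1e-6):
        old_ll = total_log_likelihood(model, sequences)
        for _ in range(max_iters):
            model = baum_welch_step(model, sequences)  # one E-step + M-step (assumed helper)
            new_ll = total_log_likelihood(model, sequences)
            # Stop when the improvement is negligible; a tiny negative change is
            # treated the same way, since it is indistinguishable from round-off.
            if new_ll - old_ll < tol:
                break
            old_ll = new_ll
        return model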

My code is at https://github.com/icantrell/Natural-Language-Processing, inside the nlp.py file.

Any advice would be appreciated. Thank you.

For EM iterations, or any other iteration proved to be non-decreasing, you should see increases until the size of the increases becomes small compared with floating point error. At that point floating point error violates the assumptions in the proof, and you may see not only a failure to increase but a very small decrease - it should, however, only be very small.
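If you record the log-likelihood after every iteration, you can make this concrete with a check along the following lines. This is only a sketch; the tolerance is scaled to the magnitude of the values so that only drops larger than plausible round-off are flagged:

    def check_monotone(log_likelihoods, tol=1e-8):
        # Flag any decrease larger than floating-point noise in an EM run.
        for i in range(1, len(log_likelihoods)):
            drop = log_likelihoods[i - 1] - log_likelihoods[i]
            if drop > tol * max(1.0, abs(log_likelihoods[i - 1])):
                # A drop this large is not explained by round-off and usually
                # points to a bug in the E-step or M-step.
                print("suspicious decrease at iteration", i, ":", drop)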

One good way to check these sorts of probability-based calculations is to create a small test problem where the right answer is glaringly obvious - so obvious that you can see at a glance whether the answers from the code under test are correct.
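For an HMM, such a test problem might look like the sketch below: data generated from a known two-state model with nearly deterministic emissions and strongly alternating states, so that a correct Baum-Welch implementation, started from a reasonable initial guess, should recover emission probabilities close to the generating ones (possibly with the two states relabelled). The helper only generates the data; how you feed it into your model is up to your own API.

    import random

    def make_obvious_dataset(n_sequences=50, length=20, seed=0):
        rng = random.Random(seed)
        data = []
        for _ in range(n_sequences):
            state, seq = 0, []
            for _ in range(length):
                # State 0 emits 'a' and state 1 emits 'b' about 95% of the time.
                correct = 'a' if state == 0 else 'b'
                wrong = 'b' if state == 0 else 'a'
                seq.append(correct if rng.random() < 0.95 else wrong)
                # Strong tendency to switch state on every step.
                if rng.random() < 0.9:
                    state = 1 - state
            data.append(seq)
        return data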

It might be worth comparing the paper you reference with https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm#Proof_of_correctness . I think equations such as (11) and (12) are not intended for you to actually calculate, but as arguments to motivate and prove the final result. The equation corresponding to the traditional EM step, which you do calculate, is equation (15), which says that at each step you change the parameters to increase the expected log-likelihood - the expectation taken under the distribution of hidden states computed from the old parameters - which is the standard EM step. In fact, turning the page, I see this stated explicitly at the top of p. 8.
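In generic EM notation (written here in LaTeX, not necessarily the paper's exact symbols), the quantity maximized at each step is the expected complete-data log-likelihood, with the expectation taken under the posterior over the hidden states computed from the old parameters:

    Q(\theta \mid \theta^{\text{old}})
        = \mathbb{E}_{Z \sim p(Z \mid X,\, \theta^{\text{old}})}\!\left[\log p(X, Z \mid \theta)\right],
    \qquad
    \theta^{\text{new}} = \arg\max_{\theta}\, Q(\theta \mid \theta^{\text{old}}).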
