
When using dynamic programming, how do I capture the entire path for a min-sum?

Original post: 2015-07-16 15:21:55. Tags: algorithm, nodes, dynamic-programming, viterbi

I am trying to use the Viterbi min-sum algorithm, which finds the path through a set of nodes that minimizes the overall Hamming distance (a fancy term for "XOR two numbers and count the set bits in the result") against some fixed input.
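For concreteness, the distance described above can be sketched in a few lines of Python. The function name `hamming` is only an illustrative choice, and chunks are assumed to be stored as small integers.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    # XOR leaves a 1 exactly where the two inputs disagree; count those 1s.
    return bin(a ^ b).count("1")


print(hamming(0b10, 0b00))  # 1
print(hamming(0b01, 0b10))  # 2
```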

I understand how to use DP to compute the minimal overall distance, but I am having trouble using it to also capture the path that corresponds to that minimal distance.

It seems like memoizing the path at each node would be really memory-intensive. Is there a standard way to handle these kinds of problems?

Edit:

http://i.imgur.com/EugiEWG.jpg

Here is a sample trellis showing what I am talking about. The general idea is to find the path through the trellis that most closely matches the input bitstring, with minimal error (measured by the overall Hamming distance, that is, the number of mismatched bits).

As you can see, the first chunk of my input string is 01, and I can get there in column 1 of the trellis. The next chunk is 10, and I can move there in column 2. The next chunk is 11. Fine so far. The next chunk is 10, which is a problem because I can't reach that state from where I am now, so I have to go to the next best thing (00), and the rest can be filled in fine.

But this can become more complex. I need to be able to somehow get the path that corresponds to the minimal Hamming distance.

(The point of this exercise is that the trellis represents the transitions that are ACTUALLY valid, whereas the input string is something you receive over a telecommunications channel and might get garbled, with incorrect bits here and there. This program tries to figure out what the input string SHOULD be by minimizing error.)

2 answers

There's the usual "follow the path backwards" technique, which requires only the table of values (but the whole table; no cheating by keeping only the most recent part). The algorithm is simple: start at the end and work out which predecessor you came from. You can always make that decision: either exactly one predecessor yields the stored value when you recompute the step, or several do, in which case it doesn't matter which one you choose.
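A minimal sketch of that backward walk, under some assumptions: the trellis is given as a dictionary `edges` mapping `(previous_state, next_state)` to the bits emitted on that transition, decoding starts in state 0, and the 2-state trellis below is made up for illustration (it is not the one in the question's image). All function and variable names are hypothetical.

```python
import math


def hamming(a, b):
    return bin(a ^ b).count("1")


def viterbi_costs(edges, n_states, observed, start=0):
    """cost[t][s] = smallest total Hamming distance of any path in state s after t steps."""
    cost = [[math.inf] * n_states for _ in range(len(observed) + 1)]
    cost[0][start] = 0
    for t, chunk in enumerate(observed):
        for (p, s), label in edges.items():
            c = cost[t][p] + hamming(label, chunk)
            if c < cost[t + 1][s]:
                cost[t + 1][s] = c
    return cost


def trace_back(cost, edges, observed):
    """Recover one optimal state sequence from the full cost table alone."""
    t = len(observed)
    state = min(range(len(cost[t])), key=lambda s: cost[t][s])
    path = [state]
    while t > 0:
        # Find a predecessor whose recomputed value matches the stored one.
        for (p, s), label in edges.items():
            if s == state and cost[t - 1][p] + hamming(label, observed[t - 1]) == cost[t][s]:
                state = p
                break
        path.append(state)
        t -= 1
    path.reverse()
    return path


# Hypothetical 2-state trellis: (from_state, to_state) -> emitted 2-bit chunk.
edges = {(0, 0): 0b00, (0, 1): 0b11, (1, 0): 0b10, (1, 1): 0b01}
observed = [0b11, 0b01, 0b00, 0b10]

cost = viterbi_costs(edges, 2, observed)
print(cost[-1])                           # [1, 2]
print(trace_back(cost, edges, observed))  # [0, 1, 1, 1, 0]
```

Note that `trace_back` reads every column of `cost`, which is why the whole table has to be kept.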

Storing a table of "back-pointers" as well doesn't take much space (about as much as the table of weights, and you can actually omit most of the table of weights if you do this). Doing it that way gives you a much simpler backward phase: just follow the pointers. That really is the path, just stored backwards.
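The back-pointer variant might look roughly like this. It is a sketch under the same kind of assumptions: a hypothetical `edges` dictionary mapping `(previous_state, next_state)` to emitted bits, a start in state 0, and a made-up 2-state trellis. Only the two most recent cost columns are kept, while one column of pointers is stored per step.

```python
import math


def hamming(a, b):
    return bin(a ^ b).count("1")


def viterbi_backpointers(edges, n_states, observed, start=0):
    """Return (minimal total distance, one state sequence achieving it)."""
    prev_cost = [math.inf] * n_states
    prev_cost[start] = 0
    back = []  # back[t][s] = predecessor of state s on the best path into it at step t + 1
    for chunk in observed:
        cur_cost = [math.inf] * n_states
        pointers = [None] * n_states
        for (p, s), label in edges.items():
            c = prev_cost[p] + hamming(label, chunk)
            if c < cur_cost[s]:
                cur_cost[s] = c
                pointers[s] = p
        back.append(pointers)
        prev_cost = cur_cost  # older cost columns are no longer needed

    state = min(range(n_states), key=lambda s: prev_cost[s])
    best = prev_cost[state]
    path = [state]
    for pointers in reversed(back):  # the backward phase: just follow the pointers
        state = pointers[state]
        path.append(state)
    path.reverse()
    return best, path


# Hypothetical 2-state trellis: (from_state, to_state) -> emitted 2-bit chunk.
edges = {(0, 0): 0b00, (0, 1): 0b11, (1, 0): 0b10, (1, 1): 0b01}
observed = [0b11, 0b01, 0b00, 0b10]

print(viterbi_backpointers(edges, 2, observed))  # (1, [0, 1, 1, 1, 0])
```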

You are correct that the straightforward approach to calculating the paths is expensive in space.

This problem comes up often in DNA sequencing, where the cost is prohibitive. There are a number of ways to overcome it (see more here):

You can reduce the space by up to a square-root factor if you are willing to double the execution time (see 2.1.1 in the link above).
Using a compressed tree, you can reduce one of the dimensions logarithmically (see 2.1.2 in the link above).
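The linked section is not reproduced here, so the following is only one plausible reading of the square-root trade-off: checkpointing. The forward pass keeps a cost column only about every sqrt(n) steps; the backward pass then recomputes each block from its checkpoint, building back-pointers for that block only. Every step is computed twice, and roughly 2*sqrt(n) columns are held at once. The `edges` format, the start in state 0, and all names are assumptions for illustration.

```python
import math


def hamming(a, b):
    return bin(a ^ b).count("1")


def step(cost, edges, chunk, n_states):
    """Advance one trellis column; return (new cost column, back-pointers)."""
    new_cost = [math.inf] * n_states
    ptr = [None] * n_states
    for (p, s), label in edges.items():
        c = cost[p] + hamming(label, chunk)
        if c < new_cost[s]:
            new_cost[s] = c
            ptr[s] = p
    return new_cost, ptr


def viterbi_checkpointed(edges, n_states, observed, start=0):
    n = len(observed)
    block = max(1, math.isqrt(n))
    cost = [math.inf] * n_states
    cost[start] = 0
    checkpoints = {0: cost}  # cost columns at steps 0, block, 2 * block, ...
    for t, chunk in enumerate(observed):  # forward pass, pointers discarded
        cost, _ = step(cost, edges, chunk, n_states)
        if (t + 1) % block == 0:
            checkpoints[t + 1] = cost

    state = min(range(n_states), key=lambda s: cost[s])
    best = cost[state]
    path = [state]
    end = n
    while end > 0:  # backward pass, one block at a time
        begin = ((end - 1) // block) * block
        col = checkpoints[begin]
        ptrs = []
        for t in range(begin, end):  # recompute this block, now keeping pointers
            col, ptr = step(col, edges, observed[t], n_states)
            ptrs.append(ptr)
        for ptr in reversed(ptrs):
            state = ptr[state]
            path.append(state)
        end = begin
    path.reverse()
    return best, path


# Hypothetical 2-state trellis: (from_state, to_state) -> emitted 2-bit chunk.
edges = {(0, 0): 0b00, (0, 1): 0b11, (1, 0): 0b10, (1, 1): 0b01}
observed = [0b11, 0b01, 0b00, 0b10]

print(viterbi_checkpointed(edges, 2, observed))  # (1, [0, 1, 1, 1, 0])
```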