
Dynamic Programming in Python - Optimal sequence

Posted 2016-05-17 13:46:21 · Tags: python, dynamic-programming

I am trying to solve this problem and have been unable to come up with a robust solution. Any idea, pseudo-code or Python implementation would be greatly appreciated. For the sake of simplicity, consider a small matrix like the one in Figure 1. The rows in the matrix represent days and the columns represent minutes. We can assume that a bus travels between two points, that the trip takes 10 minutes, and that at each minute the bus stops at a particular cell, identified by the letter in that cell. Given the historical pattern (days 1 through 5), we want to find the best sequence of letters. To do that we need to follow certain rules:

We want to select the most frequently observed letter per minute interval. If more than one letter has the same frequency, we can select any of them.
We want to maintain continuity.
We want to preserve the original sequence as best we can.

We are not looking for the shortest distance (the straightest line, etc.).
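
The first rule (most frequent letter per minute, ties allowed) can be sketched as below. This is only a minimal illustration: the function name `most_frequent_per_minute` and the input layout (one string per day, one character per minute) are assumptions, not something given in the question.

```python
from collections import Counter


def most_frequent_per_minute(days):
    """For each minute (column), return the letters tied for the highest count.

    `days` is assumed to be a list of equal-length strings, one per day.
    """
    result = []
    for column in zip(*days):  # transpose: one tuple of letters per minute
        counts = Counter(column)
        top = max(counts.values())
        # Keep every letter that reaches the top count, so ties stay visible.
        result.append(sorted(letter for letter, n in counts.items() if n == top))
    return result
```

For example, `most_frequent_per_minute(["AB", "CB"])` reports a tie between `A` and `C` in the first minute and `B` alone in the second. This covers only the frequency rule; continuity is not handled here.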

Here are a couple of examples: The sequence in Figure 1 satisfies all these rules. The highlighted sequence is just for visualization purposes. There are other ways of visualizing this sequence in Figure 1.

The sequence in Figure 2 is discontinuous, so the most frequent letters can't be stitched together. For that reason, we select the second most frequent letter in minute 3 (one of C, A, D) instead of B. With that we can satisfy the rules. However, keep in mind that with 365 days and 100+ minutes, it gets complex. For instance, using the second most frequent letter may result in rewiring the rest of the sequence.

Any guidance is highly appreciated.

1 answer

This sounds like a relatively straightforward dynamic programming task.

1. Start at the end: each cell in the last column gets 0 if it holds the most frequent letter, or 1 otherwise.
2. Move on to the second-to-last column. Each cell gets (0 if it holds the most frequent letter, 1 otherwise) + min(cell_above, cell_directly_right, cell_below), where the three cells are the neighbouring cells in the column to the right. Note which cell you selected.
3. Repeat until you reach the first column.
4. You will now have one or more cells with minimal value in the first column. Pick one and follow the cells you noted in step 2.
5. You now have a path from the beginning to the end which is continuous and minimizes sum([0 if cell.most_frequent else 1 for cell in cells])
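
The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the grid is assumed to be a list of equal-length strings (one row per day, one character per minute), "most frequent" is judged per column with ties all counting as most frequent, and the neighbours of a cell are taken to be the cells one row up, level, and one row down in the column to the right. The name `best_path` is hypothetical.

```python
from collections import Counter


def best_path(grid):
    """Return (total_cost, rows), where rows[c] is the row chosen in column c."""
    n_rows, n_cols = len(grid), len(grid[0])

    def cell_costs(col):
        # 0 if the cell holds (one of) the most frequent letters in its column, else 1.
        counts = Counter(grid[r][col] for r in range(n_rows))
        top = max(counts.values())
        return [0 if counts[grid[r][col]] == top else 1 for r in range(n_rows)]

    # score[r]: cheapest cost of a path from (r, current column) to the last column.
    score = cell_costs(n_cols - 1)
    choice = []  # per column, the row selected in the column to its right

    for col in range(n_cols - 2, -1, -1):  # walk from second-to-last column to first
        costs = cell_costs(col)
        new_score, step = [], []
        for r in range(n_rows):
            neighbours = [rr for rr in (r - 1, r, r + 1) if 0 <= rr < n_rows]
            nxt = min(neighbours, key=lambda rr: score[rr])
            new_score.append(costs[r] + score[nxt])
            step.append(nxt)  # remember which cell was selected
        score = new_score
        choice.append(step)
    choice.reverse()  # now choice[col][r] is defined for col = 0 .. n_cols - 2

    # Start from a minimal cell in the first column and follow the noted choices.
    row = min(range(n_rows), key=lambda r: score[r])
    total = score[row]
    rows = [row]
    for col in range(n_cols - 1):
        row = choice[col][row]
        rows.append(row)
    return total, rows
```

The letters along the path can then be read off with `"".join(grid[r][c] for c, r in enumerate(rows))`. The work is proportional to rows × columns, so a 365 × 100 matrix is well within reach.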

You might have to tweak the target function; e.g. as it stands, the least frequent and the second most frequent letters are treated the same. Maybe you want to give a score based on how frequent they are.
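
One possible frequency-based score, shown purely as an illustration: the cost falls linearly from just under 1 for rare letters to 0 for the most frequent one. The function name `frequency_costs` and the linear scaling are assumptions; any monotone scoring would fit the same dynamic programming scheme.

```python
from collections import Counter


def frequency_costs(grid, col):
    """Cost per row in column `col`: 0 for the most frequent letter,
    approaching 1 as a letter becomes rarer (hypothetical linear scaling)."""
    column = [row[col] for row in grid]
    counts = Counter(column)
    top = max(counts.values())
    return [1 - counts[letter] / top for letter in column]
```

With a column containing four `A`s, two `B`s and one `C`, the `A` cells cost 0, the `B` cells 0.5 and the `C` cell 0.75, so the second most frequent letter is now preferred over the least frequent one.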