
Diff two sequences of identifiers

Given two sequences of identifiers, how can I find the shortest sequence of operations that transforms the first sequence into the second?

An operation can be one of:

  • Insert an identifier at a given position
  • Remove the identifier at a given position
  • Move an identifier from one position to another

Note: identifiers are unique and cannot appear twice in a sequence.

Example:

Sequence1 [1, 2, 3, 4, 5]
Sequence2 [5, 1, 2, 9, 3, 7]

Result (indices are 0-based):
- Remove at 3
- Move from 3 to 0
- Insert '9' at 3
- Insert '7' at 5
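
The result above can be checked mechanically with a short Python sketch. The tuple encoding of the operations and the name `apply_ops` are illustrative assumptions, as is the convention that a move's destination index refers to the list after the element has been taken out.

```python
def apply_ops(seq, ops):
    """Apply insert/remove/move operations (0-based indices) to a copy of seq."""
    out = list(seq)
    for op in ops:
        kind = op[0]
        if kind == "remove":          # ("remove", index)
            del out[op[1]]
        elif kind == "move":          # ("move", from_index, to_index)
            out.insert(op[2], out.pop(op[1]))
        elif kind == "insert":        # ("insert", value, index)
            out.insert(op[2], op[1])
        else:
            raise ValueError(f"unknown operation: {kind}")
    return out


# The four operations from the example, in order.
ops = [("remove", 3), ("move", 3, 0), ("insert", 9, 3), ("insert", 7, 5)]
print(apply_ops([1, 2, 3, 4, 5], ops))  # [5, 1, 2, 9, 3, 7]
```

Each index is interpreted against the list as it stands when that operation runs, which is how the example reads.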

Thanks!

This metric is called the Levenshtein distance, or more precisely the Damerau–Levenshtein distance.

There are implementations for almost every programming language that you can use to solve the problem you described.
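
For reference, here is a minimal sketch of the plain Levenshtein distance over arbitrary sequences, assuming every operation costs 1. The function name is illustrative. Note that this version counts insertions, deletions, and substitutions only; it has no move operation, so whether it fits the operations in the question needs to be checked separately.

```python
def levenshtein(a, b):
    """Minimum number of single-element insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))        # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]                         # turning a[:i] into the empty prefix takes i deletions
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,              # delete x
                cur[j - 1] + 1,           # insert y
                prev[j - 1] + (x != y),   # substitute x with y, or free match
            ))
        prev = cur
    return prev[-1]


print(levenshtein([1, 2, 3, 4, 5], [5, 1, 2, 9, 3, 7]))  # 4
```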

Start by finding the longest common subsequence (LCS). This identifies the elements that will not move:

[(1), (2), (3), 4, 5]

Elements of the LCS are enclosed in parentheses.
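
One way to compute the LCS is the classic quadratic dynamic-programming table. The sketch below is a generic version of that technique, not code from this answer, and the function name `lcs` is an assumption.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover the elements.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


print(lcs([1, 2, 3, 4, 5], [5, 1, 2, 9, 3, 7]))  # [1, 2, 3]
```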

Go through both sequences from index 0, recording the operations required to make them identical. If the current item of the first sequence is not part of the LCS, remove it and note where it was, in case you need to insert it again later. If the current item is part of the LCS and differs from the current item of the second sequence, insert the item from the second sequence in front of it. This is either a simple insertion or a move: if the item you are inserting is in the original list, make it a move; otherwise, make it an insert.

Here is a demo using your example. Curly braces mark the current element.

[{(1)}, (2), (3), 4, 5] vs [{5}, 1, 2, 9, 3, 7]

1 is a member of the LCS, so we must insert 5 in front of it. 5 is in the original sequence, so we record a move: MOVE 4 to 0

[5, {(1)}, (2), (3), 4] vs [5, {1}, 2, 9, 3, 7]

The items are the same, so we move on to the next one:

[5, (1), {(2)}, (3), 4] vs [5, 1, {2}, 9, 3, 7]

Again the numbers are the same, so move to the next one:

[5, (1), (2), {(3)}, 4] vs [5, 1, 2, {9}, 3, 7]

3 is a member of the LCS, so we must insert 9 in front of it. The original sequence does not contain 9, so it is a simple insertion: INSERT 9 at 3

[5, (1), (2), 9, {(3)}, 4] vs [5, 1, 2, 9, {3}, 7]

Yet again the numbers are the same, so move to the next one:

[5, (1), (2), 9, (3), {4}] vs [5, 1, 2, 9, 3, {7}]

4 is not a member of the LCS, so it gets deleted: DEL at 5

[5, (1), (2), 9, (3)] vs [5, 1, 2, 9, 3, {7}]

We have reached the end of the first sequence, so we simply add the remaining items of the second sequence to the first one, checking each against the list of prior deletions. For example, if 7 had been removed earlier, we would turn that deletion into a move at this point. Since the original list did not contain 7, we record our final operation: INS 7 at 5.
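
The walk above can be sketched in Python as follows. This is one possible reading of the procedure, not a definitive implementation: the names `lcs` and `diff` and the tuple encoding of operations are assumptions. It also makes one deliberate change: instead of deleting an out-of-place element and converting that deletion into a move later, it steps over the element and moves it when its turn comes, so that every recorded index refers to the list as it is when the operation runs.

```python
def lcs(a, b):
    """One longest common subsequence of a and b (quadratic DP)."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif t[i + 1][j] >= t[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def diff(a, b):
    """Return operations turning a into b; assumes identifiers are unique."""
    keep, in_a, in_b = set(lcs(a, b)), set(a), set(b)
    work, ops = list(a), []
    i = j = 0                                # cursors into work and b
    while j < len(b):
        target = b[j]
        has_cur = i < len(work)
        if has_cur and work[i] == target:
            i, j = i + 1, j + 1              # already in place
        elif has_cur and work[i] not in in_b:
            ops.append(("remove", i))        # identifier does not occur in b
            del work[i]
        elif has_cur and work[i] not in keep:
            i += 1                           # out of place: step over it, move it later
        elif target in in_a:
            src = work.index(target)         # target exists elsewhere: record a move
            work.pop(src)
            if src < i:
                i -= 1                       # popping before the cursor shifts it left
            work.insert(i, target)
            ops.append(("move", src, i))
            i, j = i + 1, j + 1
        else:
            work.insert(i, target)           # new identifier: record a plain insert
            ops.append(("insert", target, i))
            i, j = i + 1, j + 1
    while i < len(work):                     # leftovers past the end are not in b
        ops.append(("remove", i))
        del work[i]
    return ops


print(diff([1, 2, 3, 4, 5], [5, 1, 2, 9, 3, 7]))
# [('move', 4, 0), ('insert', 9, 3), ('remove', 5), ('insert', 7, 5)]
```

On the example this reproduces the four operations of the demo in the same order. A move's destination index is taken after the element has been removed from its old position.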
