Finding difference between strings in JavaScript

I'd like to compare two strings (a before and after) and detect exactly where and what changed between them.

For any change, I want to know:

  1. Starting position of the change (inclusive, starting at 0)
  2. Ending position of the change (inclusive, starting at 0) relative to the previous text
  3. The "change"

Assume that strings will change in only one place at a time (for example, never "Bill" -> "Kiln").

Additionally, I need the start and end positions to reflect the type of change:

  • If deletion, the start and end position should be the start and end positions of the deleted text, respectively
  • If replacement, the start and end position should be the start and end positions of the "deleted" text, respectively (the change will be the "added" text)
  • If insertion, the start and end positions should be the same: the entry point of the text
  • If no change, let start and end positions remain zero, with an empty change

For example:

"0123456789" -> "03456789"  
Start: 1, End: 2, Change: "" (deletion)

"03456789" -> "0123456789"  
Start: 1, End: 1, Change: "12" (insertion)

"Hello World!" -> "Hello Aliens!"  
Start: 6, End: 10, Change: "Aliens" (replacement)

"Hi" -> "Hi"  
Start: 0, End: 0, Change: "" (no change)

I was able to somewhat detect the positions of the changed text, but it doesn't work in all cases, because to do that accurately I need to know what kind of change was made.

var OldText = "My edited string!";
var NewText = "My first string!";

var ChangeStart = 0;
var NewChangeEnd = 0;
var OldChangeEnd = 0;
console.log("Comparing start:");
for (var i = 0; i < NewText.length; i++) {
    console.log(i + ": " + NewText[i] + " -> " + OldText[i]);
    if (NewText[i] != OldText[i]) {
        ChangeStart = i;
        break;
    }
}
console.log("Comparing end:");
// "Addition"?
if (NewText.length > OldText.length) {
    for (var i = 1; i < NewText.length; i++) {
        console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
        if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
// "Deletion"?
} else if (NewText.length < OldText.length) {
    for (var i = 1; i < OldText.length; i++) {
        console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
        if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
// Same length...
} else {
    // Do something
}
console.log("Change start: " + ChangeStart);
console.log("NChange end : " + NewChangeEnd);
console.log("OChange end : " + OldChangeEnd);
console.log("Change: " + OldText.substring(ChangeStart, OldChangeEnd + 1));

How do I tell whether or not an insertion, deletion, or replacement took place?

I've searched and come across a few other similar questions, but they don't seem to help.

I have gone through your code, and your logic for matching the strings makes sense to me. It logs ChangeStart, NewChangeEnd and OldChangeEnd correctly and the algorithm flows alright. You just want to know whether an insertion, deletion or replacement took place. Here's how I would go about it.

First of all, once you have found the first point of mismatch (i.e. ChangeStart), you need to make sure that when you then traverse the strings from the end, the index doesn't cross ChangeStart.

I'll give you an example. Consider the following strings:

 var NewText = "Hello Worllolds!";
 var OldText = "Hello Worlds!";

 ChangeStart -> 10 //Makes sense
 OldChangeEnd -> 8
 NewChangeEnd -> 11

 console.log("Change: " + NewText.substring(ChangeStart, NewChangeEnd + 1)); 
 //Outputs "lo"

The problem in this case is that when it starts matching from the back, the flow is something like this:

 Comparing end: 
  1(N: 15 O: 12: ! -> !) 
  2(N: 14 O: 11: s -> s) 
  3(N: 13 O: 10: d -> d)  -> You need to stop here!

 //There is no mismatch yet, but we have reached ChangeStart and
 //we have already established that characters from 0 -> ChangeStart-1 match.
 //That is why it outputs "lo" instead of "lol"

Assuming what I just said makes sense, you just need to modify your for loops like so:

 if (NewText.length > OldText.length) {
     for (var i = 1; i < NewText.length && ((OldText.length - i) >= ChangeStart); i++) {
         ...

         NewChangeEnd = NewText.length - i - 1;
         OldChangeEnd = OldText.length - i - 1;
         if (/* mismatch condition reached */) {
             // break... That code is fine.
         }
     }
 }

This condition, (OldText.length-i)>=ChangeStart, takes care of the anomaly that I mentioned: the for loop terminates as soon as the condition no longer holds. However, as I just demonstrated, there might be situations where the loop reaches this limit before a mismatch is encountered. So on every iteration you need to set NewChangeEnd and OldChangeEnd to 1 less than the index that just matched. In case of a mismatch, you store the mismatching indices themselves, as your code already does.

Instead of an else-if, we could use a plain else for the case where NewText.length > OldText.length is definitely not true, i.e. the change is either a replacement or a deletion. Likewise, NewText.length > OldText.length means it could be a replacement or an insertion as per your examples, which makes sense. So the else could be something like:

else {
    // The new text is not the longer one here, so its index reaches ChangeStart first
    for (var i = 1; i < OldText.length && ((NewText.length - i) >= ChangeStart); i++) {

        ...
        NewChangeEnd = NewText.length - i - 1;
        OldChangeEnd = OldText.length - i - 1;
        if (/* mismatch condition reached */) {
            // break... That code is fine.
        }
    }
}

If you have understood the minor changes thus far, identifying the specific cases is really simple:

  1. Deletion - Condition -> ChangeStart > NewChangeEnd. The deleted string runs from ChangeStart -> OldChangeEnd.

Deleted text -> OldText.substring(ChangeStart, OldChangeEnd + 1);

  2. Insertion - Condition -> ChangeStart > OldChangeEnd. The string was inserted at ChangeStart.

Inserted text -> NewText.substring(ChangeStart, NewChangeEnd + 1);

  3. Replacement - If NewText != OldText and the above two conditions are not met, then it is a replacement.

Text in old string that got replaced -> OldText.substring(ChangeStart, OldChangeEnd + 1);

The replacement text -> NewText.substring(ChangeStart, NewChangeEnd + 1);

Start and end positions in the OldText that got replaced -> ChangeStart -> OldChangeEnd

I have created a jsfiddle incorporating the changes that I have mentioned in your code. You might want to check it out. Hope it gets you started in the right direction.
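For readers without access to the fiddle, the steps above can be sketched as one function. This is only a sketch, not the code from the fiddle: the name findChange and the shape of the returned object are made up for illustration, and it guards both the old and the new index against crossing the first mismatch, which goes slightly beyond the loop condition described above.

```javascript
// Sketch only: findChange and its return shape are illustrative assumptions.
function findChange(oldText, newText) {
    // Scan from the front for the first index where the strings differ.
    let changeStart = 0;
    const minLength = Math.min(oldText.length, newText.length);
    while (changeStart < minLength && oldText[changeStart] === newText[changeStart]) {
        changeStart++;
    }

    // Scan from the back, never letting either index cross changeStart.
    let oldChangeEnd = oldText.length - 1;
    let newChangeEnd = newText.length - 1;
    while (oldChangeEnd >= changeStart && newChangeEnd >= changeStart &&
           oldText[oldChangeEnd] === newText[newChangeEnd]) {
        oldChangeEnd--;
        newChangeEnd--;
    }

    const removed = oldText.substring(changeStart, oldChangeEnd + 1);
    const added = newText.substring(changeStart, newChangeEnd + 1);

    if (removed === "" && added === "") {
        // No change: positions stay at zero, as the question asks.
        return { type: "none", start: 0, end: 0, change: "" };
    }
    if (changeStart > newChangeEnd) {
        // Nothing was added, so the span in the old text was deleted.
        return { type: "deletion", start: changeStart, end: oldChangeEnd, change: "" };
    }
    if (changeStart > oldChangeEnd) {
        // Nothing was removed: start and end are both the entry point.
        return { type: "insertion", start: changeStart, end: changeStart, change: added };
    }
    return { type: "replacement", start: changeStart, end: oldChangeEnd, change: added };
}

console.log(findChange("0123456789", "03456789"));       // deletion, start 1, end 2
console.log(findChange("03456789", "0123456789"));       // insertion of "12" at 1
console.log(findChange("Hello World!", "Hello Aliens!")); // replacement, start 6, end 10
console.log(findChange("Hello Worlds!", "Hello Worllolds!").change); // "lol"
```

The start and end values follow the conventions from the question: positions refer to the old text, and an insertion reports the same index for both.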

I had a similar problem and solved it with the following:

function diff(oldText, newText) {

  // Find the index at which the change began
  var s = 0;
  while(s < oldText.length && s < newText.length && oldText[s] == newText[s]) {
    s++;
  }

  // Find the index at which the change ended (relative to the end of the string)
  var e = 0;
  while(e < oldText.length &&
        e < newText.length &&
        oldText.length - e > s &&
        newText.length - e > s &&
        oldText[oldText.length - 1 - e] == newText[newText.length - 1 - e]) {
    e++;
  }

  // The change end of the new string (ne) and old string (oe)
  var ne = newText.length - e;
  var oe = oldText.length - e;

  // The number of chars removed and added
  var removed = oe - s;
  var added = ne - s;

  var type;
  switch(true) {
    case removed == 0 && added > 0:  // It's an 'add' if none were removed and at least 1 added
      type = 'add';
      break;
    case removed > 0 && added == 0:  // It's a 'remove' if none were added and at least one removed
      type = 'remove';
      break;
    case removed > 0 && added > 0:   // It's a replace if there were both added and removed characters
      type = 'replace';
      break;
    default:
      type = 'none';                 // Otherwise there was no change
      s = 0;
  }

  return { type: type, start: s, removed: removed, added: added };
}
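The object returned by diff uses counts rather than the Start/End/Change format from the question. As a rough sketch, a small helper could convert one into the other. The name toStartEndChange is hypothetical, and the helper assumes it receives an object with the start, removed and added fields returned above, plus the new text.

```javascript
// Sketch only: toStartEndChange is a hypothetical helper name.
// `result` is assumed to have the { start, removed, added } fields
// returned by the diff function above.
function toStartEndChange(result, newText) {
    // With removed characters, end is the last removed index in the old text;
    // for an insertion (or no change) end equals start.
    const end = result.removed > 0 ? result.start + result.removed - 1 : result.start;
    return {
        start: result.start,
        end: end,
        // The "change" is whatever was added in the new text (may be empty).
        change: newText.slice(result.start, result.start + result.added)
    };
}

// The result diff produces for "Hello World!" -> "Hello Aliens!":
const replaced = { type: "replace", start: 6, removed: 5, added: 6 };
console.log(toStartEndChange(replaced, "Hello Aliens!")); // start 6, end 10, change "Aliens"
```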

Note, this didn't solve my actual problem though. My issue was that I had an editor with paragraphs, each modelled with text and a collection of markups defined with a start and end index, e.g. bold from char 1 to char 5. I was using this to detect changes to the string so I could shift the markup indices accordingly. But consider the string:

xx xxxx xx xx xxxx xx

The diff function approach can't tell the difference between a character added outside the bold text and one added within it.
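A tiny illustration of that ambiguity, under made-up assumptions: the text is "xxxx", bold hypothetically covers indices 1 to 2, and insertAt is a helper invented for this example.

```javascript
// Hypothetical setup: the text is "xxxx" and bold covers indices 1..2 (inclusive).
function insertAt(text, index, inserted) {
    return text.slice(0, index) + inserted + text.slice(index);
}

const oldText = "xxxx";
const typedOutsideBold = insertAt(oldText, 0, "x"); // typed before the bold range
const typedInsideBold = insertAt(oldText, 2, "x");  // typed in the middle of the bold range

// Both edits produce the same string, so comparing oldText with the result
// cannot recover which of the two positions was actually edited.
console.log(typedOutsideBold === typedInsideBold); // true
```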

In the end, I took a completely different approach: I just parsed the HTML produced by the editor and used that to determine the start and end indices of markups.

I made my own slightly more performant version, inspired by the same tactics as above (looking for differences from front to back and from back to front):

function compareText(oldText, newText)
{
    var difStart,difEndOld,difEndNew;

    //from left to right - look up the first index where characters are different
    //(defaults to the shorter length when one text is a prefix of the other)
    var minLength = Math.min(oldText.length, newText.length);
    difStart = minLength;
    for(let i=0;i<minLength;i++)
    {
        if(oldText.charAt(i) !== newText.charAt(i))
        {
            difStart = i;
            break;
        }
    }

    //from right to left - look up the first index where characters are different
    //first calc the last indices for both strings
    var oldMax = oldText.length - 1;
    var newMax = newText.length - 1;
    //never compare characters left of difStart, so the scan cannot cross it
    var maxSuffix = minLength - difStart;
    difEndOld = oldMax - maxSuffix;
    difEndNew = newMax - maxSuffix;
    for(let i=0;i<maxSuffix;i++)
    {
        if(oldText.charAt(oldMax-i) !== newText.charAt(newMax-i))
        {
            //with different string lengths, the index will differ for the old and the new text
            difEndOld = oldMax-i;
            difEndNew = newMax-i;
            break;
        }
    }

    //substring(start, end) excludes end, hence the +1
    var removed = oldText.substring(difStart,difEndOld+1);
    var added = newText.substring(difStart,difEndNew+1);

    return [difStart,added,removed];
}
