Finding difference between strings in JavaScript
I'd like to compare two strings (a before and an after) and detect exactly where and what changed between them.
For any change, I want to know its start position, its end position, and what changed.
Assume that strings will change in only one place at a time (for example, never "Bill" -> "Kiln").
Additionally, I need the start and end positions to reflect the type of change:
For example:
"0123456789" -> "03456789"
Start: 1, End: 2, Change: "" (deletion)
"03456789" -> "0123456789"
Start: 1, End: 1, Change: "12" (insertion)
"Hello World!" -> "Hello Aliens!"
Start: 6, End: 10, Change: "Aliens" (replacement)
"Hi" -> "Hi"
Start: 0, End: 0, Change: "" (no change)
I was able to detect the positions of the changed text to some extent, but it doesn't work in all cases, because to do that accurately I need to know what kind of change was made.
var OldText = "My edited string!";
var NewText = "My first string!";
var ChangeStart = 0;
var NewChangeEnd = 0;
var OldChangeEnd = 0;
console.log("Comparing start:");
for (var i = 0; i < NewText.length; i++) {
    console.log(i + ": " + NewText[i] + " -> " + OldText[i]);
    if (NewText[i] != OldText[i]) {
        ChangeStart = i;
        break;
    }
}
console.log("Comparing end:");
// "Addition"?
if (NewText.length > OldText.length) {
    for (var i = 1; i < NewText.length; i++) {
        console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
        if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
// "Deletion"?
} else if (NewText.length < OldText.length) {
    for (var i = 1; i < OldText.length; i++) {
        console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
        if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
// Same length...
} else {
    // Do something
}
console.log("Change start: " + ChangeStart);
console.log("NChange end : " + NewChangeEnd);
console.log("OChange end : " + OldChangeEnd);
console.log("Change: " + OldText.substring(ChangeStart, OldChangeEnd + 1));
How do I tell whether an insertion, deletion, or replacement took place?
I've searched and found a few other similar questions, but they don't seem to help.
I have gone through your code, and your logic for matching the strings makes sense to me. It logs ChangeStart, NewChangeEnd and OldChangeEnd correctly and the algorithm flows all right. You just want to know whether an insertion, deletion or replacement took place. Here's how I would go about it.
First of all, you need to make sure that once you have found the first point of mismatch, i.e. ChangeStart, the index does not cross ChangeStart when you then traverse the strings from the end.
I'll give you an example. Consider the following strings:
var NewText = "Hello Worllolds!";
var OldText = "Hello Worlds!";
ChangeStart -> 10 //Makes sense
OldChangeEnd -> 8
NewChangeEnd -> 11
console.log("Change: " + NewText.substring(ChangeStart, NewChangeEnd + 1));
//Outputs "lo"
The problem in this case is that when it starts matching from the back, the flow is something like this:
Comparing end:
1(N: 15 O: 12: ! -> !)
2(N: 14 O: 11: s -> s)
3(N: 13 O: 10: d -> d) -> You need to stop here!
//Even though there is no mismatch yet, we have reached ChangeStart, and
//we have already established that characters 0 -> ChangeStart-1 match.
//That is why it outputs "lo" instead of "lol".
Assuming what I just said makes sense, you just need to modify your for loops like so:
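if (NewText.length > OldText.length) {
    for (var i = 1; i < NewText.length && ((OldText.length - i) >= ChangeStart); i++) {
        // ... logging as in your original loop
        // No mismatch seen yet: assume the change ends just before the characters compared here
        NewChangeEnd = NewText.length - i - 1;
        OldChangeEnd = OldText.length - i - 1;
        if (NewText[NewText.length - i] != OldText[OldText.length - i]) {
            // Mismatch: store the values and break, exactly as in your original code
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
}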
The condition (OldText.length-i)>=ChangeStart takes care of the anomaly that I mentioned, so the for loop terminates automatically once the backward scan reaches ChangeStart. However, as I just demonstrated, there are situations where this condition is reached before a mismatch is encountered. So you need to update NewChangeEnd and OldChangeEnd on every iteration to 1 less than the index that was just compared. In case of a mismatch, you store the values as before.
Instead of an else-if, we can handle the remaining cases in a single else, where we know NewText.length > OldText.length is definitely not true, i.e. it is either a replacement or a deletion. Likewise, NewText.length > OldText.length means it could be a replacement or an insertion as per your examples, which makes sense. So the else could be something like:
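else {
    for (var i = 1; i < OldText.length && ((OldText.length - i) >= ChangeStart); i++) {
        // ... logging as in your original loop
        // No mismatch seen yet: assume the change ends just before the characters compared here
        NewChangeEnd = NewText.length - i - 1;
        OldChangeEnd = OldText.length - i - 1;
        if (NewText[NewText.length - i] != OldText[OldText.length - i]) {
            // Mismatch: store the values and break, exactly as in your original code
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
}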
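For reference, the two bounded loops can be folded into one self-contained function. This is only a sketch of the idea described above, not the code from the fiddle: the name findChange is made up for illustration, and it also bounds the forward scan by the shorter string, which the original loop does not do.

```javascript
// Hypothetical helper (not from the original answer): returns the three indices discussed above.
function findChange(OldText, NewText) {
    // Forward scan: first index at which the two strings differ
    var ChangeStart = 0;
    var shorter = Math.min(OldText.length, NewText.length);
    while (ChangeStart < shorter && OldText[ChangeStart] === NewText[ChangeStart]) {
        ChangeStart++;
    }
    // Backward scan: stop on a mismatch, or as soon as either index would cross ChangeStart
    var OldChangeEnd = OldText.length - 1;
    var NewChangeEnd = NewText.length - 1;
    while (OldChangeEnd >= ChangeStart && NewChangeEnd >= ChangeStart &&
           OldText[OldChangeEnd] === NewText[NewChangeEnd]) {
        OldChangeEnd--;
        NewChangeEnd--;
    }
    return { ChangeStart: ChangeStart, OldChangeEnd: OldChangeEnd, NewChangeEnd: NewChangeEnd };
}

var r = findChange("Hello Worlds!", "Hello Worllolds!");
console.log(r.ChangeStart, r.OldChangeEnd, r.NewChangeEnd); // 10 9 12
console.log("Hello Worllolds!".substring(r.ChangeStart, r.NewChangeEnd + 1)); // lol
```

Because the backward scan can no longer run into the matched prefix, the example now yields "lol" rather than "lo".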
If you have understood the minor changes thus far, identifying the specific cases is really simple:
Deletion: the condition is ChangeStart > NewChangeEnd. The deleted string runs from ChangeStart -> OldChangeEnd. Deleted text -> OldText.substring(ChangeStart, OldChangeEnd + 1);
Insertion: the condition is ChangeStart > OldChangeEnd. The string was inserted at ChangeStart. Inserted text -> NewText.substring(ChangeStart, NewChangeEnd + 1);
Replacement: if NewText != OldText and the above two conditions are not met, then it is a replacement. Text in the old string that got replaced -> OldText.substring(ChangeStart, OldChangeEnd + 1); The replacement text -> NewText.substring(ChangeStart, NewChangeEnd + 1); Start and end positions in OldText that got replaced -> ChangeStart -> OldChangeEnd
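The three cases can be written down as a small function that takes the indices as input. This is a sketch under assumptions: the name classifyChange and the shape of the returned object are invented here for illustration, and the indices are expected to come from the bounded loops described earlier.

```javascript
// Hypothetical helper: classifies a change from ChangeStart, OldChangeEnd and NewChangeEnd.
function classifyChange(OldText, NewText, ChangeStart, OldChangeEnd, NewChangeEnd) {
    if (OldText === NewText) {
        return { type: "none", oldPart: "", newPart: "" };
    }
    if (ChangeStart > NewChangeEnd) {
        // Nothing of the new string lies inside the changed range: text was deleted
        return { type: "deletion", oldPart: OldText.substring(ChangeStart, OldChangeEnd + 1), newPart: "" };
    }
    if (ChangeStart > OldChangeEnd) {
        // Nothing of the old string lies inside the changed range: text was inserted
        return { type: "insertion", oldPart: "", newPart: NewText.substring(ChangeStart, NewChangeEnd + 1) };
    }
    // Both strings contribute characters to the changed range
    return {
        type: "replacement",
        oldPart: OldText.substring(ChangeStart, OldChangeEnd + 1),
        newPart: NewText.substring(ChangeStart, NewChangeEnd + 1)
    };
}

// Indices the bounded loops give for "Hello Worlds!" -> "Hello Worllolds!"
var result = classifyChange("Hello Worlds!", "Hello Worllolds!", 10, 9, 12);
console.log(result.type + ": " + result.newPart); // insertion: lol
```

Checking for equality first keeps the "no change" case from being mistaken for a deletion or an insertion when both end indices sit below ChangeStart.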
I have created a jsfiddle incorporating the changes that I have mentioned into your code. You might want to check it out. Hope it gets you started in the right direction.
I had a similar problem and solved it with the following:
function diff(oldText, newText) {
    // Find the index at which the change began
    var s = 0;
    while(s < oldText.length && s < newText.length && oldText[s] == newText[s]) {
        s++;
    }
    // Find the index at which the change ended (relative to the end of the string)
    var e = 0;
    while(e < oldText.length &&
        e < newText.length &&
        oldText.length - e > s &&
        newText.length - e > s &&
        oldText[oldText.length - 1 - e] == newText[newText.length - 1 - e]) {
        e++;
    }
    // The change end of the new string (ne) and old string (oe)
    var ne = newText.length - e;
    var oe = oldText.length - e;
    // The number of chars removed and added
    var removed = oe - s;
    var added = ne - s;
    var type;
    switch(true) {
        case removed == 0 && added > 0: // It's an 'add' if none were removed and at least 1 added
            type = 'add';
            break;
        case removed > 0 && added == 0: // It's a 'remove' if none were added and at least one removed
            type = 'remove';
            break;
        case removed > 0 && added > 0: // It's a replace if there were both added and removed characters
            type = 'replace';
            break;
        default:
            type = 'none'; // Otherwise there was no change
            s = 0;
    }
    return { type: type, start: s, removed: removed, added: added };
}
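To get from these counts to the Start/End/Change format used in the question, something like the following might work. It is a sketch, not part of the function above: describeChange is a made-up name, it repeats the same prefix/suffix scan so that it runs on its own, and it assumes that End equals Start for a pure insertion, as in the question's examples.

```javascript
// Hypothetical wrapper: same prefix/suffix scan, reported as Start/End/Change.
function describeChange(oldText, newText) {
    var s = 0; // length of the common prefix
    while (s < oldText.length && s < newText.length && oldText[s] === newText[s]) {
        s++;
    }
    var e = 0; // length of the common suffix, never overlapping the prefix
    while (oldText.length - e > s && newText.length - e > s &&
           oldText[oldText.length - 1 - e] === newText[newText.length - 1 - e]) {
        e++;
    }
    var removed = oldText.length - e - s; // chars dropped from the old string
    var added = newText.length - e - s;   // chars introduced in the new string
    if (removed === 0 && added === 0) {
        return { start: 0, end: 0, change: "" };
    }
    return {
        start: s,
        // Assumption: End is the last removed index in the old string, or Start for an insertion
        end: removed > 0 ? s + removed - 1 : s,
        change: newText.substring(s, s + added)
    };
}

[["0123456789", "03456789"], ["03456789", "0123456789"],
 ["Hello World!", "Hello Aliens!"], ["Hi", "Hi"]].forEach(function (pair) {
    var d = describeChange(pair[0], pair[1]);
    console.log("Start: " + d.start + ", End: " + d.end + ", Change: " + JSON.stringify(d.change));
});
// Start: 1, End: 2, Change: ""
// Start: 1, End: 1, Change: "12"
// Start: 6, End: 10, Change: "Aliens"
// Start: 0, End: 0, Change: ""
```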
Note, this didn't solve my actual problem though. My issue was that I had an editor with paragraphs, each modelled with text and a collection of markups defined with a start and end index, e.g. bold from char 1 to char 5. I was using this to detect changes to the string so I could shift the markup indices accordingly. But consider the string:
xx xxxx xx xx xxxx xx
The diff function approach can't tell the difference between a character added outside the bold or within it.
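A small sketch of that ambiguity, using an assumed eight-character string in which chars 2 to 5 are imagined to be bold. The helper name firstDifference is hypothetical; it performs only the forward part of the scan.

```javascript
// Hypothetical helper: index where a prefix scan first sees a difference.
function firstDifference(oldText, newText) {
    var s = 0;
    while (s < oldText.length && s < newText.length && oldText[s] === newText[s]) {
        s++;
    }
    return s;
}

var before = "xxxxxxxx"; // assumption for the demo: chars 2..5 are bold
// Type one more "x" at two different caret positions
var typedOutsideBold = before.slice(0, 1) + "x" + before.slice(1);
var typedInsideBold = before.slice(0, 4) + "x" + before.slice(4);

console.log(typedOutsideBold === typedInsideBold);      // true
console.log(firstDifference(before, typedOutsideBold)); // 8
console.log(firstDifference(before, typedInsideBold));  // 8
```

Both edits produce the same resulting string, so any comparison of the two strings alone reports the insertion at the same place (here the very end) no matter where the caret actually was.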
In the end, I took a completely different approach: I just parsed the HTML produced by the editor and used that to determine the start and end indices of the markups.
I made my own slightly more performant version, inspired by the same tactics as above (looking for differences from front to back and from back to front):
function compareText(oldText, newText)
{
    var difStart = 0;
    //from left to right - look up the first index where characters are different
    //(bounded by the shorter string, so identical strings and pure appends are handled too)
    var minLength = Math.min(oldText.length, newText.length);
    while(difStart < minLength && oldText.charAt(difStart) === newText.charAt(difStart))
    {
        difStart++;
    }
    //from right to left - look up the first index where characters are different
    //first calc the last indices for both strings
    //with different string lengths, the index will differ for the old and the new text
    var difEndOld = oldText.length - 1;
    var difEndNew = newText.length - 1;
    //never scan back past difStart, otherwise repeated characters would be matched twice
    while(difEndOld >= difStart && difEndNew >= difStart &&
        oldText.charAt(difEndOld) === newText.charAt(difEndNew))
    {
        difEndOld--;
        difEndNew--;
    }
    //substring is used instead of the deprecated substr
    var removed = oldText.substring(difStart,difEndOld+1);
    var added = newText.substring(difStart,difEndNew+1);
    return [difStart,added,removed];
}