
Fast way for string comparison

I have a simple question, but it confuses me.

I have two strings, and I want to count how many characters differ between the two. The strings are sorted and of equal length. Do not split the strings.

For example:

input:  abc, bcd
output: 2, because a and d are different characters

input:  abce, bccd
output: 4, because a, c, d and e are different.

I know I can do it in O(N^2), but how can I solve it in O(N) for these sorted strings?

I only need the number of different characters; there is no need to indicate which characters they are.

I was originally thinking that you needed a fairly complicated algorithm, such as Smith-Waterman. But the restrictions on your input make it fairly easy to implement this in O(m + n), where m is the length of the first string and n is the length of the second string.

We can use a built-in algorithm to calculate the number of characters that are in common, and then use that information to produce the number you are looking for:

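#include <algorithm>
#include <iostream>
#include <iterator>  // std::back_inserter
#include <string>

int main() {
    // Both inputs must already be sorted for std::set_intersection.
    std::string m = "abce";
    std::string n = "bccd";
    std::string result;

    // Collect the characters the two strings have in common.
    std::set_intersection(
            m.begin(), m.end(),
            n.begin(), n.end(),
            std::back_inserter(result));

    // Every character outside the common part counts as different.
    std::cout << m.size() + n.size() - 2 * result.size() << "\n";
}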

In this particular case, it outputs 4, as you wanted.
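The formula works because each common character is matched once in each string, so it is subtracted twice from the combined length; whatever remains is unmatched. As a minimal sketch, the same idea can be wrapped in a reusable function. The name `count_different` is hypothetical (it does not appear in the answer), and the function assumes both inputs are already sorted in ascending order.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper, not part of the original answer.
// Precondition: a and b are each sorted in ascending order.
std::size_t count_different(const std::string& a, const std::string& b) {
    std::string common;
    // For repeated characters, set_intersection keeps min(count in a, count in b).
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::back_inserter(common));
    // Each common character is matched in both strings, hence the factor 2.
    return a.size() + b.size() - 2 * common.size();
}
```

With the inputs from the question, `count_different("abc", "bcd")` gives 2 and `count_different("abce", "bccd")` gives 4.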

After seeing how simple the answer really is, thanks to @Bill Lynch, my solution may be too complex! Anyway, it's a simple counting difference.

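#include <array>
#include <cstddef>
#include <iostream>
#include <string>

int main() {
    // Per-letter frequency tables; assumes lowercase 'a'..'z' only.
    std::array<int,26> str1 = {};
    std::array<int,26> str2 = {};

    std::string s1("abce");
    std::string s2("bccd");

    for(char c : s1)
        ++str1[c-'a'];
    for(char c : s2)
        ++str2[c-'a'];

    // Count the letters whose frequencies differ between the two strings.
    // A plain loop avoids relying on a lambda with a mutable index.
    int count = 0;
    for(std::size_t i = 0; i < str1.size(); ++i)
        if(str1[i] != str2[i])
            ++count;

    std::cout << count << "\n";
}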

It's O(n+m), unless I've made a mistake in the analysis.
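One caveat worth noting: the code above counts the distinct letters whose frequencies differ, which matches the expected output for the sample input but can disagree with the `std::set_intersection` formula when a letter is repeated. For "aab" and "bcc" it reports 2 (the letters a and c), while the intersection formula gives 4. If every unmatched occurrence should count, a possible variant sums the absolute frequency differences instead. This is a sketch under that assumption; the name `count_unmatched` and the 256-entry table (chosen so any byte value is accepted, not just a to z) are my own choices.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: counts every unmatched character occurrence.
// Works on unsorted input as well, since only frequencies matter.
std::size_t count_unmatched(const std::string& s1, const std::string& s2) {
    std::array<long, 256> balance = {};  // one slot per possible byte value
    for (unsigned char c : s1) ++balance[c];  // occurrences in s1 add
    for (unsigned char c : s2) --balance[c];  // occurrences in s2 subtract

    std::size_t total = 0;
    for (long b : balance)
        total += static_cast<std::size_t>(b < 0 ? -b : b);  // |surplus| per byte
    return total;
}
```

The cost is still O(n + m) plus a constant-size pass over the table.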

You can achieve O(n) with a single pass over both strings (a two-index merge rather than dynamic programming in the strict sense), i.e. use an integer d for storing the difference.

Algo:
start with d = 0 and move from lower index to higher index of both arrays.
if a[i] is not equal to b[j]:
           increase d by 1
           advance the index that points at the smaller character and check again.
if a[i] is equal to b[j]:
           leave d unchanged
           advance both indices
repeat this until one array is exhausted, then add the number of characters left in the other array to d.
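A two-index walk over two sorted strings can be sketched in C++ as follows. The bookkeeping here is my own assumption about what the answer is aiming for: d grows by one for each character that has no partner in the other string, and any characters left over once one string is exhausted are added at the end. The name `count_different_merge` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper. Precondition: a and b are sorted in ascending order.
std::size_t count_different_merge(const std::string& a, const std::string& b) {
    std::size_t i = 0, j = 0, d = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] == b[j]) {        // matched pair: consume both
            ++i;
            ++j;
        } else if (a[i] < b[j]) {  // a[i] cannot appear later in b
            ++d;
            ++i;
        } else {                   // b[j] cannot appear later in a
            ++d;
            ++j;
        }
    }
    // Whatever is left in either string has no partner.
    d += (a.size() - i) + (b.size() - j);
    return d;
}
```

Each iteration advances at least one index, so the loop runs at most m + n times and needs no extra storage.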

O(2n) and O(n) are exactly the same thing, since the "O" describes the asymptotic behavior of your method's cost.

Update: I just noticed you meant O(n^2) with your O(N2).

If you do that comparison the straightforward way, without exploiting the sorted order, you'll have O(n^2) as your cost, since you have to:

1) Loop over every character of your words, and this is O(n).

2) Compare the current character in each word, and you'll have to use a temporary list that contains the characters you have already checked. So, this is another nested O(n).

So, O(n) * O(n) = O(n^2).
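For comparison, the nested loop described in the two steps above might look like the sketch below. The `used` vector plays the role of the temporary list of already-checked characters; that mapping, and the name `count_different_naive`, are my assumptions rather than something the answer spells out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical O(n^2) baseline; does not require sorted input.
std::size_t count_different_naive(const std::string& a, const std::string& b) {
    std::vector<bool> used(b.size(), false);  // marks characters of b already paired
    std::size_t matched = 0;
    for (char c : a) {                                 // outer loop: O(n)
        for (std::size_t j = 0; j < b.size(); ++j) {   // inner scan: O(n)
            if (!used[j] && b[j] == c) {
                used[j] = true;
                ++matched;
                break;
            }
        }
    }
    // Each matched pair removes one character from each string's total.
    return a.size() + b.size() - 2 * matched;
}
```

This gives the same counts as the linear approaches on the question's examples, only with quadratic work.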

Note: you can always ignore a constant numeric coefficient inside your O expression, as it doesn't affect the asymptotic behavior.
