简体   繁体   English

查找两个字符串中的公共字符

[英]Finding common characters in two strings

I am coding for the problem in which we got to count the number of common characters in two strings.我正在为我们必须计算两个字符串中常见字符数的问题编码。 Main part of the count goes like this计数的主要部分是这样的

for(i=0; i < strlen(s1); i++) {
    for(j = 0; j < strlen(s2); j++) {
        if(s1[i] == s2[j]) {
            count++;
            s2[j] = '*';
            break;
        }
    }
}

This goes with an O(n^2) logic.这符合 O(n^2) 逻辑。 However I could not think of a better solution than this.但是,我想不出比这更好的解决方案。 Can anyone help me in coding with an O(n) logic.任何人都可以帮助我使用 O(n) 逻辑进行编码。

This is very simple.这很简单。 Take two int arrays freq1 and freq2 .取两个int数组freq1freq2 Initialize all its elements to 0 .将其所有元素初始化为0 Then read your strings and store the frequencies of the characters to these arrays.然后读取您的字符串并将字符的频率存储到这些数组中。 After that compare the arrays freq1 and freq2 to find the common characters.之后比较数组freq1freq2以找到公共字符。

It can be done in O(n) time with constant space.它可以在 O(n) 时间内以恒定空间完成。

The pseudo code goes like this :伪代码是这样的:

int map1[26], map2[26];
int common_chars = 0;

for c1 in string1:
    map1[c1]++;

for c2 in string2:
    map2[c2]++;

for i in 1 to 26:
    common_chars += min(map1[i], map2[i]);

Your current code is O(n^3) because of the O(n) strlen s and produces incorrect results, for example on "aa", "aa" (which your code will return 4).由于 O(n) strlen s,您当前的代码是 O(n^3) 并产生不正确的结果,例如“aa”、“aa”(您的代码将返回 4)。

This code counts letters in common (each letter being counted at most once) in O(n).此代码以 O(n) 计算共同的字母(每个字母最多计数一次)。

int common(const char *a, const char *b) {
    int table[256] = {0};
    int result = 0;
    for (; *a; a++)table[*a]++;
    for (; *b; b++)result += (table[*b]-- > 0);
    return result;
}

Depending on how you define "letters in common", you may have different logic.根据您如何定义“共同字母”,您可能有不同的逻辑。 Here's some testcases for the definition I'm using (which is size of the multiset intersection).这是我正在使用的定义的一些测试用例(即多集交集的大小)。

int main(int argc, char *argv[]) {
    struct { const char *a, *b; int want; } cases[] = {
        {"a", "a", 1},
        {"a", "b", 0},
        {"a", "aa", 1},
        {"aa", "a", 1},
        {"ccc", "cccc", 3},
        {"aaa", "aaa", 3},
        {"abc", "cba", 3},
        {"aasa", "asad", 3},
    };
    int fail = 0;
    for (int i = 0; i < sizeof(cases) / sizeof(*cases); i++) {
        int got = common(cases[i].a, cases[i].b);
        if (got != cases[i].want) {
            fail = 1;
            printf("common(%s, %s) = %d, want %d\n",
                   cases[i].a, cases[i].b, got, cases[i].want);
        }
    }
    return fail;
}

You can do it with 2n:你可以用 2n 来做到:

int i,j, len1 = strlen(s1), len2 = strlen(s2);
unsigned char allChars[256] = { 0 };
int count = 0;

for( i=0; i<len1; i++ )
{
    allChars[ (unsigned char) s1[i] ] = 1;
}

for( i=0; i<len2; i++ )
{
    if( allChars[ (unsigned char) s1[i] ] == 1 )
    {
        allChars[ (unsigned char) s2[i] ] = 2;
    }
}

for( i=0; i<256; i++ )
{
    if( allChars[i] == 2 )
    {
        cout << allChars[i] << endl;
        count++;
    }
}

Following code traverses each sting only once.以下代码仅遍历每个刺一次。 So the complexity is O(n).所以复杂度是O(n)。 One of the assumptions is that the upper and lower cases are considered same.假设之一是大写和小写被认为是相同的。

 #include<stdio.h>

int main() {
    char a[] = "Hello world";
    char b[] = "woowrd";
    int x[26] = {0};
    int i;
    int index;

    for (i = 0; a[i] != '\0'; i++) {
        index = a[i] - 'a';
        if (index > 26) {
            //capital char
            index = a[i] - 'A';
        }
        x[index]++;
    }

    for (i = 0; b[i] != '\0'; i++) {
        index = b[i] - 'a';
        if (index > 26) {
            //capital char
            index = b[i] - 'A';
        }

        if (x[index] > 0)
            x[index] = -1;
    }

    printf("Common characters in '%s' and '%s' are ", a, b);
    for (i = 0; i < 26; i++) {
        if (x[i] < 0)
            printf("%c", 'a'+i);
    }
    printf("\n");
}
int count(string a, string b) 
{

   int i,c[26]={0},c1[26]={};

   for(i=0;i<a.length();i++)
   {
        if(97<=a[i]&&a[i]<=123)
        c[a[i]-97]++;
   }
   for(i=0;i<b.length();i++)
   {
       if(97<=b[i]&&b[i]<=123)
        c1[b[i]-97]++;
    } 
    int s=0;
    for(i=0;i<26;i++)
    {
        s=s+abs(c[i]+c1[i]-(c[i]-c1[i]));

    }   

    return (s);  
}

This is much easier and better solution这是更容易和更好的解决方案

for (std::vector<char>::iterator i = s1.begin(); i != s1.end(); ++i)
{
    if (std::find(s2.begin(), s2.end(), *i) != s2.end())
   {
    dest.push_back(*i);
   }
}

taken from here取自这里

C implementation to run in O(n) time and constant space.在 O(n) 时间和恒定空间中运行的 C 实现。

#define ALPHABETS_COUNT 26 
int commonChars(char *s1, char *s2)
{
    int c_count = 0, i; 
    int arr1[ALPHABETS_COUNT] = {0}, arr2[ALPHABETS_COUNT] = {0};

    /* Compute the number of occurances of each character */
    while (*s1) arr1[*s1++-'a'] += 1;
    while (*s2) arr2[*s2++-'a'] += 1;       

    /* Increment count based on match found */
    for(i=0; i<ALPHABETS_COUNT; i++) {
        if(arr1[i] == arr2[i]) c_count += arr1[i];
        else if(arr1[i]>arr2[i] && arr2[i] != 0) c_count += arr2[i];
        else if(arr2[i]>arr1[i] && arr1[i] != 0) c_count += arr1[i];
    }

    return c_count;

} }

First, your code does not run in O(n^2), it runs in O(nm), where n and m are the length of each string.首先,您的代码不是在 O(n^2) 中运行,而是在 O(nm) 中运行,其中 n 和 m 是每个字符串的长度。

You can do it in O(n+m), but not better, since you have to go through each string, at least once, to see if a character is in both.您可以在 O(n+m) 中完成它,但不是更好,因为您必须遍历每个字符串,至少一次,以查看一个字符是否在两个字符串中。

An example in C++, assuming: C++ 中的一个示例,假设:

  • ASCII characters ASCII 字符
  • All characters included (letters, numbers, special, spaces, etc...)包括所有字符(字母、数字、特殊字符、空格等...)
  • Case sensitive区分大小写

std::vector<char> strIntersect(std::string const&s1, std::string const&s2){

    std::vector<bool> presents(256, false);  //Assuming ASCII
    std::vector<char> intersection;

    for (auto c : s1) {
        presents[c] = true;
    }
    for (auto c : s2) {
        if (presents[c]){
            intersection.push_back(c);
            presents[c] = false;
        }
    }
    return intersection;
}

int main() {
    std::vector<char> result; 
    std::string s1 = "El perro de San Roque no tiene rabo, porque Ramon Rodriguez se lo ha cortado";
    std::string s2 = "Saint Roque's dog has no tail, because Ramon Rodriguez chopped it off";

    //Expected: "S a i n t   R o q u e s d g h l , b c m r z p"

    result = strIntersect(s1, s2);
    for (auto c : result) {
         std::cout << c << " ";
    }
    std::cout << std::endl;

    return 0;
}

Their is a more better version in c++ : C++ bitset and its application A bitset is an array of bool but each Boolean value is not stored separately instead bitset optimizes the space such that each bool takes 1 bit space only, so space taken by bitset bs is less than that of bool bs[N] and vector bs(N).它们是 C++ 中更好的版本:C++ bitset 及其应用 bitset 是一个 bool 数组,但每个布尔值不是单独存储的,而是 bitset 优化了空间,使得每个 bool 只占用 1 位空间,因此 bitset bs 占用了空间小于 bool bs[N] 和向量 bs(N)。 However, a limitation of bitset is, N must be known at compile time, ie, a constant (this limitation is not there with vector and dynamic array)然而,bitset 的一个限制是,N 必须在编译时已知,即一个常量(向量和动态数组没有这个限制)

As bitset stores the same information in compressed manner the operation on bitset are faster than that of array and vector.由于 bitset 以压缩方式存储相同的信息,因此对 bitset 的操作比对数组和向量的操作要快。 We can access each bit of bitset individually with help of array indexing operator [] that is bs[3] shows bit at index 3 of bitset bs just like a simple array.我们可以在数组索引运算符 [] 的帮助下单独访问 bitset 的每一位,即 bs[3] 像一个简单的数组一样显示 bitset bs 的索引 3 处的位。 Remember bitset starts its indexing backward that is for 10110, 0 are at 0th and 3rd indices whereas 1 are at 1st 2nd and 4th indices.请记住,bitset 开始向后索引 10110,0 位于第 0 和第 3 个索引,而 1 位于第 2 个和第 4 个索引。 We can construct a bitset using integer number as well as binary string via constructors which is shown in below code.我们可以通过构造函数使用整数和二进制字符串构造一个位集,如下面的代码所示。 The size of bitset is fixed at compile time that is, it can't be changed at runtime. bitset 的大小在编译时是固定的,即不能在运行时更改。 For more information about bitset visit the site : https://www.geeksforgeeks.org/c-bitset-and-its-application有关 bitset 的更多信息,请访问该站点: https : //www.geeksforgeeks.org/c-bitset-and-its-application

The code is as follows :代码如下:

    // considering the strings to be of lower case.

    int main()
    {
        string s1,s2;
        cin>>s1>>s2;

        //Declaration for bitset type variables
        bitset<26> b_s1,b_s2;

        // setting the bits in b_s1 for the encountered characters of string s1 
        for(auto& i : s1)
        {
          if(!b_s1[i-'a'])
            b_s1[i-'a'] = 1;
        }

        // setting the bits in b_s2 for the encountered characters of string s2 
        for(auto& i : s2)
        {
          if(!b_s2[i-'a'])
            b_s2[i-'a'] = 1;
        }

       // counting the number of set bits by the "Logical AND"  operation
       // between b_s1 and b_s2
       cout<<(b_s1&b_s2).count(); 

     }

No need to initialize and keep an array of 26 elements (numbers for each letter in alphabet).无需初始化和保留 26 个元素的数组(字母表中每个字母的数字)。 Just fo the following:仅针对以下内容:

  1. Using HashMap store letter as a key and integer got the count as a value.使用 HashMap 存储字母作为键,整数将计数作为值。
  2. Create a Set of characters.创建一组字符。
  3. Iterate through each string characters, add to the Set from step 2. If add() method returned false, (means that same character already exists in the Set), then add the character to the map and increment the value.遍历每个字符串字符,添加到第 2 步中的 Set。如果 add() 方法返回 false,(意味着 Set 中已存在相同的字符),则将该字符添加到映射并增加值。

These steps are written considering Java programming language.这些步骤是在考虑 Java 编程语言的情况下编写的。

Python Code:蟒蛇代码:

    >>>s1='abbc'    
    >>>s2='abde'
    >>>p=list(set(s1).intersection(set(s2)))
    >>print(p)
    ['a','b']

Hope this helps you, Happy Coding!希望这对你有帮助,快乐编码!

可以使用“捕获”的概念轻松完成,该概念是散列的子算法。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM