
Longest substring with only 2 distinct chars in C++

I am trying to find the longest substring with at most 2 distinct characters. It is a brute-force program which simply generates all possible substrings and checks whether each one has at most 2 distinct chars.

I use a set to keep track of the distinct chars.

#include <iostream>
#include <string>
#include <algorithm>
#include <unordered_set>

using namespace std;

int main()
{
   string s = "AllPossibleSubstrings";
   int l=0,index=0;
   for(int i =0;i<s.length();i++)
   {
       for(int j=i+1;j<s.length();j++)
       {
           string sub = string(s.begin()+i,s.begin()+j);
           unordered_set<char> v;
           for(auto x:sub)
           {
               v.insert(x);
           }
           if(v.size()<=2) {l=max(l,j-i+1); if(l==j-i+1) index=i;}
       }
   }

   cout<<l<<" "+s.substr(index,l)<<endl;

}

I get the wrong answer 4 ssib, while the correct answer must not contain b (All, llP, oss and ssi are all possible answers). What am I doing wrong?

If you add debug output to your code to see which strings it finds:

if(v.size()<=2) {
    l=max(l,j-i+1); 
    if(l==j-i+1) {
        index=i;
        cout << "Found match " << i << " " << j << " " << l << " " << sub << endl;
    }
}

you'll see that it finds the proper strings:

Found match 0 1 2 A
Found match 0 2 3 Al
Found match 0 3 4 All
Found match 1 4 4 llP
Found match 4 7 4 oss
Found match 5 8 4 ssi

(see here: http://ideone.com/lQqgnq)

But you will also see that, for example, for i=5 and j=8 you get sub="ssi", but l=4, which is clearly wrong.

So the reason for the wrong behavior is that string(s.begin()+i,s.begin()+j) makes the substring starting from the i-th character up to, but not including, the j-th character (http://www.cplusplus.com/reference/string/string/string/):

 template <class InputIterator> string (InputIterator first, InputIterator last); 

Copies the sequence of characters in the range [first,last), in the same order.

Note that last is not included.

So your l should be calculated correspondingly: as j-i, not j-i+1.


In fact, the underlying reason is that your code is overcomplicated. You already use s.substr at the end of your code, so why not use it in the main loop as well? You could even have looped over i and l, and then you would not have had such problems.

Moreover, you do not actually need to extract a substring each time. You can loop over i and l and just keep the current set of distinct chars. This yields a faster O(N^2) solution, while yours is O(N^3). Something like:

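int maxl = 0, index = 0;
for (int i=0; i<s.length(); i++) {
   unordered_set<char> v;            // distinct chars of s.substr(i, l)
   for (int l=1; i+l<=s.length(); l++) {
       v.insert(s[i+l-1]);           // add the new last character
       if (v.size()>2) break;        // a 3rd distinct char: no longer substring from i can qualify
       if (l>maxl) {
            index = i;
            maxl = l;
       }
   }
}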

In fact, even an O(N) solution can be achieved here, but with somewhat more advanced code.

The problem is that the l variable is the length of the substring + 1. Notice that the j index is one past the substring's last character.

So, to get it right:

Change the if statement to:

   if(v.size()<=2) {l=max(l,j-i); if(l==j-i) index=i;}

I modified your code. If I understood correctly, the answer can be any of the substrings found; if you wish, you can store them in an array or something similar and display all of those with the same (longest) size. This is about as brute-force as it gets.

#include <cstddef>
#include <iostream>
#include <string>

using namespace std;

int main() {
    string s = "AllPossibleSubstrings";
    string output;

    size_t starts_from = 0, length = 0;

    for (size_t i = 0; i < s.length(); i++) {
        string sub(1, s[i]);        // substring starting at i, grown one char at a time
        string distinct(1, s[i]);   // the distinct characters of sub (at most 2)

        for (size_t j = i + 1; j < s.length(); j++) {
            // not_found: s[j] is not yet one of the distinct characters of sub
            bool not_found = distinct.find(s[j]) == string::npos;

            if (not_found && distinct.length() == 2)
                break;              // a third distinct character ends the substring
            if (not_found)
                distinct += s[j];   // s[j] becomes the second distinct character
            sub += s[j];
        }

        if (sub.length() > length) {  // keeps the first of equally long substrings
            length = sub.length();
            starts_from = i;          // 0-based; add 1 for a 1-based position
            output = sub;
        }
    }

    cout << endl << "''" << output << "''" << " which starts from index " << starts_from
         << " and is " << output.length() << " characters long." << endl;

    return 0;
}

It would be easier to use a char* pointer than the String class, which is a more Java-like approach. Then you use the same algorithm structure with a nested loop and count the substring length (from the first to the next capital letter), and if it's longer than any other substring, make a new allocation with char*, or create a new String object if you must use the String class. Of course, you start with the value 0 for the longest:

unsigned int longest_substring = 0;

If a greater value is found, as I already said, you change it to its length and re-allocate the output string (char[]/char*) variable.

For all this to work you'll need loop counters, longest_substring, current_string (for the length of the substring being checked in the nested loop) and, of course, a char*/String to store the longest substring found up to this point.

I'm in a hurry so I can't provide the code, but that's the logic :-)

The algorithm is fine (of course, it is brute force, nothing fancy). However, the substring generated in the inner loop is misunderstood.

The substring runs from index i to index j-1 (both included); it does not include index j. So the length of the substring must be (j-1) - i + 1 = j-i. The index and length variables must be updated using this correct length.

This was the source of the error. By the way, among the possible answers, the algorithm returns the one that occurs last in the string.

Maybe it is better to rewrite your code like this:

#include <iostream>
#include <string>

using namespace std;

int main()
{
   string s = "AllPossibleSubstrings";
   int max=0,index=0;
   for(int i =0;i<s.length();i++)
   {
       int j=i;
       for(;s[j]==s[i];j++);
       if((j-i+1)>max)
       {
       max = (j-i+1);
       index = i;
       }
   }
   cout<<max<<" "+s.substr(index,max)<<endl;
}

Edit: one more check should also be added. With it, the body of the loop becomes:

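   const int n = s.length();
   int j=i+1;
   while (j<n && s[j]==s[i]) j++;                  // skip the run of s[i]; s[j] is the 2nd char, if any
   int z=j;
   while (z<n && (s[z]==s[i] || s[z]==s[j])) z++;  // extend over both characters
   if((z-i)>max)
   {   
   max = (z-i);
   index = i;
   }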

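Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.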

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM