
Longest substring with only 2 distinct chars in C++

I am trying to find the longest substring with at most 2 distinct characters. It is a brute-force program which simply generates all possible substrings and checks whether each has at most 2 distinct chars.

I use a set to keep track of the distinct chars.

#include <iostream>
#include <string>
#include <algorithm>
#include <unordered_set>

using namespace std;

int main()
{
   string s = "AllPossibleSubstrings";
   int l=0,index=0;
   for(int i =0;i<s.length();i++)
   {
       for(int j=i+1;j<s.length();j++)
       {
           string sub = string(s.begin()+i,s.begin()+j);
           unordered_set<char> v;
           for(auto x:sub)
           {
               v.insert(x);
           }
           if(v.size()<=2) {l=max(l,j-i+1); if(l==j-i+1) index=i;}
       }
   }

   cout<<l<<" "+s.substr(index,l)<<endl;

}

I get the wrong answer of 4 ssib, while the correct answer must not contain b (All, llP, oss, ssi are possible answers). Where am I going wrong?

If you add debug output to your code to see which strings it finds:

if(v.size()<=2) {
    l=max(l,j-i+1); 
    if(l==j-i+1) {
        index=i;
        cout << "Found match " << i << " " << j << " " << l << " " << sub << endl;
    }
}

you'll see that it finds the proper strings:

Found match 0 1 2 A
Found match 0 2 3 Al
Found match 0 3 4 All
Found match 1 4 4 llP
Found match 4 7 4 oss
Found match 5 8 4 ssi

(see here: http://ideone.com/lQqgnq )

But you will also see that, for example, for i=5 and j=8 you get sub="ssi" , but l=4 , which is clearly wrong.

So the reason for the wrong behavior is that string(s.begin()+i,s.begin()+j) makes the substring starting from the i-th character and up to, but not including, the j-th character (http://www.cplusplus.com/reference/string/string/string/):

 template <class InputIterator> string (InputIterator first, InputIterator last); 

Copies the sequence of characters in the range [first,last), in the same order.

Note that last is not included.

So your l should be calculated correspondingly: as j-i, not j-i+1.
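To make the half-open range concrete, here is a minimal sketch. The helper name slice is hypothetical and not part of the original code; it only wraps the same iterator-range constructor so the resulting length can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds s[i..j) with the iterator-range constructor,
// the same way the inner loop of the question does.
std::string slice(const std::string& s, std::size_t i, std::size_t j) {
    assert(i <= j && j <= s.size());   // the range must lie inside the string
    return std::string(s.begin() + i, s.begin() + j);
}
```

With s = "AllPossibleSubstrings", slice(s, 5, 8) gives "ssi", whose length is 8 - 5 = 3 rather than 4.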


In fact, the underlying reason is that your code is overcomplicated. You already use s.substr at the end of your code, so why not use the same in the main loop? You could even have looped over i and l, and then you would not have such problems.

Moreover, you do not need to extract a substring each time. You can loop over i and l and just keep the current set of distinct chars. This yields a faster O(N^2) solution, while yours is O(N^3). Something like:

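int maxl = 0, index = 0;
for (int i=0; i<(int)s.length(); i++) {
   unordered_set<char> v;   // distinct chars of s[i .. i+l-1]
   for (int l=1; l<=(int)s.length()-i; l++) {
       v.insert(s[i+l-1]);
       if (v.size()>2) break;   // a third distinct char ends this start position
       if (l>maxl) {
            index = i;
            maxl = l;
       }
   }
}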

In fact, even an O(N) solution can be achieved here, with somewhat more advanced code.
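One common way to reach O(N) is a sliding window; this is a sketch of that technique, not necessarily the solution the answer had in mind. The function name longestTwoDistinct and its return type are assumptions made for illustration. Each character enters and leaves the window at most once, so the work is linear apart from the hash map operations.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Returns {start, length} of the first longest substring of s that
// contains at most two distinct characters.
std::pair<std::size_t, std::size_t> longestTwoDistinct(const std::string& s) {
    std::unordered_map<char, std::size_t> count;  // occurrences inside the window
    std::size_t left = 0, bestStart = 0, bestLen = 0;
    for (std::size_t right = 0; right < s.size(); ++right) {
        ++count[s[right]];                        // grow the window to the right
        while (count.size() > 2) {                // too many distinct characters
            auto it = count.find(s[left]);
            if (--(it->second) == 0) count.erase(it);
            ++left;                               // shrink from the left
        }
        std::size_t len = right - left + 1;
        if (len > bestLen) {                      // strict '>' keeps the first longest
            bestLen = len;
            bestStart = left;
        }
    }
    return {bestStart, bestLen};
}
```

For "AllPossibleSubstrings" this returns start 0 and length 3, i.e. "All"; s.substr(start, length) recovers the substring itself.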

The problem is that the l variable is the length of the substring + 1... Notice that the j index is one past the substring's last character.

So, to get it right:

change the if statement to:

   if(v.size()<=2) {l=max(l,j-i); if(l==j-i) index=i;}
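Since j is one past the last character, the inner loop also has to let j reach s.length(); otherwise substrings that end at the last character are never tested (for "abb" the loop would stop at "ab"). A sketch of the brute force with both changes follows; the function name bruteForceTwoDistinct is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>

// Brute force over every half-open range [i, j); like the original loop,
// it keeps the last longest substring with at most two distinct characters.
std::string bruteForceTwoDistinct(const std::string& s) {
    std::size_t best = 0, index = 0;
    for (std::size_t i = 0; i < s.length(); ++i) {
        // j is one past the last character, so it may equal s.length()
        for (std::size_t j = i + 1; j <= s.length(); ++j) {
            std::unordered_set<char> seen(s.begin() + i, s.begin() + j);
            if (seen.size() <= 2) {
                best = std::max(best, j - i);     // length is j - i, not j - i + 1
                if (best == j - i) index = i;
            }
        }
    }
    return s.substr(index, best);
}
```

For "AllPossibleSubstrings" this returns "ssi", and for "abb" it returns the whole string.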

I modified your code. If I understood correctly, any of the longest substrings found is an acceptable answer; if you wish, you can store them in an array or similar and display all those of the same (longest) size. This is about as brute force as you can get.

#include <iostream>
#include <string>
#include <algorithm>
#include <cstdlib>   // system()

using namespace std;

int main(int argc, char* argv[]){
    string s = "AllPossibleSubstrings";
    string output = string();

    int starts_from = 0, length = 0;

    for (int i = 0; i < s.length(); i++){
            string sub = string();

            sub += s[i];

            int characters = 1;

            for (int j = i + 1; j < s.length() && characters <= 2; j++){
                // s[j] is new only if it matches none of the characters already in sub
                bool not_found = true;
                for (int k = 0; k < sub.length(); k++)
                    if (s[j] == sub[k])
                        not_found = false;

                if (not_found && characters == 2)
                    break;          // a third distinct character ends this substring
                if (not_found)
                    characters++;
                sub += s[j];
            }


            if (sub.length() > length){
                length = sub.length();
                starts_from = i; // index + 1 for the >1 value
                output = sub;
            }
    }

    cout << endl << "''" << output << "''" << " which starts from index " << starts_from << " and is " << output.length() << " characters long.." << endl;

    system("pause");

    return 0;
}
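The idea of storing every substring of the longest size can be sketched as below. The function name allLongestTwoDistinct and the choice of std::vector as the container are assumptions made for illustration; the search itself is the O(N^2) extend-from-each-start brute force.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Returns every longest substring of s with at most two distinct characters,
// ordered by starting position.
std::vector<std::string> allLongestTwoDistinct(const std::string& s) {
    std::vector<std::string> result;
    std::size_t best = 0;
    for (std::size_t i = 0; i < s.length(); ++i) {
        std::unordered_set<char> seen;
        std::size_t len = 0;
        // extend from i until a third distinct character would be added
        while (i + len < s.length()) {
            seen.insert(s[i + len]);
            if (seen.size() > 2) break;
            ++len;
        }
        if (len > best) {          // strictly longer: discard earlier candidates
            best = len;
            result.clear();
        }
        if (len == best && len > 0) result.push_back(s.substr(i, len));
    }
    return result;
}
```

For "AllPossibleSubstrings" the result is All, llP, oss, ssi, which matches the candidates listed in the question.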

It would be easier to use a char* pointer than the String class, which is a more Java-like approach. Then you use the same algorithm structure with a nested loop and count the substring length (from the first to the next capital letter), and if it is longer than any other substring, make a new allocation with char*, or create a new String object if you must use the String class. Of course, you start with the value of 0 for the longest:

unsigned int longest_substring = 0;

If a greater value is found, as I already said, you change it to its length and re-allocate the output string (char[]/char*) variable.

For all this to work you'll need loop counters, longest_substring, current_string (for the length of the substring being checked in the nested loop) and of course a char*/String to store the longest substring found up to this point.

I'm in a hurry so I can't provide the code but that's the logic :-)

The algorithm is fine (of course, it is brute-force, nothing fancy). However, the substring generated in the inner loop is misunderstood.

The substring runs from index i to index j-1 (both included); it does not include index j. So the length of the substring must be (j-1) - i + 1 = j - i. The index and length variables must be updated using this correct length.

This was the source of the error. By the way, among the possible answers, the algorithm returns the last substring according to its position in the string.

It may be better to rewrite your code like this:

#include <iostream>
#include <string>

using namespace std;

int main()
{
   string s = "AllPossibleSubstrings";
   int max=0,index=0;
   for(int i =0;i<s.length();i++)
   {
       int j=i;
       for(;s[j]==s[i];j++);
       if((j-i+1)>max)
       {
       max = (j-i+1);
       index = i;
       }
   }
   cout<<max<<" "+s.substr(index,max)<<endl;
}

Edit: one more check is needed, because the version above only extends the run of s[i] by a single character. The body of the loop then becomes:

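   int n = s.length();
   int j=i+1;
   for(;(j<n && s[j]==s[i]);j++);   // skip the run of the first character
   int z=j;                         // if j==n the candidate is simply s[i..n)
   if(j<n)
       for(z=j+1;(z<n && (s[z]==s[i]||s[z]==s[j]));z++);   // extend while only the two characters appear
   if((z-i)>max)
   {
   max = (z-i);
   index = i;
   }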
