
How can I find repeated words in a vector of strings in C++?

I have a std::vector<string> where each element is a word. I want to print the vector without repeated words!

I searched a lot on the web and I found lots of material, but I can't and I don't want to use hash maps, iterators and "advanced" (to me) stuff. I can only use plain string comparison == as I am still a beginner.

So, let my_vec be a std::vector<std::string> initialized from standard input. My idea was to go through the whole vector and erase any repeated word as soon as I find it:

  for(int i=0;i<my_vec.size();++i){
    for (int j=i+1;j<my_vec.size();++j){
      if(my_vec[i]==my_vec[j]){
        my_vec.erase(my_vec.begin()+j); //remove the component from the vector
      }
    }
  }

I tried to test for std::vector<std::string> my_vec{"hey","how","are","you","fine","and","you","fine"}

and indeed I found

hey how are you fine and

so it seems to be right, but for instance if I write the simple vector std::vector<std::string> my_vec{"hello","hello","hello","hello","hello"}

I obtain

hello hello

The problem is that every call to erase makes the vector smaller, so I lose information. How can I fix this?

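Here is a minimalist fix to your existing code. The unconditional increment of j is what ultimately breaks your algorithm: after an erase, the next element has already moved into position j, so incrementing j skips it. Don't do that. Instead, only increment j when you do NOT remove an element.

I.e.: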

for (int i = 0; i < my_vec.size(); ++i) {
    for (int j = i + 1; j < my_vec.size(); ) {  // NOTE: no ++j
        if (my_vec[i] == my_vec[j]) {
            my_vec.erase(my_vec.begin() + j);
        }
        else ++j; // NOTE: moved to else-clause
    }
}

That is literally it.
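For completeness, the same fix can be sketched as a small helper. This is only an illustration: the function name remove_duplicates is hypothetical (it is not from the question), and std::size_t is used for the indices, on the assumption that you want to avoid signed/unsigned comparison warnings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: keeps the first occurrence of each word
// and erases every later occurrence, using only operator==.
void remove_duplicates(std::vector<std::string>& words) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        for (std::size_t j = i + 1; j < words.size(); ) {  // no ++j here
            if (words[i] == words[j]) {
                // After erase, the next candidate has moved into
                // position j, so j must stay where it is.
                words.erase(words.begin() + j);
            } else {
                ++j;  // advance only when nothing was erased
            }
        }
    }
}
```

Called on the two vectors from the question, this leaves "hey how are you fine and" in the first and a single "hello" in the second.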

Why don't you use std::unique?

It is as easy as this (you need to include <algorithm>):

std::vector<std::string> v{ "hello", "hello", "hello", "hello", "hello" };
std::sort(v.begin(), v.end());
v.erase(std::unique(v.begin(), v.end()), v.end()); 

N.B. The elements need to be sorted first because std::unique removes only consecutive duplicates.

If you don't want to change the order of the elements in the std::vector and only want output in the original order, I recommend the other answers.
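To illustrate the note about consecutive duplicates, here is a small sketch that compares std::unique with and without a preceding std::sort. The two helper names are hypothetical and exist only for this comparison; both take the vector by value so the caller's data is left untouched.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: applies std::unique WITHOUT sorting first,
// so only adjacent equal elements are collapsed.
std::vector<std::string> unique_only(std::vector<std::string> v) {
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Hypothetical helper: sorts first, so that all equal elements
// become adjacent and std::unique can remove every duplicate.
std::vector<std::string> sort_then_unique(std::vector<std::string> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}
```

For the input {"you", "fine", "fine", "you"}, unique_only keeps {"you", "fine", "you"} because the two "you" entries are not adjacent, while sort_then_unique yields {"fine", "you"}, at the cost of losing the original order.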

You can record the index of each element to erase and then erase them all at the end. Alternatively, repeat the loop until no more erasures are performed.

First code example:

// to_erase[k] is true when my_vec[k] duplicates an earlier element
std::vector<bool> to_erase(my_vec.size(), false);

for (int i = 0; i < my_vec.size(); ++i) {
    if (to_erase[i]) continue; // already marked as a duplicate
    for (int j = i + 1; j < my_vec.size(); ++j) {
        if (my_vec[i] == my_vec[j]) {
            to_erase[j] = true;
        }
    }
}
// Erase starting from the last index, so that the positions of the
// elements still to be checked do not shift.
for (int i = my_vec.size() - 1; i >= 0; i--) {
    if (to_erase[i]) my_vec.erase(my_vec.begin() + i); // remove the element from the vector
}

Second code example:

bool Erase = true;
while(Erase){
  Erase = false;
  for(int i=0;i<my_vec.size();++i){
    for (int j=i+1;j<my_vec.size();++j){
      if(my_vec[i]==my_vec[j]){
        my_vec.erase(my_vec.begin()+j); //remove the component from the vector
        Erase = true;
      }
    }
  }
}

Erasing elements from a container inside a loop is a little tricky, because after erasing the element at index i, the next element (in the next iteration) is not at index i+1 but at index i.

Read about the erase-remove idiom for the idiomatic way to erase elements. However, if you just want to print on the screen, there is a much simpler way to fix your code:

for(int i=0; i<my_vec.size(); ++i){
   bool unique = true;
   for (int j=0; j<i; ++j){
       if(my_vec[i]==my_vec[j]) {
           unique = false;
           break;
       }
   }
   // print only after all earlier elements have been checked
   if (unique) std::cout << my_vec[i] << " ";
}

Instead of checking the elements after the current one, you should compare against the elements before it. Otherwise "bar x bar y bar" will print as "x y bar" when I suppose it should be "bar x y".
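As a rough illustration of the erase-remove idiom mentioned above, the sketch below applies it once per word to drop every later copy of that word. The function name remove_later_copies is hypothetical, and copying the current word before calling std::remove is a defensive choice of this sketch, not a requirement stated in the answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: for each word, removes all later copies of it
// with the erase-remove idiom, preserving the original order.
void remove_later_copies(std::vector<std::string>& words) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        // Copy the value so it cannot be affected while elements are shifted.
        const std::string current = words[i];
        // std::remove moves the elements to keep to the front of the range
        // and returns the new logical end; erase then drops the leftover tail.
        words.erase(std::remove(words.begin() + i + 1, words.end(), current),
                    words.end());
    }
}
```

Applied to {"bar", "x", "bar", "y", "bar"}, this leaves {"bar", "x", "y"}.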

Last but not least, consider that traditional loops with indices are the complicated way, while iterators or a range-based loop are much simpler. Don't be afraid of new stuff; in the long run it will be easier to use.
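To give an idea of what the range-based version could look like, here is a minimal sketch that still uses only == for comparison. The function name without_repeats is hypothetical, and building a separate result vector (rather than modifying the input) is an assumption made to keep the loop simple.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the words in their original order,
// keeping only the first occurrence of each one.
std::vector<std::string> without_repeats(const std::vector<std::string>& words) {
    std::vector<std::string> result;
    for (const std::string& word : words) {
        bool seen = false;
        // Compare against the words already kept, using plain ==.
        for (const std::string& kept : result) {
            if (word == kept) {
                seen = true;
                break;
            }
        }
        if (!seen) result.push_back(word);
    }
    return result;
}
```

No index arithmetic is needed here, and the input vector is never modified, so the erase pitfall from the question cannot occur.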

You can simply use the combination of sort and unique as follows.

#include <iostream>
#include <algorithm>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> vec{"hey","how","are","you","fine","and","you","fine"};
    // sort first so that equal words become adjacent
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    
    for (int i = 0; i < vec.size(); i ++) {
        std::cout << vec[i] << " ";
    }
    std::cout << "\n";

    return 0;
}

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you reprint, please indicate the site URL or the original address.