In Python, set is pretty handy for comparing two lists of strings. I was wondering if there's a good solution for C++ in terms of performance, as each list has over 1 million strings in it.
It's case-sensitive matching.
The data types std::set<> (usually implemented as a balanced tree) and std::unordered_set<> (from C++11, implemented as a hash table) are available. There is also a convenience algorithm called std::set_intersection that computes the actual intersection of two sorted ranges.
Here is an example.
#include <iostream>
#include <vector>
#include <string>
#include <set>       // for std::set
#include <algorithm> // for std::set_intersection
#include <iterator>  // for std::back_inserter
int main()
{
    std::set<std::string> s1 { "red", "green", "blue" };
    std::set<std::string> s2 { "black", "blue", "white", "green" };
    /* Collecting the results in a vector. The vector may grow quite
       large -- it may be more efficient to print the elements directly. */
    std::vector<std::string> s_both {};
    std::set_intersection(s1.begin(), s1.end(),
                          s2.begin(), s2.end(),
                          std::back_inserter(s_both));
    /* Printing the elements collected by the vector, just to show that
       the result is correct. */
    for (const std::string &s : s_both)
        std::cout << s << ' ';
    std::cout << std::endl;
    return 0;
}
Note: if you want to use std::unordered_set<>, std::set_intersection cannot be used like this, because it expects both input ranges to be sorted. You'd have to use the usual technique of a for-loop that iterates over the smaller set and looks up each element in the larger one to determine the intersection. Nevertheless, for a large number of elements (especially strings), the hash-based std::unordered_set<> may be faster. There are also STL-compatible implementations such as the one in Boost (boost::unordered_set) and the ones created by Google (sparse_hash_set and dense_hash_set). For various other implementations and benchmarks (including one for strings), see here.
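The for-loop technique for std::unordered_set<> could look roughly like the following. This is only a minimal sketch: the helper name intersect_unordered is made up for illustration and is not part of the standard library, and the order of the returned elements is unspecified because the sets are unordered.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (name chosen for illustration): returns the strings
// that occur in both sets. The order of the result is unspecified.
std::vector<std::string> intersect_unordered(
    const std::unordered_set<std::string>& a,
    const std::unordered_set<std::string>& b)
{
    // Iterate over the smaller set and probe the larger one, so that the
    // number of hash lookups is min(a.size(), b.size()).
    const auto& smaller = (a.size() <= b.size()) ? a : b;
    const auto& larger  = (a.size() <= b.size()) ? b : a;

    std::vector<std::string> common;
    for (const std::string& s : smaller) {
        if (larger.count(s) > 0)   // average constant-time lookup
            common.push_back(s);
    }
    return common;
}
```

Calling intersect_unordered(s1, s2) with the two example sets from above (stored as std::unordered_set<std::string>) should yield "blue" and "green" in some order. Whether this beats the std::set version for a million strings depends on the data and the hash function, so it is worth measuring.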
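If you don't need maximum performance, I suggest using std::set from the standard library:
list<string> list1, list2;   // named list1 so the variable does not hide the type name list
...
set<string> sndList;
list<string> result;
// Copy the second list into a set for O(log n) lookups.
for (list<string>::iterator it = list2.begin(); it != list2.end(); ++it)
    sndList.insert(*it);
// Keep every element of the first list that also occurs in the second.
for (list<string>::iterator it = list1.begin(); it != list1.end(); ++it)
    if (sndList.count(*it) > 0)
        result.push_back(*it);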
Otherwise, I suggest a hash-based lookup for the comparison.
If it really is a std::list you have, sort both lists and use set_intersection:
list<string> words1;
list<string> words2;
list<string> common_words;
words1.sort();
words2.sort();
set_intersection(words1.begin(), words1.end(),
                 words2.begin(), words2.end(),
                 back_inserter(common_words));
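For reference, the sort-then-intersect approach can be wrapped into a self-contained function along these lines. This is a sketch under a few assumptions: the name common_sorted is hypothetical, the lists are taken by value so the caller's lists stay unsorted, and the calls to unique() are an addition that is not in the snippet above. Without them, std::set_intersection treats the inputs as multisets, so a word occurring m times in one list and n times in the other appears min(m, n) times in the result.

```cpp
#include <algorithm>  // std::set_intersection
#include <iterator>   // std::back_inserter
#include <list>
#include <string>

// Hypothetical helper (name chosen for illustration). Takes copies of the
// lists so the caller's data is left in its original order.
std::list<std::string> common_sorted(std::list<std::string> words1,
                                     std::list<std::string> words2)
{
    // std::list has its own sort(); std::sort needs random-access iterators.
    words1.sort();
    words2.sort();

    // Assumption: each common word should be reported once. After sorting,
    // duplicates are adjacent, so unique() removes them.
    words1.unique();
    words2.unique();

    std::list<std::string> common_words;
    std::set_intersection(words1.begin(), words1.end(),
                          words2.begin(), words2.end(),
                          std::back_inserter(common_words));
    return common_words;
}
```

The result comes back in sorted order. Sorting costs O(n log n) comparisons per list, after which the intersection itself is a single linear pass over both lists.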