C++: comparing 2 lists of strings

Question

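In Python, set is very handy for comparing 2 lists of strings (see this link). I was wondering whether there is a good C++ solution in terms of performance, since each list has over 1 million strings in it.

The match is case-sensitive.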

Answer 1

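The data types std::set<> (usually implemented as a balanced tree) and std::unordered_set<> (from C++11, implemented as a hash table) are available. There is also a convenience algorithm called std::set_intersection that computes the actual intersection.

Here is an example.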

#include <iostream>
#include <vector>
#include <string>
#include <set>             // for std::set
#include <algorithm>       // for std::set_intersection
#include <iterator>        // for std::back_inserter

int main()
{
  std::set<std::string> s1 { "red", "green", "blue" };
  std::set<std::string> s2 { "black", "blue", "white", "green" };

  /* Collecting the results in a vector. The vector may grow quite
     large -- it may be more efficient to print the elements directly. */     
  std::vector<std::string> s_both {};

  std::set_intersection(s1.begin(),s1.end(),
                        s2.begin(),s2.end(),
                        std::back_inserter(s_both));

  /* Printing the elements collected by the vector, just to show that
     the result is correct. */
  for (const std::string &s : s_both)
    std::cout << s << ' ';
  std::cout << std::endl;

  return 0;
}

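Note. If you want to use std::unordered_set<>, std::set_intersection cannot be used like this, because it expects its input ranges to be sorted. You would have to use the usual for-loop technique instead: iterate over the smaller set and look up each element in the larger one to determine the intersection. Nevertheless, for a large number of elements (especially strings), the hash-based std::unordered_set<> may be faster. There are also STL-compatible implementations such as the one in Boost (boost::unordered_set) and the ones created by Google (sparse_hash_set and dense_hash_set). For various other implementations and benchmarks (including one for strings), see here.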
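
The loop described in the note could look roughly like the sketch below. The function name hash_intersection is made up for illustration and is not part of the standard library or of the answer above. It assumes both inputs are already stored in std::unordered_set<std::string>.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (name chosen for this sketch): returns the strings
// that occur in both sets.
std::vector<std::string> hash_intersection(const std::unordered_set<std::string> &a,
                                           const std::unordered_set<std::string> &b)
{
  // Iterate over the smaller set and probe the larger one, so the number
  // of hash lookups is bounded by the smaller size.
  const std::unordered_set<std::string> &smaller = (a.size() <= b.size()) ? a : b;
  const std::unordered_set<std::string> &larger  = (a.size() <= b.size()) ? b : a;

  std::vector<std::string> both;
  for (const std::string &s : smaller)
    if (larger.count(s) > 0)       // average constant-time lookup
      both.push_back(s);

  return both;                     // element order is unspecified
}
```

Unlike std::set_intersection, the result does not come back sorted. If a sorted result is needed, the vector can be sorted afterwards.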

Answer 2

If you don't need much performance, I suggest using map/set from the STL:

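list<string> list1, list2;   // not named "list", which would hide the std::list type
...
set<string> sndList;
list<string> result;

// Build a lookup set from the second list.
for(list<string>::iterator it = list2.begin(); it != list2.end(); ++it)
   sndList.insert(*it);

// Keep every element of the first list that also occurs in the second.
for(list<string>::iterator it = list1.begin(); it != list1.end(); ++it)
    if(sndList.count(*it) > 0)
        result.push_back(*it);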

Otherwise I suggest using some hash function for the comparison.

Answer 3

If it really is a std::list, sort both lists and use set_intersection:

list<string> words1;
list<string> words2;
list<string> common_words;

words1.sort();
words2.sort();

set_intersection(words1.begin(), words1.end(),
                 words2.begin(), words2.end(),
                 back_inserter(common_words));
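
If the strings are held in a std::vector rather than a std::list, the same idea should carry over by using std::sort in place of the list's sort() member. The following is only a sketch under that assumption. The function name sorted_intersection is hypothetical, and the arguments are taken by value so that the caller's vectors are left unsorted.

```cpp
#include <algorithm>   // for std::sort, std::set_intersection
#include <iterator>    // for std::back_inserter
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this sketch): sorts private copies
// of both vectors and returns their intersection in sorted order.
std::vector<std::string> sorted_intersection(std::vector<std::string> words1,
                                             std::vector<std::string> words2)
{
  // set_intersection requires both input ranges to be sorted.
  std::sort(words1.begin(), words1.end());
  std::sort(words2.begin(), words2.end());

  std::vector<std::string> common_words;
  std::set_intersection(words1.begin(), words1.end(),
                        words2.begin(), words2.end(),
                        std::back_inserter(common_words));
  return common_words;
}
```

Note that if a string occurs several times in both inputs, std::set_intersection copies it as many times as it appears in the input where it is less frequent.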
