How to find duplicates in a std::vector<string> and return them as a std::list, sorted alphabetically, with no duplicates in the resulting list

Question

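I have a class named Wordd with a member word_, which is a std::list (the code below passes it around as a std::vector<std::string>).

I am trying to find the duplicates in word_ and return them as an alphabetically sorted list in which no word appears more than once. So far my code compiles and links, but it times out, possibly because of some internal memory leak or similar.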

class FindDuplicatesFunctor
{
public:
    std::list<std::string> list;
    std::vector<std::string> word_;
    FindDuplicatesFunctor(std::vector<std::string> words): list(0), word_(words){};
    void operator()(std::string const& str)
    {

        if(std::count(words_.begin(), words_.end(), str) > 1 && std::count(list.begin(), list.end(), str) == 0)
        {
            list.push_back(str);
        }
        list.sort();

    }
};
std::list<string> Wordd::FindDuplicates() const
{
    FindDuplicatesFunctor cf(word_);
    return std::for_each(words_.begin(), words_.end(), cf).list;
}

Any idea why it does not do its job?

Thanks in advance for your help!

Answer 1

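Edit in response to the comment:

"The remove-duplicates function name is misleading: it is actually trying to return the list of words that are duplicated in the sequence, but that result list holds only one copy of each duplicate" - user2624236, 10 hours ago

I hinted at std::sort + std::adjacent_find(... std::equal_to<>). Here is an implementation: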

template <typename C, typename T = typename C::value_type> std::list<T> adjacent_search(C input)
{
    // input is taken by value, so sorting does not disturb the caller's container
    std::sort(begin(input), end(input));

    static const auto eq  = std::equal_to<T>{};
    static const auto neq = std::not_equal_to<T>{}; // std::not2 was removed in C++20

    std::list<T> dupes;

    auto dupe_at = end(input);

    // dupe_at: start of the next run of equal neighbours
    // end_streak: last element of the run just reported
    for(auto end_streak = begin(input);
        (dupe_at = std::adjacent_find(end_streak, end(input), eq)) != end(input);
        end_streak = std::adjacent_find(dupe_at, end(input), neq))
    {
        dupes.insert(dupes.end(), *dupe_at);
    }

    return dupes;
}

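This implementation has a few nice properties: once the input is sorted it needs only a single linear scan, and its worst-case behaviour is reasonable (for example, if the input contains 1000 duplicates of a single value, it does not perform 1001 useless searches).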
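To make the "one pass over a sorted copy" idea more concrete, here is a minimal sketch that walks runs of equal elements by hand instead of calling std::adjacent_find twice per run. The name duplicated_words is hypothetical and the function is fixed to std::vector<std::string> for brevity; it is an illustration of the same idea, not the code from the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: returns each word that occurs more than once,
// in sorted order, with one entry per duplicated word.
std::list<std::string> duplicated_words(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end()); // sorts the local copy only

    std::list<std::string> dupes;
    auto run_begin = words.begin();
    while (run_begin != words.end()) {
        // First element that differs from *run_begin ends the current run.
        auto run_end = std::find_if(run_begin, words.end(),
            [&](std::string const& w) { return w != *run_begin; });

        // A run always contains *run_begin itself, so the loop makes progress.
        assert(run_end != run_begin);

        if (std::distance(run_begin, run_end) > 1) // run of length >= 2
            dupes.push_back(*run_begin);

        run_begin = run_end; // every element is visited exactly once
    }
    return dupes;
}
```

Because each run is skipped in one step, an input made of 1000 copies of one value yields a single find_if call over the whole range and one entry in the result.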

However, the following (using a set) may be simpler:

// simple, but horrific performance
template <typename C, typename T = typename C::value_type> std::list<T> simple(C const& input)
{
    std::set<T> dupes; // optimization, dupes.find(x) in O(log n)
    for (auto it = begin(input); it != end(input); ++it)
    {
        if ((end(dupes) == dupes.find(*it)) // optimize by reducing find() calls
         && (std::count(it, end(input), *it) > 1))
        {
            dupes.insert(dupes.end(), *it);
        }
    }

    return {begin(dupes), end(dupes)};
}

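This will almost certainly perform better on smaller collections, because less is copied (apart from the result). On large inputs, though, the implicit linear search in std::count can make the worst case quite bad: each call is O(n), so the loop as a whole is O(n^2).

I would suggest returning the std::set<T> directly rather than copying it into a list.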
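As a rough illustration of returning the set directly, the sketch below swaps std::count for a std::map tally, which avoids the quadratic rescans. The name duplicates_as_set and the tally approach are assumptions made for this example; they are not part of the answer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical variant: count occurrences once, then keep the words
// that were counted more than once. The result is already sorted and unique.
std::set<std::string> duplicates_as_set(std::vector<std::string> const& words)
{
    std::map<std::string, std::size_t> tally;
    for (auto const& w : words)
        ++tally[w]; // operator[] value-initializes the count to 0 on first use

    std::set<std::string> dupes;
    for (auto const& entry : tally) {
        if (entry.second > 1)
            dupes.insert(dupes.end(), entry.first); // map iterates in sorted order
    }

    // Sanity check: there cannot be more duplicated words than distinct words.
    assert(dupes.size() <= tally.size());
    return dupes;
}
```

A caller that really needs a std::list can still build one from the set's iterator range, but iterating the set directly gives the same alphabetical order without the extra copy.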

Here is a test running live on Coliru that shows both versions.

Original answer

Now outdated, because it does not do what the OP asked for:

#include <vector>
#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>

int main()
{
    std::vector<std::string> input = { "unsorted", "containing", "optional", "unsorted", "duplicate", "duplicate", "values" };

    std::sort(begin(input), end(input));

    std::unique_copy(begin(input), end(input), std::ostream_iterator<std::string>(std::cout, " "));

    std::cout << "\n";
}

Output:

containing duplicate optional unsorted values

See it live: http://coliru.stacked-crooked.com/view?id=f8cc78dbcce62ad276691b6541629a70-542192d2d8aca3c820c7acc656fa0c68

Answer 2

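The FindDuplicates() function refers to both word_ and words_. Presumably these two names are meant to be the same; which one is correct cannot be determined from the snippet.

However, the algorithm used is very slow: it takes O(n * n) time, possibly with many list operations, which are slower still than vector operations. You definitely want an approach like the one sehe posted (std::sort() followed by std::unique_copy()). If your set of values is really large, you might want to consider making only a single pass over it and keeping a std::set<std::string> (or a std::unordered_set<std::string>), or a version using std::string const*, to determine whether a value has already been seen.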
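One way to read the std::string const* suggestion is sketched below: a hash set of pointers into the original vector, with a hash and an equality functor that look through the pointer, so no string is copied until it is known to be a duplicate. PtrHash, PtrEqual and find_duplicates are hypothetical names for this example, and the interpretation itself is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <unordered_set>
#include <vector>

// Hash and equality that dereference the pointer, so two pointers to
// equal strings are treated as the same key.
struct PtrHash {
    std::size_t operator()(std::string const* p) const {
        return std::hash<std::string>{}(*p);
    }
};
struct PtrEqual {
    bool operator()(std::string const* a, std::string const* b) const {
        return *a == *b;
    }
};

// Hypothetical helper: the pointers stay valid because words is not modified.
std::list<std::string> find_duplicates(std::vector<std::string> const& words)
{
    using PtrSet = std::unordered_set<std::string const*, PtrHash, PtrEqual>;
    PtrSet seen;     // every distinct word encountered so far
    PtrSet reported; // words already added to the result

    std::list<std::string> dupes;
    for (auto const& w : words) {
        // Not new to 'seen' means a repeat; new to 'reported' means first repeat.
        if (!seen.insert(&w).second && reported.insert(&w).second)
            dupes.push_back(w);
    }

    assert(dupes.size() == reported.size()); // one list entry per reported word
    dupes.sort(); // alphabetical order, as the question asks
    return dupes;
}
```

With a reasonable hash the pass over the input is linear on average; only the final sort of the (usually much smaller) result costs O(k log k).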

Answer 3

Sort-unique-erase:

template<typename Container>
Container&& sort_unique_erase( Container&& c ) {
  using std::begin; using std::end;
  std::sort( begin(c), end(c) );
  c.erase( std::unique( begin(c), end(c) ), end(c) );
  return std::forward<Container>(c);
}

This works for any random-access container from which you can erase a range (in namespace std, that is vector and deque).
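The random-access restriction comes from std::sort. As a small sketch of what that means in practice, the following simplified lvalue-only helper handles vector and deque generically, and adds a separate overload for std::list that uses the list's own sort() and unique() members. The name sort_and_dedupe is hypothetical and the list overload is an addition for illustration, not part of the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <list>
#include <string>
#include <vector>

// Generic version: needs random-access iterators (std::sort) and range erase.
template <typename Container>
void sort_and_dedupe(Container& c)
{
    std::sort(c.begin(), c.end());
    c.erase(std::unique(c.begin(), c.end()), c.end());
    // No two neighbours may be equal once the duplicates are erased.
    assert(std::adjacent_find(c.begin(), c.end()) == c.end());
}

// std::list only has bidirectional iterators, so std::sort does not apply;
// its member functions do the same job.
void sort_and_dedupe(std::list<std::string>& l)
{
    l.sort();
    l.unique();
}
```

Overload resolution prefers the non-template function for a std::list<std::string> argument, so the generic version is never instantiated with a list.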

Then append:

template<typename C1, typename C2>
C1&& append( C1&& c1, C2&& c2 ) {
  using std::begin; using std::end;
  c1.insert( end(c1), std::make_move_iterator( begin(c2) ), std::make_move_iterator( end(c2) ) );
  return std::forward<C1>(c1);
}
template<typename C1, typename C2>
C1&& append( C1&& c1, C2& c2 ) {
  using std::begin; using std::end;
  c1.insert( end(c1), begin(c2), end(c2) );
  return std::forward<C1>(c1);
}

And tie them together:

int main() {
  std::vector<std::string> words = {"hello", "world", "my", "name", "is", "hello"};
  std::list<std::string> retval;
  append( retval, sort_unique_erase( std::move(words) ) );
  for( auto& str : retval ) {
    std::cout << str << "\n";
  }
}

However, using std::list is not recommended: there are few reasons to prefer it over std::vector or, in some rare cases, std::deque.

3 answers

Answer 1
5 2013-07-26 21:16:50

Answer 2
1 2013-07-26 21:27:51

Answer 3
1 accepted 2013-07-26 22:05:25
