How to optimize a std::set intersection algorithm (C++)

Question

I'm struggling with part of my college assignment. I have two subsets of std::set containers containing pointers to quite complex objects, but ordered by different criteria (which is why I can't use std::set_intersection()). I need to find the elements contained in both subsets as fast as possible. The assignment has a time/complexity requirement.

I can do that in n*log(m) time, where n is the size of the first subset and m is the size of the second, by doing the following:

for (auto it = subset1.begin(); it != subset1.end(); ++it) {
    // std::find returns an iterator, so compare it against end()
    if (std::find(subset2.begin(), subset2.end(), *it) != subset2.end())
        result.insert(*it);
}

This fails the time requirement, which asks for linear time in the worst case and better than linear on average.

I found the following question here and I find the hashtable approach interesting. However, I fear that creating the hashtable might incur too much overhead. The class contained in the sets looks something like this:

class containedInSets {
    // methods
private:
    std::vector<std::string> member1;
    SomeObject member2;
    int member3;
};

I have no control over the SomeObject class, and therefore cannot specify a hash function for it. I'd have to hash the pointer. Furthermore, the vector may grow quite large (thousands of entries).
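For reference, hashing the pointer does not require a hash function for SomeObject or for the vector: the standard library already provides std::hash for pointer types, and it looks only at the address, so the size of the pointed-to object does not affect the cost. Below is a minimal sketch of that idea, not a definitive solution. Item and intersectByAddress are hypothetical names, and it assumes both subsets hold pointers to the same objects, so that equal addresses mean equal elements. Expected time is O(n + m); whether that meets the assignment's requirement is not something this sketch can settle.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the class stored in the sets.
struct Item {
    int member3;
};

// Collects the pointers that occur in both ranges by hashing the addresses.
// SetA and SetB can be any containers of const Item* (for example two
// std::set instances with different comparators).
template <typename SetA, typename SetB>
std::vector<const Item*> intersectByAddress(const SetA& subset1,
                                            const SetB& subset2) {
    // Build the hash table from the first subset: O(n) expected.
    std::unordered_set<const Item*> seen(subset1.begin(), subset1.end());

    // Probe it with every element of the second subset: O(m) expected.
    std::vector<const Item*> common;
    for (const Item* p : subset2) {
        if (seen.count(p) != 0) {
            common.push_back(p);
        }
    }
    return common;
}
```

Building the table from the smaller of the two subsets keeps the extra memory down.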

What is the quickest way of doing this?

Answer 1

Your code is not O(n log(m)) but O(n * m).

std::find(subset2.begin(), subset2.end(), *it) is linear, but std::set has the member functions find and count, which run in O(log(n)) (they do a binary search).

So you can simply do:

for (const auto& e : subset1) {
    if (subset2.count(e) != 0) {
        result.insert(e);
    }
}

This has a complexity of n*log(m) instead of your n * m.
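To make the loop above concrete, here is a self-contained sketch with two sets ordered by different criteria. Item, ById, ByName and intersect are hypothetical names chosen for illustration. One thing to keep in mind: subset2.count(e) decides membership with subset2's own comparator, so in this sketch two pointers count as the same element when their names compare equivalent. The sketch assumes names are unique.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the class stored in the sets.
struct Item {
    int id;
    std::string name;
};

// Two different orderings, mirroring "ordered by different criteria".
struct ById {
    bool operator()(const Item* a, const Item* b) const {
        return a->id < b->id;
    }
};
struct ByName {
    bool operator()(const Item* a, const Item* b) const {
        return a->name < b->name;
    }
};

// For each element of subset1, do a logarithmic lookup in subset2
// using the member function count: O(n * log(m)) overall.
std::set<const Item*> intersect(const std::set<const Item*, ById>& subset1,
                                const std::set<const Item*, ByName>& subset2) {
    std::set<const Item*> result;
    for (const Item* e : subset1) {
        if (subset2.count(e) != 0) {
            result.insert(e);
        }
    }
    return result;
}
```

Iterating over the smaller set and looking up in the larger one keeps n small, which is the cheaper arrangement when the sizes differ a lot.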

Answer 1
3 2018-04-22 11:05:07
