
Efficient set intersection of a collection of sets in C++

I have a collection of std::set . I want to find the intersection of all the sets in this collection, in the fastest manner. The number of sets in the collection is typically very small (~5-10), and the number of elements in each set is usually less than 1000, but can occasionally go up to around 10000. But I need to do these intersections tens of thousands of times, as fast as possible. I tried to benchmark a few methods as follows:

  1. In-place intersection in a std::set object which initially copies the first set. Then for subsequent sets, it iterates over all elements of itself and the ith set of the collection, and removes items from itself as needed (a sketch of this approach appears after this list).
  2. Using std::set_intersection into a temporary std::set , swapping its contents into the current set, then again finding the intersection of the current set with the next set, inserting into the temp set, and so on.
  3. Manually iterating over all the elements of all sets as in 1), but using a vector as the destination container instead of std::set .
  4. Same as in 3), but using a std::list instead of a vector , suspecting a list will provide faster deletions from the middle.
  5. Using hash sets ( std::unordered_set ) and checking for all items in all sets.
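
For concreteness, a rough sketch of what approach 1 looks like (simplified, illustrative code rather than my exact benchmark):

#include <cstddef>
#include <set>
#include <vector>

// Sketch of approach 1: keep a running set and erase from it anything
// that is missing from the next set in the collection.
std::set<int> intersect_in_place(std::vector<std::set<int>> const& sets) {
    if (sets.empty()) { return {}; }

    std::set<int> result(sets.front()); // copy of the first set

    for (std::size_t i = 1; i < sets.size() && !result.empty(); ++i) {
        for (auto it = result.begin(); it != result.end(); ) {
            if (sets[i].count(*it) == 0) {
                it = result.erase(it);  // not present in sets[i]: drop it
            } else {
                ++it;
            }
        }
    }
    return result;
}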

As it turned out, using a vector is marginally faster when the number of elements in each set is small, and list is marginally faster for larger sets. In-place intersection using set is substantially slower than both, followed by set_intersection and hash sets. Is there a faster algorithm/data structure/trick to achieve this? I can post code snippets if required. Thanks!

You might want to try a generalization of std::set_intersection() : the algorithm is to use iterators for all sets (a sketch follows the steps below):

  1. If any iterator has reached the end() of its corresponding set, you are done. Thus, it can be assumed that all iterators are valid.
  2. Take the first iterator's value as the next candidate value x .
  3. Move through the list of iterators and std::find_if() the first element at least as big as x .
  4. If the value is bigger than x , make it the new candidate value and search again in the sequence of iterators.
  5. If all iterators are on value x , you have found an element of the intersection: record it, increment all iterators, and start over.
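
A minimal sketch of this generalization, assuming sets of int passed by pointer (the name intersect_n_way and the exact signature are only illustrative):

#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

std::vector<int> intersect_n_way(std::vector<std::set<int> const*> const& sets) {
    std::vector<int> out;
    if (sets.empty()) { return out; }

    // One iterator per set, all starting at the beginning.
    std::vector<std::set<int>::const_iterator> its;
    for (auto const* s : sets) { its.push_back(s->begin()); }

    while (its[0] != sets[0]->end()) {
        int x = *its[0];                    // 2. candidate from the first iterator
        bool all_equal;
        do {
            all_equal = true;
            for (std::size_t i = 0; i < its.size(); ++i) {
                // 3. first element at least as big as x, starting from its[i]
                its[i] = std::find_if(its[i], sets[i]->end(),
                                      [x](int v) { return v >= x; });
                if (its[i] == sets[i]->end()) { return out; } // 1. a set is exhausted
                if (*its[i] > x) {          // 4. bigger value: new candidate, rescan
                    x = *its[i];
                    all_equal = false;
                    break;
                }
            }
        } while (!all_equal);

        out.push_back(x);                   // 5. x is present in every set
        for (auto& it : its) { ++it; }      // increment all iterators, start over
    }
    return out;
}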

Night is a good adviser and I think I may have an idea ;)

  • Memory is much slower than CPU these days. If all data fits in the L1 cache it is no big deal, but it easily spills over to L2 or L3: 5 sets of 1000 elements is already 5000 elements, meaning 5000 nodes, and a set node contains at least 3 pointers + the object (ie, at least 16 bytes on a 32 bits machine and 32 bytes on a 64 bits machine) => that's at least 80k memory, and recent CPUs only have 32k for the L1D, so we are already spilling into L2.
  • The previous fact is compounded by the problem that set nodes are probably scattered around memory, and not tightly packed together, meaning that part of the cache line is filled with completely unrelated stuff. This could be alleviated by providing an allocator that keeps nodes close to each other.
  • And this is further compounded by the fact that CPUs are much better at sequential reads (where they can prefetch memory before you need it, so you don't wait for it) rather than random reads (and a tree structure unfortunately leads to quite random reads).

This is why, where speed matters, a vector (or perhaps a deque ) is such a great structure: it plays very well with memory. As such, I would definitely recommend using vector as our intermediary structure, although care need be taken to only ever insert/delete from an extremity to avoid relocation.

So I thought about a rather simple approach:

#include <cassert>

#include <algorithm>
#include <set>
#include <vector>

// Do not call this method if you have a single set...
// And the pointers better not be null either!
std::vector<int> intersect(std::vector< std::set<int> const* > const& sets) {
    for (auto s: sets) { assert(s && "I said no null pointer"); }

    std::vector<int> result; // only return this one, for NRVO to kick in

    // 0. Check obvious cases
    if (sets.empty()) { return result; }

    if (sets.size() == 1) {
        result.assign(sets.front()->begin(), sets.front()->end());
        return result;
    }


    // 1. Intersect the first two sets into the result
    std::set_intersection(sets[0]->begin(), sets[0]->end(),
                          sets[1]->begin(), sets[1]->end(),
                          std::back_inserter(result));

    if (sets.size() == 2) { return result; }


    // 2. Intersect each subsequent set with result into buffer, then swap them around
    //    so that the "result" is always in result at the end of the loop.

    std::vector<int> buffer; // outside the loop so that we reuse its memory

    for (size_t i = 2; i < sets.size(); ++i) {
        buffer.clear();

        std::set_intersection(result.begin(), result.end(),
                              sets[i]->begin(), sets[i]->end(),
                              std::back_inserter(buffer));

        swap(result, buffer);
    }

    return result;
}

It seems correct; I cannot guarantee its speed though, obviously.
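
For illustration, calling it could look roughly like this (a minimal usage sketch, assuming the intersect() function and its includes from above; not part of the tested code):

#include <iostream>

int main() {
    std::set<int> a{1, 2, 3, 4, 5};
    std::set<int> b{2, 3, 5, 7};
    std::set<int> c{3, 5, 8};

    std::vector<std::set<int> const*> sets{&a, &b, &c};

    for (int x : intersect(sets)) { std::cout << x << ' '; } // prints: 3 5
    std::cout << '\n';
}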
