Fast union building of multiple vectors in C++

I'm searching for a fast way to build a union of multiple vectors in C++.

More specifically: I have a collection of vectors (usually 15-20 vectors with several thousand unsigned integers each; always sorted and unique, so they could also be std::set). For each stage, I choose some of them (usually 5-10) and build a union vector. Then I save the length of the union vector and choose some other vectors. This is done several thousand times. In the end I'm only interested in the length of the shortest union vector.

Small example: 

V1: {0, 4, 19, 40}
V2: {2, 4, 8, 9, 19}
V3: {0, 1, 2, 4, 40}
V4: {9, 10} 

// The Input Vectors V1, V2 … are always sorted and unique (could also be an std::set) 

Choose V1, V3;
Union Vector = {0, 1, 2, 4, 19, 40} -> Size = 6;

Choose V1, V4;
Union Vector = {0, 4, 9, 10, 19, 40} -> Size = 6;

… and so on … 

At the moment I use std::set_union, but I'm sure there must be a faster way.

vector<vector<uint64_t>> collection;
vector<uint64_t> chosen;                      // indices into collection
vector<uint64_t> unionVector, unionVectorTmp;

for (size_t i = 0; i < chosen.size(); i++) {
    const auto& current = collection.at(chosen.at(i));
    // Merge the next chosen vector into the running union.
    set_union(current.begin(),
              current.end(),
              unionVector.begin(),
              unionVector.end(),
              back_inserter(unionVectorTmp));
    unionVector.swap(unionVectorTmp);
    unionVectorTmp.clear();
}

I'm grateful for every reference.

EDIT 27.04.2017: A new idea:

unordered_set<unsigned int> unionSet;
unsigned int counter = 0;

for (const auto &sel : selection) {
    for (const auto &val : sel) {
        // r.second is true only if val was not in the set yet
        auto r = unionSet.insert(val);
        if (r.second) {
            counter++;
        }
    }
}

If they're sorted, you can roll your own union routine that runs in O(N+M) time. Otherwise, you can use a hash table with similar runtime.
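As a minimal sketch of what such a hand-rolled O(N+M) pass could look like for two sorted, duplicate-free vectors, the function below only counts the union instead of building it. The name unionSize is hypothetical and not part of the answer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: size of the union of two sorted, duplicate-free
// vectors, computed in one pass without materializing the union.
std::size_t unionSize(const std::vector<unsigned int>& a,
                      const std::vector<unsigned int>& b)
{
    std::size_t i = 0, j = 0, count = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;            // value only seen in a so far
        } else if (b[j] < a[i]) {
            ++j;            // value only seen in b so far
        } else {
            ++i;            // same value in both inputs: count it once
            ++j;
        }
        ++count;
    }
    // Whatever remains in one input cannot occur in the other.
    return count + (a.size() - i) + (b.size() - j);
}
```

For more than two vectors, the same idea would have to be applied pairwise or extended to several read positions at once.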

The de facto way in C++98 is set_intersection, but with C++11 (or TR1) you can go for unordered_set; provided the initial vector is sorted, you will have a nice O(N) algorithm.

  1. Construct an unordered_set out of your first vector
  2. Check if the elements of your 2nd vector are in the set

Something like this will do:

std::unordered_set<int> us(std::begin(v1), std::end(v1));
// res is the number of elements of v2 that also occur in v1
auto res = std::count_if(std::begin(v2), std::end(v2), [&](int n) { return us.find(n) != std::end(us); });
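The snippet above counts the elements the two vectors have in common. To get from there to the size of the union that the question asks for, one option is inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|. The sketch below assumes both vectors are free of duplicates; the function name unionSizeViaHashSet is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: union size of two duplicate-free vectors, derived
// from the number of common elements (inclusion-exclusion).
std::size_t unionSizeViaHashSet(const std::vector<unsigned int>& v1,
                                const std::vector<unsigned int>& v2)
{
    // Hash set of the first vector for average O(1) membership tests.
    std::unordered_set<unsigned int> us(v1.begin(), v1.end());

    // Number of elements of v2 that also occur in v1.
    const auto common = std::count_if(v2.begin(), v2.end(),
        [&](unsigned int n) { return us.find(n) != us.end(); });

    return v1.size() + v2.size() - static_cast<std::size_t>(common);
}
```

This variant does not rely on the inputs being sorted, only on each one being duplicate-free.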

There's no need to create the entire union vector. You can count the number of unique elements among the selected vectors by keeping a list of iterators and comparing/incrementing them appropriately.

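Here's the code:

#include <cstddef>
#include <limits>
#include <vector>

int countUnique(const std::vector<std::vector<unsigned int>>& selection)
{
  // One read position per selected vector.
  std::vector<std::vector<unsigned int>::const_iterator> iters;
  for (const auto& sel : selection) {
    iters.push_back(sel.begin());
  }
  // True once every iterator has reached the end of its vector.
  auto atEnd = [&]() -> bool {
    for (std::size_t i = 0; i < iters.size(); ++i) {
      if (iters[i] != selection[i].end()) return false;
    }
    return true;
  };
  int count = 0;
  while (!atEnd()) {
    // Find the minimum value among the iterators that are not at the end.
    unsigned int min = std::numeric_limits<unsigned int>::max();
    for (std::size_t i = 0; i < iters.size(); ++i) {
      if (iters[i] != selection[i].end() && *iters[i] < min) {
        min = *iters[i];
      }
    }

    // Advance every iterator that currently points at the minimum.
    for (std::size_t i = 0; i < iters.size(); ++i) {
      if (iters[i] != selection[i].end() && *iters[i] == min) {
        ++iters[i];
      }
    }

    ++count;
  }
  return count;
}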

This uses the fact that your input vectors are sorted and only contain unique elements.

The idea is to keep an iterator into each selected vector. The minimum value among those iterators is our next unique value in the union vector. Then we increment all iterators whose value is equal to that minimum. We repeat this until all iterators are at the end of the selected vectors.
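The same idea can also be sketched with a min-heap in place of the linear scan for the minimum. This is a variation for illustration, not the answer's own code: the name countUniqueHeap and the use of std::priority_queue are assumptions. With only 5-10 selected vectors the linear scan may well be just as fast, so it would be worth measuring both.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical variant: counts the distinct values across several sorted
// vectors, using a min-heap to find the next smallest value.
std::size_t countUniqueHeap(const std::vector<std::vector<unsigned int>>& selection)
{
    // Each heap entry is (current value, index of the vector it came from).
    using Entry = std::pair<unsigned int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    std::vector<std::size_t> pos(selection.size(), 0);  // read position per vector
    for (std::size_t i = 0; i < selection.size(); ++i) {
        if (!selection[i].empty()) {
            heap.emplace(selection[i][0], i);
        }
    }

    std::size_t count = 0;
    bool havePrev = false;
    unsigned int prev = 0;
    while (!heap.empty()) {
        const Entry top = heap.top();
        heap.pop();
        // Values leave the heap in non-decreasing order, so a value is new
        // exactly when it differs from the previously popped one.
        if (!havePrev || top.first != prev) {
            ++count;
            prev = top.first;
            havePrev = true;
        }
        // Refill the heap from the vector the popped value came from.
        const std::size_t src = top.second;
        if (++pos[src] < selection[src].size()) {
            heap.emplace(selection[src][pos[src]], src);
        }
    }
    return count;
}
```

Called with V1 and V3 from the question's example, this returns 6, matching the union size given there.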
