
Fast union building of multiple vectors in C++

I'm searching for a fast way to build a union of multiple vectors in C++.

More specifically: I have a collection of vectors (usually 15-20 vectors, each with several thousand unsigned integers; always sorted and unique, so they could also be a std::set). For each stage, I choose some of them (usually 5-10) and build a union vector. Then I save the length of the union vector and choose some other vectors. This is done several thousand times. In the end I'm only interested in the length of the shortest union vector.

Small example: 

V1: {0, 4, 19, 40}
V2: {2, 4, 8, 9, 19}
V3: {0, 1, 2, 4, 40}
V4: {9, 10} 

// The input vectors V1, V2 … are always sorted and unique (could also be a std::set)

Choose V1, V3;
Union Vector = {0, 1, 2, 4, 19, 40} -> Size = 6;

Choose V1, V4;
Union Vector = {0, 4, 9, 10, 19, 40} -> Size = 6;

… and so on … 

At the moment I use std::set_union but I'm sure there must be a faster way.

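vector<vector<uint64_t>> collection;          // all input vectors, each sorted and unique
vector<uint64_t> chosen;                      // indices into collection for this stage
vector<uint64_t> unionVector, unionVectorTmp; // running union and scratch buffer

for (size_t i = 0; i < chosen.size(); i++) {
    const auto &v = collection.at(chosen.at(i));
    // Merge the next chosen vector into the running union.
    set_union(v.begin(), v.end(),
              unionVector.begin(), unionVector.end(),
              back_inserter(unionVectorTmp));
    unionVector.swap(unionVectorTmp);
    unionVectorTmp.clear();
}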

I'm grateful for every reference.

EDIT 27.04.2017: A new idea:

unordered_set<unsigned int> unionSet;
unsigned int counter = 0;

for (const auto &sel : selection) {
    for (const auto &val : sel) {
        auto r = unionSet.insert(val);
        if (r.second) {   // true only if val was not in the set yet
            counter++;
        }
    }
}

If they're sorted, you can roll your own merge that's O(N+M) in runtime. Otherwise you can use a hash table with similar average-case runtime.
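
A minimal sketch of such a hand-rolled merge for two inputs, assuming both vectors are sorted and contain no duplicates. The name unionSize is only illustrative. It counts the union without storing it:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: size of the union of two sorted, duplicate-free vectors.
std::size_t unionSize(const std::vector<std::uint64_t>& a,
                      const std::vector<std::uint64_t>& b)
{
    std::size_t i = 0, j = 0, count = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;               // value only in a (so far)
        } else if (b[j] < a[i]) {
            ++j;               // value only in b (so far)
        } else {
            ++i;               // same value in both: count it once
            ++j;
        }
        ++count;
    }
    // Whatever is left in one input cannot occur in the other.
    return count + (a.size() - i) + (b.size() - j);
}
```

Each element is visited once, so the cost is O(N+M) and no output vector is allocated.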

The de facto way in C++98 is set_intersection, but with C++11 (or TR1) you can go for unordered_set. It does not require the input to be sorted, and you get a nice O(N) algorithm on average. Note that this counts the elements common to both vectors.

  1. Construct an unordered_set out of your first vector
  2. Check if the elements of your 2nd vector are in the set

Something like that will do:

std::unordered_set<int> us(std::begin(v1), std::end(v1));
// res is the number of elements of v2 that also occur in v1
auto res = std::count_if(std::begin(v2), std::end(v2), [&](int n) { return us.find(n) != std::end(us); });
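
Since the snippet above counts common elements rather than the union, here is a hedged sketch of how the union size could be derived from it. It assumes neither vector contains duplicates, and the function name unionSizeViaHash is made up for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: |v1 ∪ v2| = |v1| + |v2| - |v1 ∩ v2|.
// Only valid when v1 and v2 each contain no duplicates.
std::size_t unionSizeViaHash(const std::vector<int>& v1,
                             const std::vector<int>& v2)
{
    const std::unordered_set<int> us(v1.begin(), v1.end());
    // Number of elements of v2 that also occur in v1.
    const auto common = std::count_if(v2.begin(), v2.end(),
        [&us](int n) { return us.find(n) != us.end(); });
    return v1.size() + v2.size() - static_cast<std::size_t>(common);
}
```

For inputs that are already sorted, a plain merge is likely cheaper than hashing, so this is mainly of interest when the order is not guaranteed.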

There's no need to create the entire union vector. You can count the number of unique elements among the selected vectors by keeping a list of iterators and comparing/incrementing them appropriately.

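Here's one way to write it:

#include <cstddef>
#include <vector>

int countUnique(const std::vector<std::vector<unsigned int>>& selection)
{
  // One read position per selected vector.
  std::vector<std::vector<unsigned int>::const_iterator> iters;
  for (const auto& sel : selection) {
    iters.push_back(sel.begin());
  }
  int count = 0;
  while (true) {
    // Find the minimum value among the iterators that are not at the end.
    bool found = false;
    unsigned int min = 0;
    for (std::size_t i = 0; i < iters.size(); ++i) {
      if (iters[i] != selection[i].end() && (!found || *iters[i] < min)) {
        min = *iters[i];
        found = true;
      }
    }
    if (!found) {
      break; // all iterators are at the end
    }

    // Advance every iterator that currently points at the minimum.
    for (std::size_t i = 0; i < iters.size(); ++i) {
      if (iters[i] != selection[i].end() && *iters[i] == min) {
        ++iters[i];
      }
    }

    ++count;
  }
  return count;
}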

This uses the fact that your input vectors are sorted and only contain unique elements.

The idea is to keep an iterator into each selected vector. The minimum value among those iterators is our next unique value in the union vector. Then we increment all iterators whose value is equal to that minimum. We repeat this until all iterators are at the end of the selected vectors.
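
As a variation on the same idea, the minimum can be tracked with a min-heap instead of scanning all iterators each round. This is only a sketch under the same assumptions (sorted, duplicate-free inputs). The name countUniqueHeap is hypothetical, and for just 5-10 vectors the linear scan may well be as fast or faster:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical variant: count distinct values across sorted vectors using a min-heap.
std::size_t countUniqueHeap(const std::vector<std::vector<unsigned int>>& selection)
{
    using Entry = std::pair<unsigned int, std::size_t>; // (value, index of source vector)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<std::size_t> pos(selection.size(), 0);   // read position per vector

    // Seed the heap with the first element of every non-empty vector.
    for (std::size_t i = 0; i < selection.size(); ++i) {
        if (!selection[i].empty()) {
            heap.emplace(selection[i][0], i);
        }
    }

    std::size_t count = 0;
    bool havePrev = false;
    unsigned int prev = 0;
    while (!heap.empty()) {
        const Entry top = heap.top();
        heap.pop();
        // Values leave the heap in ascending order, so a new value
        // is one that differs from the previously popped value.
        if (!havePrev || top.first != prev) {
            ++count;
            prev = top.first;
            havePrev = true;
        }
        // Push the next element from the vector the popped value came from.
        const std::size_t i = top.second;
        if (++pos[i] < selection[i].size()) {
            heap.emplace(selection[i][pos[i]], i);
        }
    }
    return count;
}
```

With the example vectors from the question, calling countUniqueHeap on {V1, V3} or on {V1, V4} gives 6, matching the union sizes listed there.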
