简体   繁体   中英

What's the difference between std::merge and std::set_union?

The question is clear, my google- and cplusplus.com/reference-fu is failing me.

std::set_union will contain those elements that are present in both sets only once. std::merge will contain them twice.

For example, with A = {1, 2, 5}; B = {2, 3, 4} A = {1, 2, 5}; B = {2, 3, 4} :

  • union will give C = {1, 2, 3, 4, 5}
  • merge will give D = {1, 2, 2, 3, 4, 5}

Both work on sorted ranges, and return a sorted result.

Short example:

#include <algorithm>
#include <iostream>
#include <set>
#include <vector>

int main()
{
  std::set<int> A = {1, 2, 5};
  std::set<int> B = {2, 3, 4};

  std::vector<int> out;
  std::set_union(std::begin(A), std::end(A), std::begin(B), std::end(B),
                 std::back_inserter(out));
  for (auto i : out)
  {
    std::cout << i << " ";
  }
  std::cout << '\n';

  out.clear();
  std::merge(std::begin(A), std::end(A), std::begin(B), std::end(B),
             std::back_inserter(out));
  for (auto i : out)
  {
    std::cout << i << " ";
  }
  std::cout << '\n';
}

Output:

1 2 3 4 5 
1 2 2 3 4 5

std::merge keeps all elements from both ranges, equivalent elements from the first range preceding equivalent elements from the second range in the output. Where an equivalent elements appear in both ranges std::set_union takes only the element from the first range, otherwise each element is merged in order as with std::merge .

References: ISO/IEC 14882:2003 25.3.4 [lib.alg.merge] and 25.3.5.2 [lib.set.union].

This is the verification I suggested in the comment I posted to the accepted answer (ie that if an element is present in one of the input-sets N times, it will appear N times in the output of set_union - so set_union does not remove duplicate equivalent items in the way we would 'naturally' or 'mathematically' expect - if, however, both input-ranges contained a common item once only, then set_union would appear to remove the duplicate)

#include <vector>
#include <algorithm>
#include <iostream>
#include <cassert>

using namespace std;

void printer(int i) { cout << i << ", "; }

int main() {
    int mynumbers1[] = { 0, 1, 2, 3, 3, 4 }; // this is sorted, 3 is dupe
    int mynumbers2[] = { 5 };                // this is sorted


    vector<int> union_result(10);
    set_union(mynumbers1, mynumbers1 + sizeof(mynumbers1)/sizeof(int),
              mynumbers2, mynumbers2 + sizeof(mynumbers2)/sizeof(int),
              union_result.begin());
    for_each(union_result.begin(), union_result.end(), printer);

    return 0;
}

This will print: 0, 1, 2, 3, 3, 4, 5, 0, 0, 0,

To add to the previous answers - beware that the complexity of std::set_union is twice that of std::merge . In practise, this means the comparator in std::set_union may be applied to an element after it has been dereferenced, while with std::merge this is never the case.

Why may this be important? Consider something like:

std::vector<Foo> lhs, rhs;

And you want to produce a union of lhs and rhs :

std::set_union(std::cbegin(lhs), std::cend(lhs),
               std::cbegin(rhs), std::cend(rhs),
               std::back_inserter(union));

But now suppose Foo is not copyable, or is very expensive to copy and you don't need the originals. You may think to use:

std::set_union(std::make_move_iterator(std::begin(lhs)),
               std::make_move_iterator(std::end(lhs)),
               std::make_move_iterator(std::begin(rhs)),
               std::make_move_iterator(std::end(rhs)),
               std::back_inserter(union));

But this is undefined behaviour as there is a possibility of a moved Foo being compared: The correct solution is therefore:

std::merge(std::make_move_iterator(std::begin(lhs)),
           std::make_move_iterator(std::end(lhs)),
           std::make_move_iterator(std::begin(rhs)),
           std::make_move_iterator(std::end(rhs)),
           std::back_inserter(union));
union.erase(std::unique(std::begin(union), std::end(union), std::end(union));

Which has the same complexity as std::set_union .

std::merge merges all elements, without eliminating the duplicates, while std::set_union eliminates the duplicates. That is, the latter applies the rule of union operation of set theory .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM