C++: Differences between 2 arrays

Question

I have two unsorted random access arrays of a single simple element type (int/string/etc, so has all comparison operators, can be hashed, etc.). There should not be duplicate elements in either array.

Looking for a general algorthim that given these arrays A and B will tell me:

What elements are in both A and B
What elements are in A but not B
What elements are in B but not A

I guess I could do this with the set operators as below, but is there a faster solution (eg one that doesnt require me to build two sorted sets)?

r1 = std::set_intersection(a,b);
r2 = std::set_difference(a,b);
r3 = std::set_difference(b,a);

Answer 1

Something like the following algorithm will run O(|A|+|B|) (assuming O(1) behavior from unordered_map ):

Let list onlyA initially contain all of A, and lists onlyB and bothAB start out as empty.
Let hash-table Amap associate elements in onlyA with its corresponding iterator in onlyA .
For each element b in B
- If b finds a corresponding iterator ai in Amap
  - Add b to bothAB
  - Remove b from onlyA using ai
- Otherwise, add b to onlyB

At the end of the above algorithm,

onlyA contains elements in A but not in B,
onlyB contains elements in B but not in A,
bothAB contains elements in both A and B.

Below is an implementation of the above. The result is returned as a tuple < onlyA , onlyB , bothAB >.

template <typename C>
auto venn_ify (const C &A, const C &B) ->
    std::tuple<
        std::list<typename C::value_type>,
        std::list<typename C::value_type>,
        std::list<typename C::value_type>
    >
{
    typedef typename C::value_type T;
    typedef std::list<T> LIST;
    LIST onlyA(A.begin(), A.end()), onlyB, bothAB;
    std::unordered_map<T, typename LIST::iterator> Amap(2*A.size());
    for (auto a = onlyA.begin(); a != onlyA.end(); ++a) Amap[*a] = a;
    for (auto b : B) {
        auto ai = Amap.find(b);
        if (ai == Amap.end()) onlyB.push_back(b);
        else {
            bothAB.push_back(b);
            onlyA.erase(ai->second);
        }
    }
    return std::make_tuple(onlyA, onlyB, bothAB);
}

Answer 2

First, it's not clear from your question whether you mean std::set when you speak of sorted sets. If so, then your first reaction should be to use std::vector , if you can, on the original vectors. Just sort them, and then:

std::vector<T> r1;
std::set_intersection( a.cbegin(), a.cend(), b.cbegin(), b.cend(), std::back_inserter( r1 ) );

And the same for r2 and r3 .

Beyond that, I doubt that there's much you can do. Just one loop might improve things some:

std::sort( a.begin(), a.end() );
std::sort( b.begin(), b.end() );
onlyA.reserve( a.size() );
onlyB.reserve( b.size() );
both.reserve( std::min( a.size(), b.size() ) );
auto ita = a.cbegin();
auto enda = a.cend();
auto itb = b.cbegin();
auto endb = b.cend();
while ( ita != enda && itb != endb ) {
    if ( *ita < *itb ) {
        onlyA.push_back( *ita );
        ++ ita;
    } else if ( *itb < *ita ) {
        onlyB.push_back( *itb );
        ++ itb;
    } else {
        both.push_back( *ita );
        ++ ita;
        ++ itb;
    }
}
onlyA.insert( onlyA.end(), ita, enda );
onlyB.insert( onlyB.end(), itb, endb );

The reserve could make a difference, and unless most of the elements end up in the same vector, probably won't cost much extra memory.

Answer 3

You can do this in linear time by putting the elements of A into an unordered_map where the elements from A are the keys. The check if the elements of B in keys in the map.

C++: Differences between 2 arrays

Question

3 answers

solution1
3 2014-08-21 16:10:09

solution2
3 ACCPTED 2014-08-21 17:16:15

solution3
-1 2014-08-21 15:32:37

C++: Differences between 2 arrays

Question

3 answers

solution1 3 2014-08-21 16:10:09

solution2 3 ACCPTED 2014-08-21 17:16:15

solution3 -1 2014-08-21 15:32:37

solution1
3 2014-08-21 16:10:09

solution2
3 ACCPTED 2014-08-21 17:16:15

solution3
-1 2014-08-21 15:32:37