I have two unsorted random access arrays of a single simple element type (int/string/etc, so has all comparison operators, can be hashed, etc.). There should not be duplicate elements in either array.
Looking for a general algorthim that given these arrays A and B will tell me:
I guess I could do this with the set operators as below, but is there a faster solution (eg one that doesnt require me to build two sorted sets)?
r1 = std::set_intersection(a,b);
r2 = std::set_difference(a,b);
r3 = std::set_difference(b,a);
Something like the following algorithm will run O(|A|+|B|) (assuming O(1) behavior from unordered_map
):
onlyA
initially contain all of A, and lists onlyB
and bothAB
start out as empty. Amap
associate elements in onlyA
with its corresponding iterator in onlyA
. B
bothAB
onlyA
using ai onlyB
At the end of the above algorithm,
Below is an implementation of the above. The result is returned as a tuple < onlyA
, onlyB
, bothAB
>.
template <typename C>
auto venn_ify (const C &A, const C &B) ->
std::tuple<
std::list<typename C::value_type>,
std::list<typename C::value_type>,
std::list<typename C::value_type>
>
{
typedef typename C::value_type T;
typedef std::list<T> LIST;
LIST onlyA(A.begin(), A.end()), onlyB, bothAB;
std::unordered_map<T, typename LIST::iterator> Amap(2*A.size());
for (auto a = onlyA.begin(); a != onlyA.end(); ++a) Amap[*a] = a;
for (auto b : B) {
auto ai = Amap.find(b);
if (ai == Amap.end()) onlyB.push_back(b);
else {
bothAB.push_back(b);
onlyA.erase(ai->second);
}
}
return std::make_tuple(onlyA, onlyB, bothAB);
}
First, it's not clear from your question whether you mean std::set
when you speak of sorted sets. If so, then your first reaction should be to use std::vector
, if you can, on the original vectors. Just sort them, and then:
std::vector<T> r1;
std::set_intersection( a.cbegin(), a.cend(), b.cbegin(), b.cend(), std::back_inserter( r1 ) );
And the same for r2
and r3
.
Beyond that, I doubt that there's much you can do. Just one loop might improve things some:
std::sort( a.begin(), a.end() );
std::sort( b.begin(), b.end() );
onlyA.reserve( a.size() );
onlyB.reserve( b.size() );
both.reserve( std::min( a.size(), b.size() ) );
auto ita = a.cbegin();
auto enda = a.cend();
auto itb = b.cbegin();
auto endb = b.cend();
while ( ita != enda && itb != endb ) {
if ( *ita < *itb ) {
onlyA.push_back( *ita );
++ ita;
} else if ( *itb < *ita ) {
onlyB.push_back( *itb );
++ itb;
} else {
both.push_back( *ita );
++ ita;
++ itb;
}
}
onlyA.insert( onlyA.end(), ita, enda );
onlyB.insert( onlyB.end(), itb, endb );
The reserve
could make a difference, and unless most of the elements end up in the same vector, probably won't cost much extra memory.
You can do this in linear time by putting the elements of A into an unordered_map where the elements from A are the keys. The check if the elements of B in keys in the map.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.