C++: Differences between 2 arrays

I have two unsorted random-access arrays of a single simple element type (int, string, etc., so it has all the comparison operators, can be hashed, and so on). There should not be duplicate elements in either array.

I'm looking for a general algorithm that, given these arrays A and B, will tell me:

  1. What elements are in both A and B
  2. What elements are in A but not B
  3. What elements are in B but not A

I guess I could do this with the set operations as below, but is there a faster solution (e.g. one that doesn't require me to build two sorted sets)?

r1 = std::set_intersection(a,b);
r2 = std::set_difference(a,b);
r3 = std::set_difference(b,a);

Something like the following algorithm will run in O(|A|+|B|) (assuming O(1) behavior from unordered_map):

  • Let list onlyA initially contain all of A, and let lists onlyB and bothAB start out empty.
  • Let hash table Amap associate each element in onlyA with its corresponding iterator in onlyA.
  • For each element b in B
    • If b finds a corresponding iterator ai in Amap
      • Add b to bothAB
      • Remove b from onlyA using ai
    • Otherwise, add b to onlyB

At the end of the above algorithm,

  • onlyA contains the elements in A but not in B,
  • onlyB contains the elements in B but not in A,
  • bothAB contains the elements in both A and B.

Below is an implementation of the above. The result is returned as a tuple <onlyA, onlyB, bothAB>.

#include <list>
#include <tuple>
#include <unordered_map>

template <typename C>
auto venn_ify (const C &A, const C &B) ->
    std::tuple<
        std::list<typename C::value_type>,
        std::list<typename C::value_type>,
        std::list<typename C::value_type>
    >
{
    typedef typename C::value_type T;
    typedef std::list<T> LIST;
    // onlyA starts as a copy of A; elements also found in B are erased below.
    LIST onlyA(A.begin(), A.end()), onlyB, bothAB;
    // Map each element of onlyA to its list iterator so it can be erased in O(1).
    std::unordered_map<T, typename LIST::iterator> Amap(2*A.size());
    for (auto a = onlyA.begin(); a != onlyA.end(); ++a) Amap[*a] = a;
    // Relies on the stated precondition that neither input has duplicates.
    for (const auto &b : B) {
        auto ai = Amap.find(b);
        if (ai == Amap.end()) onlyB.push_back(b);
        else {
            bothAB.push_back(b);
            onlyA.erase(ai->second);
        }
    }
    return std::make_tuple(onlyA, onlyB, bothAB);
}

First, it's not clear from your question whether you mean std::set when you speak of sorted sets. If so, then your first reaction should be to use std::vector instead, if you can, working directly on the original vectors. Just sort them, and then:

std::vector<T> r1;
std::set_intersection( a.cbegin(), a.cend(), b.cbegin(), b.cend(), std::back_inserter( r1 ) );

And the same for r2 and r3, using std::set_difference.
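Spelled out for all three results, the sort-then-merge approach might look like the sketch below. The function name `sorted_split` and its signature are hypothetical, chosen only for illustration, and the inputs are taken by value so the caller's vectors stay unsorted; sorting in place, as the answer suggests, would avoid those copies.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper (not from the original answer).
// r1: elements in both, r2: only in a, r3: only in b.
template <typename T>
void sorted_split(std::vector<T> a, std::vector<T> b,
                  std::vector<T>& r1, std::vector<T>& r2, std::vector<T>& r3) {
    // The set algorithms require sorted input ranges.
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::set_intersection(a.cbegin(), a.cend(), b.cbegin(), b.cend(),
                          std::back_inserter(r1));
    std::set_difference(a.cbegin(), a.cend(), b.cbegin(), b.cend(),
                        std::back_inserter(r2));
    // Swapping the argument order gives the elements only in b.
    std::set_difference(b.cbegin(), b.cend(), a.cbegin(), a.cend(),
                        std::back_inserter(r3));
}
```

Each of the three calls walks both ranges once, so after the two sorts the remaining work is linear in |A|+|B|.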

Beyond that, I doubt that there's much you can do. A single loop that fills all three results in one pass might improve things somewhat:

std::vector<T> onlyA, onlyB, both;
std::sort( a.begin(), a.end() );
std::sort( b.begin(), b.end() );
onlyA.reserve( a.size() );
onlyB.reserve( b.size() );
both.reserve( std::min( a.size(), b.size() ) );
auto ita = a.cbegin();
auto enda = a.cend();
auto itb = b.cbegin();
auto endb = b.cend();
// Merge-style walk over both sorted ranges.
while ( ita != enda && itb != endb ) {
    if ( *ita < *itb ) {
        onlyA.push_back( *ita );
        ++ ita;
    } else if ( *itb < *ita ) {
        onlyB.push_back( *itb );
        ++ itb;
    } else {
        both.push_back( *ita );
        ++ ita;
        ++ itb;
    }
}
// Whatever remains in either range has no counterpart in the other.
onlyA.insert( onlyA.end(), ita, enda );
onlyB.insert( onlyB.end(), itb, endb );

The reserve calls could make a difference and, unless most of the elements end up in the same vector, probably won't cost much extra memory.

You can do this in (expected) linear time by putting the elements of A into an unordered_map where the elements from A are the keys. Then check whether each element of B is among the keys in the map.
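A minimal sketch of that idea follows. The answer does not say what the map's values should be, so using a bool flag that records whether the key was also seen in B is an assumption, as are the names `Split` and `hash_split`. It also relies on the question's guarantee that neither array contains duplicates.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical result type (not from the original answer).
template <typename T>
struct Split {
    std::vector<T> both, onlyA, onlyB;
};

// Hypothetical helper: expected O(|A|+|B|) with a well-behaved hash.
template <typename T>
Split<T> hash_split(const std::vector<T>& a, const std::vector<T>& b) {
    // Key: element of A. Value: true once the element is also found in B.
    std::unordered_map<T, bool> seen;
    seen.reserve(a.size());
    for (const T& x : a) seen.emplace(x, false);

    Split<T> r;
    for (const T& y : b) {
        auto it = seen.find(y);
        if (it == seen.end()) {
            r.onlyB.push_back(y);   // not a key, so only in B
        } else {
            it->second = true;      // mark as shared
            r.both.push_back(y);
        }
    }
    // Keys never marked are the elements found only in A.
    for (const T& x : a) {
        if (!seen.find(x)->second) r.onlyA.push_back(x);
    }
    return r;
}
```

Unlike the sorted variants, this leaves the inputs untouched and keeps the results in their original relative order: onlyA follows the order of A, while both and onlyB follow the order of B.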
