In-place C++ set intersection

The standard way of intersecting two sets in C++ is the following:

#include <algorithm>
#include <iterator>
#include <set>
std::set<int> set_1;  // With some elements
std::set<int> set_2;  // With some other elements
std::set<int> the_intersection;  // Destination of intersect
std::set_intersection(set_1.begin(), set_1.end(), set_2.begin(), set_2.end(),
                      std::inserter(the_intersection, the_intersection.end()));

How would I go about doing an in-place set intersection? That is, I want set_1 to hold the result of the call to set_intersection. Obviously, I can just do a set_1.swap(the_intersection), but this is a lot less efficient than intersecting in place.
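For reference, the out-of-place approach described in the question (intersect into a temporary set, then swap it into set_1) can be wrapped up as below. This is a minimal sketch assuming C++11 and `std::set<int>`; the function name `intersect_via_swap` is hypothetical. The swap itself is constant time; the cost lies in building the temporary set.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Hypothetical helper: computes a ∩ b into a temporary set and then swaps
// the temporary into `a`, so `a` ends up holding the intersection.
void intersect_via_swap(std::set<int>& a, const std::set<int>& b) {
    std::set<int> tmp;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(tmp, tmp.end()));
    a.swap(tmp);  // O(1): exchanges the internal trees, no element copies
}
```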

I think I've got it:

auto it1 = set_1.begin();
auto it2 = set_2.begin();
while (it1 != set_1.end() && it2 != set_2.end()) {
    if (*it1 < *it2) {
        it1 = set_1.erase(it1);  // *it1 does not appear in set_2
    } else if (*it2 < *it1) {
        ++it2;
    } else {  // *it1 == *it2: keep it
        ++it1;
        ++it2;
    }
}
// Anything left in set_1 from here on did not appear in set_2,
// so we remove it.
set_1.erase(it1, set_1.end());

Does anyone see any problems? It seems to be linear in the combined size of the two sets. According to cplusplus.com, std::set::erase(position) is amortized constant, while erase(first, last) is logarithmic in the size of the set plus linear in the number of elements erased.
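The loop above can be packaged as a reusable function template. This is a sketch assuming C++11 (for the iterator-returning `erase`) and that both sets share the same comparator and allocator types; the name `intersect_in_place` is hypothetical. It uses the set's own `key_comp()` instead of `operator<`, so it also works with custom orderings. Every loop iteration advances at least one iterator, which is where the linear bound on the number of comparisons comes from.

```cpp
#include <set>

// Hypothetical helper: in-place intersection of `a` with `b`, using the same
// merge walk as the loop above, but ordered by the set's own comparator.
template <class T, class Compare, class Alloc>
void intersect_in_place(std::set<T, Compare, Alloc>& a,
                        const std::set<T, Compare, Alloc>& b) {
    auto comp = a.key_comp();
    auto it1 = a.begin();
    auto it2 = b.begin();
    while (it1 != a.end() && it2 != b.end()) {
        if (comp(*it1, *it2)) {
            it1 = a.erase(it1);  // *it1 does not appear in b
        } else if (comp(*it2, *it1)) {
            ++it2;               // *it2 does not appear in a
        } else {
            ++it1;               // equivalent keys: keep and advance both
            ++it2;
        }
    }
    a.erase(it1, a.end());       // leftovers in a have no match in b
}
```

With `std::set<int>` the template arguments are deduced, so the call is simply `intersect_in_place(set_1, set_2);`.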

You can easily go through set_1, check each element to see if it exists in set_2, and erase it if it doesn't. Since sets are sorted, you can compare them in linear time, and erasing an element using an iterator takes amortized constant time. I wouldn't count on it being more efficient than what you started with, though; benchmarking would be wise if it matters to you.
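The "check each element against set_2" idea can also be written with a plain membership test instead of a merge walk. This is a sketch assuming C++11 and `std::set<int>`; the name `intersect_by_lookup` is hypothetical. Each `count` call is a logarithmic lookup, so this is O(n log m) rather than the linear merge described above, but it is shorter and does not depend on walking both sets in step.

```cpp
#include <set>

// Hypothetical helper: erases from `a` every element that is not in `b`,
// using one O(log m) lookup in `b` per element of `a`.
void intersect_by_lookup(std::set<int>& a, const std::set<int>& b) {
    for (auto it = a.begin(); it != a.end(); ) {
        if (b.count(*it) == 0) {
            it = a.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
}
```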

It doesn't directly answer the question, but maybe someone will find this helpful.

In the case of std::vector, it is not safe to use the standard algorithm with set_1.begin() as the output iterator (see below), although the clang, gcc, and Microsoft implementations happen to work. Note that set_2 could be any sorted range, not just a std::vector.

#include <algorithm>
#include <vector>

std::vector<int> set_1;  // With some elements (sorted)
std::vector<int> set_2;  // With some other elements (sorted)
auto end = std::set_intersection(
                     set_1.begin(), set_1.end(),
                     set_2.begin(), set_2.end(),
                     set_1.begin()  // intersection is written into set_1, overlapping an input range
                    );
set_1.erase(end, set_1.end());  // erase redundant elements

Update:

Thanks to @Keith, who found that the C++ Standard (25.4.5.3) requires the following:

The resulting range shall not overlap with either of the original ranges

So what I initially proposed is wrong according to the standard, even though it works in the major STL implementations. If you want to be on the safe side and don't want extra allocations, copy the implementation of your choice into your code base and use it instead of std::set_intersection. I don't really understand the reason for this restriction; please comment if you know the answer.
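Along the lines of that suggestion, here is a minimal sketch of a hand-written intersection that is allowed to write into its first input. It assumes C++11, sorted std::vector inputs, and ordering by `operator<`; the name `intersect_sorted_in_place` is hypothetical and not a standard algorithm. It is safe because the write position never gets ahead of the read position in `a`, so no element is overwritten before it has been read.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: intersects sorted vector `a` with sorted vector `b`,
// storing the result in `a`'s own storage. Like std::set_intersection, an
// element repeated k times in `a` and j times in `b` is kept min(k, j) times.
template <class T>
void intersect_sorted_in_place(std::vector<T>& a, const std::vector<T>& b) {
    auto out = a.begin();  // next write position; never passes it1
    auto it1 = a.begin();
    auto it2 = b.begin();
    while (it1 != a.end() && it2 != b.end()) {
        if (*it1 < *it2) {
            ++it1;
        } else if (*it2 < *it1) {
            ++it2;
        } else {
            if (out != it1) {
                *out = std::move(*it1);  // *it1 is never read again
            }
            ++out;
            ++it1;
            ++it2;
        }
    }
    a.erase(out, a.end());  // drop the unused tail
}
```

Usage mirrors the snippet above: `intersect_sorted_in_place(set_1, set_2);` leaves the intersection in set_1 without a second buffer.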

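Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM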