
C++, fast remove elements from vector unique to another vector

There are two unsorted vectors, one of int and one of (int, float) pairs:

std::vector <int> v1;
std::vector <std::pair<int, float> > v2;

each containing millions of items.

How can I remove from v1, as fast as possible, every item that also occurs as a first in v2, so that only the items not included in v2.first remain?

Example:

v1:  5 3 2 4 7 8
v2: {2,8} {7,10} {5,0} {8,9}
----------------------------
v1: 3 4

There are two tricks I would use to do this as quickly as possible:

  1. Use some sort of associative container (probably std::unordered_set) to store all of the integers in the second vector, which makes it dramatically more efficient to look up whether some integer in the first vector should be removed.

  2. Optimize the way in which you delete elements from the initial vector.

More concretely, I'd do the following. Begin by creating a std::unordered_set and adding to it the first integer of every pair in the second vector. This gives (expected) O(1) lookup time to check whether or not a specific int exists in the set.

Now that you've done that, use the std::remove_if algorithm (together with std::vector::erase) to delete every element of the original vector that exists in the hash table. You can use a lambda to do this:

std::unordered_set<int> toRemove = /* ... */;
// remove_if moves the elements to keep to the front; erase trims the tail.
v1.erase(std::remove_if(v1.begin(), v1.end(), [&toRemove] (int x) -> bool {
    return toRemove.find(x) != toRemove.end();
}), v1.end());

The first step, storing everything in the unordered_set, takes expected O(n) time. The second step does a total of expected O(n) work, because remove_if defers all deletions to a single erase at the end and each lookup takes expected constant time. This gives a total of expected O(n) time and O(n) space for the entire process.
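The two steps above can be sketched end to end as follows. This is only a minimal sketch: the function name remove_matching_hashed is hypothetical (it does not appear in the answer), and it assumes C++11 or later.

```cpp
#include <algorithm>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper (name is an assumption, not from the original answer):
// erases from v1 every value that occurs as a `first` in v2.
void remove_matching_hashed(std::vector<int>& v1,
                            const std::vector<std::pair<int, float>>& v2) {
    // Step 1: collect the keys of v2 in a hash set.
    std::unordered_set<int> toRemove;
    toRemove.reserve(v2.size());  // reduces rehashing while inserting
    for (const auto& p : v2) {
        toRemove.insert(p.first);
    }

    // Step 2: erase-remove idiom; one pass over v1, one erase at the end.
    v1.erase(std::remove_if(v1.begin(), v1.end(),
                            [&toRemove](int x) { return toRemove.count(x) != 0; }),
             v1.end());
}
```

Calling remove_matching_hashed(v1, v2) on the vectors from the question's example should leave v1 holding 3 and 4. The reserve call is optional; it is there only to limit rehashing when v2 is large.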

If you are allowed to sort the second vector (the pairs), then you could alternatively do this in O(n log n) worst-case time and O(log n) worst-case space by sorting the vector by the key, then using std::binary_search to check whether a particular int from the first vector should be eliminated or not. Each binary search takes O(log n) time, so the total time required is O(n log n) for the sorting, O(log n) per element in the first vector (for a total of O(n log n)), and O(n) for the deletion, giving a total of O(n log n).
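A minimal sketch of the sorting variant, assuming C++11 or later. The function name remove_matching_sorted is hypothetical, and the sketch uses std::lower_bound instead of std::binary_search, because comparing a pair against a plain int is simpler when the comparator is only ever called with one argument order.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper (name is an assumption, not from the original answer):
// sorts v2 by key in place, then erases from v1 every value found in v2.
void remove_matching_sorted(std::vector<int>& v1,
                            std::vector<std::pair<int, float>>& v2) {
    using Pair = std::pair<int, float>;

    // Sort the pairs by their first member only.
    std::sort(v2.begin(), v2.end(),
              [](const Pair& a, const Pair& b) { return a.first < b.first; });

    // Binary search for x among the keys of the sorted v2.
    auto inV2 = [&v2](int x) {
        auto it = std::lower_bound(
            v2.begin(), v2.end(), x,
            [](const Pair& p, int key) { return p.first < key; });
        return it != v2.end() && it->first == x;
    };

    // Erase-remove idiom, as in the hash-set version.
    v1.erase(std::remove_if(v1.begin(), v1.end(), inV2), v1.end());
}
```

Note that this reorders v2, so it only applies when the original order of the pairs does not matter.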

Hope this helps!

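Assuming that neither container is sorted and that sorting is actually too expensive or memory is scarce, you can scan v2 linearly for each element of v1. This needs no extra memory, but costs O(n·m) comparisons for vectors of sizes n and m: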

v1.erase(std::remove_if(v1.begin(), v1.end(),
                        [&v2](int i) {
                          // The inner lambda must capture i to compare against it.
                          return std::find_if(v2.begin(), v2.end(),
                                              [i](const std::pair<int, float>& p) {
                                                return p.first == i; })
                                 != v2.end(); }), v1.end());

Alternatively, sort v2 on first and use a binary search instead. If there is enough memory, use an unordered_set to store the first of each pair in v2.

Complete version using C++03-style functors (only the marked lines in main rely on C++11):

#include <iostream>
#include <vector>
#include <utility>
#include <algorithm>

// Predicate: true if the pair's key equals the stored int.
struct find_func {
  find_func(int i) : i(i) {}

  int i;
  bool operator()(const std::pair<int, float>& p) const {
    return p.first == i;
  }
};

// Predicate: true if i occurs as a key anywhere in v2 (linear scan).
struct remove_func {
  remove_func(std::vector< std::pair<int, float> >* v2) 
  : v2(v2) {}
  std::vector< std::pair<int, float> >* v2;
  bool operator()(int i) const {
    return std::find_if(v2->begin(), v2->end(), find_func(i)) != v2->end();
  }
};


int main()
{
  // c++11 here
  std::vector<int> v1 = {5, 3, 2, 4, 7, 8};
  std::vector< std::pair<int, float> > v2 = {{2,8}, {7,10}, {5,0}, {8,9}};
  v1.erase(std::remove_if(v1.begin(), v1.end(), remove_func(&v2)), v1.end());

  // and here
  for(auto x : v1) {
    std::cout << x << std::endl;
  }

  return 0;
}
