Most efficient way of erasing/deleting multiple std::vector elements while retaining original order?


I have a std::vector<int> and a second container holding iterators or indexes (no keys; I want constant-time access to the elements) into this vector for deletion purposes. Let's assume I have a vector of 1000 elements and want to erase 200 of them. After the deletions, the non-removed elements should be in the same order as before.

One more thing I missed in the first version of my question: the values are unique. They are identities.

How would you do that in a safe (regarding the STL rules) and efficient manner (the decision for a vector is final)?

Possibilities or methods I thought about:

  • the erase-remove idiom (http://en.wikipedia.org/wiki/Erase-remove_idiom): originally meant for deleting elements that fulfill a condition (including a linear search), but I think with ranges of size 1 this method could also be used with already given iterators and a dummy condition. Question: is the original order of elements kept, and is it more performant than the last method?
  • loop over the indexes and erase the elements using vector.erase(vector.begin()+index+offset), while keeping the removed indexes in a container for calculating the offset. This offset could be determined for every remove iteration by using std::lower_bound on the container of already removed elements. The problem: a lot of binary searches for getting the offset and a lot of move operations because of the deletion at random locations.
  • At the moment I'm doing the following: get all the iterators for the elements to remove, sort them in descending order according to their location in the vector, and loop over them for the final deletion with vector.erase. This way I'm not invalidating any iterator I still need, and there are no vector rearrange operations except for the deletion itself. The problem: a lot of sorting.
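
For reference, the third approach could be sketched roughly like this. It is only an illustration: it assumes indexes rather than iterators, the helper name erase_descending is made up, and every index is assumed to be unique and in range.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: erases the elements of `vec` at the given positions.
// Assumes every index is unique and smaller than vec.size().
template <typename T>
void erase_descending(std::vector<T>& vec, std::vector<std::size_t> indexes)
{
    // Highest index first, so an erase never shifts an element
    // that still has to be erased.
    std::sort(indexes.begin(), indexes.end(), std::greater<std::size_t>());
    for (std::size_t index : indexes)
        vec.erase(vec.begin() + static_cast<std::ptrdiff_t>(index));
}
```

Each call to erase still shifts every later element down by one, so the cost grows with both the vector size and the number of indexes.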

So, how would you tackle this? Any new ideas? Any recommendations?

Thanks for your input.

Sascha

Edit / Update / Own results: I implemented the erase-remove idiom, which was also mentioned by KennyTM, with a predicate based on a lookup in a boost::dynamic_bitset, and it's insanely fast. Furthermore, I tried PigBen's move-and-truncate method (also mentioned by Steve Jessop), which also accesses the bitset in its while loop. Both seem to be equally fast with my kind of data. I tried to delete 100 of 1000 elements (unsigned ints), did these 100 deletes 1M times, and there was no significant difference. Because I think the STL-based erase-remove idiom is kind of more "natural", I'm choosing this method (an argument that was also mentioned by KennyTM).

How about looping through the vector and, for each element that needs to be removed, copying the next element that doesn't need to be removed into that position? Then, when you get to the end, truncate it.

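std::size_t last = 0;
for (std::size_t i = 0; i < vec.size(); ++i, ++last)
{
   // skip every element marked for removal, without running past the end
   while (i < vec.size() && needs_to_be_removed(i))
      ++i;
   if (i >= vec.size()) break;

   vec[last] = vec[i];
}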

vec.resize(last);

In <algorithm> there is a remove_if function which squeezes all values that are not removed to the front while maintaining their order. This works if those 200 elements can be determined purely by their values, not by their index.

This is essentially the erase-remove idiom you have linked to. remove_if is guaranteed to perform O(N) comparisons (and at most O(N) copies), which would be more efficient than sorting (O(N log N)), although your last option doesn't actually require sorting if the indices are determined from values (just scan in the reverse direction while copying).

Nevertheless, using remove_if (if you can) is better than the other 2 options because the implementation has already been written for you, so there's less chance of a logic error, and it conveys better what (not how) to do.
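
A minimal sketch of the idiom for the value-based case, assuming the values to drop are already collected in a std::unordered_set (the helper name erase_values and the parameter name doomed are made up for illustration):

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Hypothetical helper: removes every element whose value is in `doomed`,
// keeping the remaining elements in their original order.
void erase_values(std::vector<int>& vec, const std::unordered_set<int>& doomed)
{
    // remove_if moves the kept elements to the front in order and returns
    // the new logical end; erase then drops the leftover tail in one call.
    vec.erase(std::remove_if(vec.begin(), vec.end(),
                             [&doomed](int value) {
                                 return doomed.count(value) != 0;
                             }),
              vec.end());
}
```

The predicate here is stateless apart from the read-only set, so it does not depend on how often the algorithm copies it.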

First thing is, don't call erase more times than you have to, because for a vector it shuffles all the later elements down, giving the whole operation an Ω(n*m) worst-case run time (n being the size of the vector, m the size of the list of indexes to remove).

I think the first thing I'd try would be similar to your current code:

  • sort the indexes
  • create a new vector of size n - m
  • iterate over the original vector, copying indexes[0] elements, skipping an element, then copying indexes[1] - indexes[0] - 1 elements, skipping an element, and so on.
  • swap the original vector with the new one.
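
The four steps above might look roughly like the following. This is only a sketch: the name erase_by_copy is made up, the indexes are assumed to be unique and in range, and it reserves space and appends instead of presizing the new vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper. Assumes the indexes are unique and all < vec.size().
template <typename T>
void erase_by_copy(std::vector<T>& vec, std::vector<std::size_t> indexes)
{
    std::sort(indexes.begin(), indexes.end());               // step 1
    assert(indexes.empty() || indexes.back() < vec.size());

    std::vector<T> kept;
    kept.reserve(vec.size() - indexes.size());               // step 2

    std::size_t from = 0;            // start of the next run of kept elements
    for (std::size_t skip : indexes) {                       // step 3
        // copy the run [from, skip), then step over the element at `skip`
        kept.insert(kept.end(), vec.begin() + from, vec.begin() + skip);
        from = skip + 1;
    }
    kept.insert(kept.end(), vec.begin() + from, vec.end());  // trailing run

    vec.swap(kept);                                          // step 4
}
```

Every kept element is copied exactly once, so after the sort the work is linear in the size of the vector.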

You might be able to do the third step with remove_copy_if and a predicate which contains state (counting how many items it has copied and how far it is through the sorted list of indexes), but for extremely tedious and obscure reasons this isn't guaranteed to work (algorithm predicates with mutable state are problematic; the consensus seems to be that the standard doesn't guarantee that the same copy of the predicate is used throughout the algorithm). So I really don't advise trying it, but it might help to bear in mind that what you're writing is basically a modified version of remove_copy_if.

You could avoid the second step by using a back_inserter rather than presizing the vector, although you'd presumably still reserve the space in advance.

[Edit: come to think of it, why am I copying anything? Rather than implementing a modified remove_copy_if, implement a modified remove_if, and just copy to an earlier point in the vector. Then erase / resize at the end. I wouldn't worry about the O(m log m) sort of the indexes until proven to be a problem, because it's unlikely to be significantly slower than the Ω(m) operation to read all the values to be removed and store them in some kind of container. Then, using this container in the predicate to remove_if may or may not be O(1). Sorting might turn out faster for plausible values of m.]

You can copy all elements of the vector to a list, except those whose index is in your second container, and then back to a vector. Even with your algorithm of going from the end of the vector to the front, there's a lot of work going on behind the scenes in your vector.

Make your second container a map so it keeps the indices sorted for you automatically.

edit:

To respond to the comment:

The cost of maintaining a map is, in the worst case, the same as maintaining another structure (list or vector) and then sorting it. If you're already doing that, you might as well keep it as a map. It doesn't make sense to complain about the overhead of a map vs. the overhead of sorting a list.

As for the performance of my suggested algorithm, if m is the number of elements to be deleted and n is the total number of elements, then it results in O(n - m).

Of course, this is mostly just humoring your attempt to optimize with a vector.

1 - You shouldn't be using a vector if you want to do random access deletes. That's not what they're good at; use a list if at all possible. And since you seem to be much more interested in relative order than in absolute index, I am wondering why a vector is needed at all. If you gave the entire problem, there's probably a common solution that lets you use the most efficient data structure to solve it.

2 - Instead of maintaining a second data structure, mark the elements that need to be deleted directly in their container. A trivial way is, instead of using a container< T >, to use a container< std::pair< T, char > > and use the char to keep track of the element status.

If you do 1 and 2, you remove all copying completely and get a much more efficient implementation.
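
Combining 1 and 2 could look something like this sketch. The names Flagged, mark and sweep are made up, and marking by value is only for illustration; in real code you would more likely set the flag through an iterator or reference you already hold.

```cpp
#include <list>
#include <utility>

// Each element carries its own "delete me" flag.
using Flagged = std::pair<int, char>;   // second != 0 means "marked"

// Hypothetical helper: marks the element holding `value`, if present.
void mark(std::list<Flagged>& items, int value)
{
    for (Flagged& item : items)
        if (item.first == value)
            item.second = 1;
}

// Unlinks every marked node; the remaining nodes are neither copied
// nor moved, and they keep their relative order.
void sweep(std::list<Flagged>& items)
{
    items.remove_if([](const Flagged& item) { return item.second != 0; });
}
```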

If you have a (e.g. unordered) set of indices that you want to erase, you can use this:

template <typename Type>
void erase_indices(
        const std::unordered_set<size_t>& indices_to_erase,
        std::vector<Type>& vec) {
    std::vector<bool> erase_index(vec.size(), false);
    for (const size_t i: indices_to_erase) {
        erase_index[i] = true;
    }
    std::vector<bool>::const_iterator it_to_erase = erase_index.cbegin();
    typename std::vector<Type>::iterator it_erase_from = std::remove_if(
        vec.begin(), vec.end(),
        [&it_to_erase](const Type&) -> bool {
          return *it_to_erase++ == true;
        }
    );
    vec.erase(it_erase_from, vec.end());
}

It is the fastest solution that came to my mind. You need C++11, though. Usage example to erase elements at index 2 and 5:

constexpr size_t num = 10u;
std::vector<int> vec(num);
std::iota(vec.begin(), vec.end(), 0);

std::unordered_set<size_t> indices_to_erase;
indices_to_erase.insert(2u);
indices_to_erase.insert(5u);

erase_indices(indices_to_erase, vec);

Before:

0 1 2 3 4 5 6 7 8 9

After:

0 1 3 4 6 7 8 9

Edit: If you want to be more flexible regarding the type of container that holds the indices to erase:

template <typename Type, typename Container>
void erase_indices(
        const Container& indices_to_erase,
        std::vector<Type>& vec) {
    typedef typename Container::value_type IndexType;
    static_assert(std::is_same<IndexType, std::size_t>::value,
        "Indices to be erased have to be of type std::size_t");
    std::vector<bool> erase_index(vec.size(), false);
    for (const IndexType idx_erase: indices_to_erase) {
        erase_index[idx_erase] = true;
    }
    std::vector<bool>::const_iterator it_to_erase = erase_index.cbegin();
    typename std::vector<Type>::iterator it_erase_from = std::remove_if(
        vec.begin(), vec.end(),
        [&it_to_erase](const Type&) -> bool {
          return *it_to_erase++ == true;
        }
    );
    vec.erase(it_erase_from, vec.end());
}

Now you can use any kind of container from the Containers Library to provide the indices to be erased, as long as the value_type of that container is std::size_t. Usage remains the same.

Elements of what? Maybe I'm taking your post too seriously, but if you have a vector of 1000 elements, why not mark the ones that are not valid anymore and do away with erasing in the first place? Obviously I'm making an assumption here that your elements are not demanding a lot of memory.

I only bring this up because you seem to be concerned with speed. If the suggestions already given don't do the trick, maybe this idea is worth a thought! In essence, speed things up by not doing the operation in the first place.

I've written a function based on Benjamin Lindley's answer https://stackoverflow.com/a/4115582/2835054 .

#include <iostream>
#include <algorithm>
#include <vector>

template <typename elementType, typename indexType>
void remove_multiple_elements_from_vector(std::vector<elementType> &vector,
std::vector<indexType> &indexes)
{
    // 1. indexType is any integer type (values must be non-negative).
    // 2. elementType is any type.
    // 3. Indexes should be unique.
    // 4. The largest index inside indexes shouldn't be larger than
    //    the largest index in the vector.
    // 5. Indexes should be sorted in ascending order
    //    (it is done inside function).
    std::sort(indexes.begin(), indexes.end());
    std::size_t currentIndexInIndexesVector = 0;
    std::size_t last = 0;
    for(std::size_t i=0; i<vector.size(); ++i, ++last)
    {
       // Skip positions marked for removal; the bounds check keeps us from
       // reading past the end of indexes once all of them are consumed.
       while(currentIndexInIndexesVector < indexes.size() &&
             static_cast<std::size_t>(indexes[currentIndexInIndexesVector]) == i)
       {
          ++i;
          ++currentIndexInIndexesVector;
       }
       if(i >= vector.size()) break;

       vector[last] = vector[i];
    }

    vector.resize(last);
}


int main()
{
    std::vector<int> vector = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::vector<int> indexes = {0, 10, 5};

    for (auto &vectorElement : vector)
    {
        std::cout << vectorElement << " ";
    }    
    std::cout << "\n";

    remove_multiple_elements_from_vector<int, int>(vector, indexes);

    for (auto &vectorElement : vector)
    {
        std::cout << vectorElement << " ";
    }
}
