Fast algorithm to remove a number of elements from an ArrayList

Say an ArrayList has size n.

In my case, I often need to remove between 1 and n elements at different indexes from an ArrayList.

Using the VisualVM profiler, I found that ArrayList.remove() took around 90% of the running time.

So I want to improve the performance of the removal, and I wonder whether it can be sped up.

Here is a minimal example:

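public void testArrayListRemove() {
    List<Integer> list = new ArrayList<>();
    int[] indexes = new int[] { 1, 2, 4, 10, 100, 1000 };
    for (int i = 0; i < 100000; i++) {
        list.add(i);
    }
    // Remove from the highest index down so earlier removals do not shift later ones.
    for (int i = indexes.length - 1; i >= 0; i--) {
        list.remove(indexes[i]); // int argument, so this is remove(int index)
    }
}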

The idea I can think of is to swap the elements to be removed to the end and remove them there, so that ArrayList.remove() does not need to call System.arraycopy. I am not sure whether this will really work.

Note: when i is not the index of the last element, ArrayList.remove(i) performs a System.arraycopy to shift the following elements.
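The swap-to-the-end idea can be sketched as follows. This is only a minimal illustration, assuming the indexes are sorted in ascending order, contain no duplicates, and that the order of the remaining elements does not matter; the class and method names (SwapRemove, removeUnordered) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SwapRemove {
    /**
     * Removes the elements at the given indexes without preserving order.
     * Assumes sortedIndexes is ascending, duplicate-free and in range.
     */
    static <T> void removeUnordered(List<T> list, int[] sortedIndexes) {
        // Walk from the highest index down: every position above the current
        // index then holds an element that is meant to be kept.
        for (int i = sortedIndexes.length - 1; i >= 0; i--) {
            int idx = sortedIndexes[i];
            int last = list.size() - 1;
            list.set(idx, list.get(last)); // overwrite the doomed element
            list.remove(last);             // removing the tail shifts nothing
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }
        removeUnordered(list, new int[] { 1, 4 });
        System.out.println(list); // same elements minus 1 and 4, in a changed order
    }
}
```

Each removal costs constant time here, but the list is reordered, so this only fits cases where element order is irrelevant.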

I would appreciate any ideas for dealing with this problem. You can comment on my naive idea of swapping elements to the end, or, even better, suggest a more advanced algorithm.

Thanks.

You should take a look at GapList, a lightning-fast List implementation.

From the article:


Introduction to GapList

To solve the issues brought out, we introduce GapList as another implementation of the java.util.List interface. As main features, GapList provides

  • Efficient access to elements by index
  • Constant time insertion at head and tail of list
  • Exploit the locality of reference often seen in applications

Let's see how GapList is implemented to offer these features.

If we compare how the different kinds of inserts are handled by ArrayList, we can quickly come up with a solution to guarantee fast insertion both at the beginning and at the end of the list.

Instead of moving all elements to gain space at index 0, we leave the existing elements in place and write the elements at the end of the allocated array if there is space left. So we basically use the array as a kind of rotating buffer.

For accessing the elements in the right order, we have to remember the start position of the first element and use a modulo operation to calculate the physical index from the logical one:

physIndex = (start + index) % capacity
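To make the index mapping concrete, here is a small fixed-capacity rotating buffer. It is a hypothetical sketch written for this explanation, not GapList's actual code; the class name RotatingBuffer and its methods are assumptions.

```java
import java.util.NoSuchElementException;

/** Hypothetical fixed-capacity rotating buffer; shows the index mapping only. */
class RotatingBuffer {
    private final int[] data;
    private int start; // physical position of logical index 0
    private int size;

    RotatingBuffer(int capacity) {
        data = new int[capacity];
    }

    int size() {
        return size;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException();
        }
        return data[(start + index) % data.length]; // physIndex = (start + index) % capacity
    }

    void addLast(int value) {
        if (size == data.length) {
            throw new IllegalStateException("buffer is full");
        }
        data[(start + size) % data.length] = value;
        size++;
    }

    void addFirst(int value) {
        if (size == data.length) {
            throw new IllegalStateException("buffer is full");
        }
        // Step the start back by one slot, wrapping around; nothing is shifted.
        start = (start - 1 + data.length) % data.length;
        data[start] = value;
        size++;
    }

    int removeFirst() {
        if (size == 0) {
            throw new NoSuchElementException();
        }
        int value = data[start];
        start = (start + 1) % data.length; // advancing start drops the head in O(1)
        size--;
        return value;
    }
}
```

Because only start and size change, adding or removing at either end never copies existing elements.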

To exploit the locality of reference, we allow a gap to be included in the storage of the list elements. The gap formed by the unused slots in the backing array can be anywhere in the list. There is at most one gap, but there can also be none.

This gap helps you to take advantage of the locality of reference to the list, so if you add an element to the middle of the list, a subsequent addition to the middle will be fast.

If a GapList has no gap, one is created if needed. If the gap is in the wrong place, it is moved. But if the operations happen near each other, only a small amount of data has to be copied.

GapList also allows removal of elements at the beginning and at the end without any moving of elements.

Removals in the middle are handled similarly to insertions: an existing gap may be moved, or vanish if it is no longer needed.
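The gap mechanism can be illustrated with a plain gap buffer. The sketch below is a simplified, hypothetical structure for ints (no rotation, a single gap that always exists); it is not how GapList itself is implemented, and the name IntGapBuffer is an assumption.

```java
/** Hypothetical, simplified gap buffer of ints; not GapList's real code. */
class IntGapBuffer {
    private int[] data;
    private int gapStart; // first unused slot
    private int gapEnd;   // one past the last unused slot

    IntGapBuffer(int capacity) {
        data = new int[Math.max(1, capacity)];
        gapEnd = data.length; // the whole array starts out as gap
    }

    int size() {
        return data.length - (gapEnd - gapStart);
    }

    int get(int index) {
        // Logical indexes past the gap are offset by the gap length.
        return index < gapStart ? data[index] : data[index + (gapEnd - gapStart)];
    }

    /** Moves the gap so that it starts at logical position pos (0 <= pos <= size). */
    private void moveGap(int pos) {
        if (pos < gapStart) {
            int n = gapStart - pos; // elements that hop to the far side of the gap
            System.arraycopy(data, pos, data, gapEnd - n, n);
            gapStart -= n;
            gapEnd -= n;
        } else if (pos > gapStart) {
            int n = pos - gapStart;
            System.arraycopy(data, gapEnd, data, gapStart, n);
            gapStart += n;
            gapEnd += n;
        }
    }

    void add(int index, int value) {
        if (gapStart == gapEnd) {
            grow();
        }
        moveGap(index);
        data[gapStart++] = value; // fill the first gap slot
    }

    int remove(int index) {
        moveGap(index + 1);     // the element now sits just before the gap
        return data[--gapStart]; // widening the gap removes it
    }

    private void grow() {
        int[] bigger = new int[data.length * 2];
        int tail = data.length - gapEnd;
        System.arraycopy(data, 0, bigger, 0, gapStart);
        System.arraycopy(data, gapEnd, bigger, bigger.length - tail, tail);
        gapEnd = bigger.length - tail;
        data = bigger;
    }
}
```

With this layout, removing at indexes that are close together (for example sorted indexes processed from high to low) only copies the few elements between consecutive positions, which is the locality effect the article describes.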


Here's a small code sample:

package rpax.stackoverflow.q24077045;

import java.util.*;
import java.util.concurrent.ThreadLocalRandom;
import org.magicwerk.brownies.collections.GapList;

public class Q24077045 {

    static int LIST_SIZE = 500000;

    public static void main(String[] args) {
        long a1, b1, c1 = 0, a2, b2, c2 = 0;
        int[] indexes = generateRandomIndexes(10000);

        a2 = System.currentTimeMillis();
        List<Integer> l2 = testArrayListRemove2(indexes);
        if (l2.size() < 1)
            return;
        b2 = System.currentTimeMillis();
        c2 = b2 - a2;

        a1 = System.currentTimeMillis();
        List<Integer> l = testArrayListRemove(indexes);
        if (l.size() < 1)
            return;
        b1 = System.currentTimeMillis();
        c1 = b1 - a1;

        System.out.println("1 : " + c1);
        System.out.println("2 : " + c2);

        System.out.println("Speedup : "+ c1 * 1.00 / c2+"x");

    }

    static int[] generateRandomIndexes(int number) {
        int[] indexes = new int[number];
        for (int i = 0; i < indexes.length; i++)
        {
            indexes[i] = ThreadLocalRandom.current().nextInt(0, LIST_SIZE);
        }
        Arrays.sort(indexes);
        return indexes;
    }

    public static List<Integer> testArrayListRemove(int[] indexes) {
        List<Integer> list = new ArrayList<Integer>(LIST_SIZE);

        for (int i = 0; i < LIST_SIZE; i++)
            list.add(i);

        for (int i = indexes.length - 1; i >= 0; i--)
            list.remove(indexes[i]);
        return list;
    }

    public static List<Integer> testArrayListRemove2(int[] indexes) {

        List<Integer> list = GapList.create(LIST_SIZE);

        for (int i = 0; i < LIST_SIZE; i++)
            list.add(i);

        for (int i = indexes.length - 1; i >= 0; i--)
            list.remove(indexes[i]);
        return list;
    }

}

On my laptop, the GapList version is about 10x faster. It seems to be a good alternative to ArrayList.

Disclaimer: This is not a performance analysis. It is only an illustrative example.

You can work with the underlying array and iterate through it:

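Integer[] arr = list.toArray(new Integer[0]);

Integer[] newArr = new Integer[arr.length - indexes.length];

Now use System.arraycopy to copy each contiguous block of kept elements (this assumes indexes is sorted in ascending order with no duplicates):

int src = 0; // next position to read from arr
int dst = 0; // next position to write in newArr
for (int idx : indexes) {
    int len = idx - src; // length of the kept block before the removed slot
    System.arraycopy(arr, src, newArr, dst, len);
    dst += len;
    src = idx + 1; // skip the removed element
}
System.arraycopy(arr, src, newArr, dst, arr.length - src); // copy the tail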
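A related variant, shown here only as a sketch, compacts the list in place in a single pass instead of copying to a new array. It assumes the indexes are sorted in ascending order, contain no duplicates and are in range; the names BatchRemove and removeAllAt are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BatchRemove {
    /**
     * Removes the elements at the given indexes in one pass, keeping order.
     * Assumes sortedIndexes is ascending, duplicate-free and in range.
     */
    static <T> void removeAllAt(List<T> list, int[] sortedIndexes) {
        if (sortedIndexes.length == 0) {
            return;
        }
        int write = sortedIndexes[0]; // everything before the first index stays put
        int next = 0;                 // position in sortedIndexes still to be matched
        for (int read = sortedIndexes[0]; read < list.size(); read++) {
            if (next < sortedIndexes.length && sortedIndexes[next] == read) {
                next++; // this element is dropped
            } else {
                list.set(write++, list.get(read)); // slide the kept element down
            }
        }
        // Truncate the leftover tail with a single range removal.
        list.subList(write, list.size()).clear();
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }
        removeAllAt(list, new int[] { 1, 2, 4 });
        System.out.println(list); // [0, 3, 5, 6, 7, 8, 9]
    }
}
```

Each element is moved at most once, so the total work is proportional to the list size rather than to the list size times the number of removed indexes.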

Check out the list of data structures here. Pick one depending on your requirements. As Guarev mentioned, a HashMap is probably your best bet. HashMaps have the advantage of average-case constant time for insert, search, and delete.

ArrayLists are not a good structure for storing a lot of data, as the memory usage quickly goes through the roof, and search and delete times get very expensive very quickly.

ArrayList is not really a good data structure for this operation.

I would suggest using a HashMap for this purpose: you can store key-value pairs with the indexes as the keys.
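As a rough sketch of that suggestion, the elements can be keyed by their original index. The helper names below (IndexKeyedMap, fromList, removeAll) are made up for illustration. Note that, unlike list indexes, the keys are not renumbered after a removal, and a HashMap does not guarantee iteration order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexKeyedMap {
    /** Copies the list into a map keyed by each element's original index. */
    static <T> Map<Integer, T> fromList(List<T> items) {
        Map<Integer, T> byIndex = new HashMap<>(items.size() * 2);
        for (int i = 0; i < items.size(); i++) {
            byIndex.put(i, items.get(i));
        }
        return byIndex;
    }

    /** Removes the entries for the given original indexes, in any order. */
    static <T> void removeAll(Map<Integer, T> byIndex, int[] indexes) {
        for (int idx : indexes) {
            byIndex.remove(idx); // no other entry has to move
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> m = fromList(List.of("a", "b", "c", "d"));
        removeAll(m, new int[] { 1, 3 });
        System.out.println(m.get(0) + " " + m.get(2)); // a c
    }
}
```

Whether this is a good trade depends on whether the surrounding code needs positional access to the remaining elements or only lookup by the original index.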
