Very large Java ArrayList has slow traversal time

Solution: My ArrayList was filled with duplicates. I modified my code to filter these out, which reduced the running time to about 1 second.

I am working on an algorithms project that requires me to look at large amounts of data.

My program has a potentially very large ArrayList (A), and it traverses every element in it. For each element in (A), several other calculated elements are added to another ArrayList (B). (B) will be much, much larger than (A).

Once my program has run through seven of these ArrayLists, the running time goes up to approximately 5 seconds. I'm trying to get that down to < 1 second, if possible.

I am open to different ways of traversing the ArrayList, as well as to using a completely different data structure. I don't care about the order of the values inside the lists, as long as I can go through all of them very quickly. I have tried a linked list, and it was significantly slower.

Here is a snippet of code to give you a better understanding. The code tries to find all single-digit permutations of a prime number.

public static Integer primeLoop(ArrayList current, int endVal, int size)
{        
    Integer compareVal = 0;
    Integer currentVal = 0;
    Integer tempVal = 0;
    int currentSize = current.size()-1;

    ArrayList next = new ArrayList();

    for(int k = 0; k <= currentSize; k++)
    {
        currentVal = Integer.parseInt(current.get(k).toString());
        for(int i = 1; i <= 5; i++)
        {                                
            for(int j = 0; j <= 9; j++)
            {
                compareVal = orderPrime(currentVal, endVal, i, j);
                //System.out.println(compareVal);

                if(!compareVal.equals(tempVal) && !currentVal.equals(compareVal))
                {     
                    tempVal = compareVal;
                    next.add(compareVal);

                    //System.out.println("Inserted: "+compareVal + "  with parent:  "+currentVal);

                    if(compareVal.equals(endVal))
                    {
                        System.out.println("Separation: " + size);
                        return -1;
                    }
                }
            }
        }
    }
    size++;
    //System.out.println(next);
    primeLoop(next, endVal, size); 
    return -1;
}

*Edit: Removed unnecessary code from the snippet above. Created a currentSize variable so that the program does not have to call current.size() on every iteration. Still no difference. Here is an idea of how the ArrayList grows: 2, 29, 249, 2293, 20727, 190819

When something is slow, the typical advice is to profile it. This is generally wise, as it's often difficult to determine the cause of slowness, even for performance experts. Sometimes it's possible to pick out code that's likely to be a performance problem, but this is hit-or-miss. There are some likely candidates in this code, but it's hard to say for sure, since we don't have the code for the orderPrime() and primeLoop() methods.

That said, there's one thing that caught my eye. This line:

    currentVal = Integer.parseInt(current.get(k).toString());

This gets an element from current, turns it into a string, parses it back to an int, and then boxes it into an Integer. Conversion to and from String is pretty expensive, and it allocates memory, so it puts pressure on garbage collection. Boxing primitive int values into Integer objects also allocates memory, contributing to GC pressure.

It's hard to say what the fix is, since you're using the raw type ArrayList for current. I surmise it might be ArrayList<Integer>, and if so, you could just replace this line with

    currentVal = (Integer)current.get(k);

You should be using generics in order to avoid the cast. (But that doesn't affect performance, just the readability and type safety of the code.)
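As a minimal sketch of what the generic version of the traversal could look like, assuming current really does hold Integer values: the class name TypedTraversal and the method sumAll are hypothetical and only illustrate the loop shape, not the original algorithm.

```java
import java.util.ArrayList;
import java.util.List;

class TypedTraversal {
    // Hypothetical helper: sums the elements to demonstrate a typed traversal.
    static long sumAll(List<Integer> current) {
        long total = 0;
        // With a List<Integer>, each element unboxes directly to int:
        // no cast, no toString(), no Integer.parseInt().
        for (int currentVal : current) {
            total += currentVal;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> current = new ArrayList<>();
        current.add(1033);
        current.add(8179);
        System.out.println(sumAll(current)); // prints 9212
    }
}
```

Declaring the parameter as List<Integer> rather than a raw ArrayList lets the compiler reject anything that is not an Integer, so the String round trip is never needed.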

If current doesn't contain Integer values, then it should. Whatever it contains should be converted to Integer beforehand, instead of putting conversions inside a loop.

After fixing this, you are still left with boxing/unboxing overhead. If performance is still a problem, you'll have to switch from ArrayList<Integer> to int[], because Java collections cannot contain primitives. This is inconvenient, since you'll have to implement your own list-like structure that simulates a variable-length array of int (or find a third-party library that does this).
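One way such a list-like structure might look is sketched below. The class name IntList and its methods are hypothetical, and the initial capacity of 16 with doubling growth is just an assumed policy, not something taken from the question.

```java
import java.util.Arrays;

// Hypothetical growable list of primitive ints backed by an int[].
class IntList {
    private int[] data = new int[16]; // assumed starting capacity
    private int size = 0;

    // Appends a value, doubling the backing array when it is full.
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    // Returns the element at index, rejecting indexes outside [0, size).
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }
}
```

Because the values stay in an int[], neither add nor get allocates an Integer object; the only allocation is the occasional array copy when the capacity doubles.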

But even all of the above might not be enough to make your program run fast enough. I don't know what your algorithm is doing, but it looks like it's doing linear searching. There are a variety of ways to speed up searching. But another commenter suggested binary search, and you said it wasn't allowed, so it's not clear what can be done here.

  1. Why do you have this line?

    current.iterator();

You don't use the iterator at all; you don't even assign it to a variable. It's just a waste of time.

  2. for(int k = 0; k <= current.size()-1; k++)

Instead of computing the size on every iteration, create a variable like:

int curSize = current.size() - 1;

And use it in the loop.

It can save some time.

Here is an idea of how the ArrayList grows: 2, 29, 249, 2293, 20727, 190819

Your next list grows too large, so it must contain duplicates:

  • 190_819 entries for 100_000 numbers?
  • According to primes.utm.edu/howmany.html there are only 9,592 primes up to 100_000.

Getting rid of the duplicates will certainly improve your response times.
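A minimal sketch of one way to filter duplicates while building each level is to track every value already produced in a HashSet. The class LevelExpander, the method nextLevel, and the neighbors function are hypothetical stand-ins: the question's orderPrime() is not shown, so the demo below uses a made-up neighbor rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class LevelExpander {
    // Builds the next level, keeping only values that were never seen before.
    // 'neighbors' is a hypothetical stand-in for whatever generates candidates.
    static List<Integer> nextLevel(List<Integer> current, Set<Integer> seen,
                                   Function<Integer, List<Integer>> neighbors) {
        List<Integer> next = new ArrayList<>();
        for (Integer value : current) {
            for (Integer candidate : neighbors.apply(value)) {
                // Set.add returns false when the value is already present.
                if (seen.add(candidate)) {
                    next.add(candidate);
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Set<Integer> seen = new HashSet<>(Arrays.asList(1, 2));
        // Made-up neighbor rule for the demo: v + 1 and v + 2.
        List<Integer> next = nextLevel(Arrays.asList(1, 2), seen,
                v -> Arrays.asList(v + 1, v + 2));
        System.out.println(next); // prints [3, 4]
    }
}
```

With this shape, a value enters a level at most once across the whole run, so the level sizes are bounded by the number of distinct candidates rather than growing roughly tenfold per step.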
