
Why bother using ArrayList(int capacity)?

Pretty much every question about capacity in ArrayList asks how to use it or (oddly) how to access it, and I am quite familiar with that information. What I am interested in is whether it is actually worth using the ArrayList constructor that sets the capacity if you happen to know, or have a rough idea of, how many items will be in the ArrayList.

Are there any comprehensive benchmarks comparing how long it takes to naively add elements to an ArrayList versus presetting the capacity of the ArrayList?

Obviously, for any specific application you'd have to test any performance adjustments to determine whether they are in fact optimizations (and whether they are in fact necessary), but there are times when setting the capacity explicitly can be worthwhile. For example:

  • You're creating a very large number of array-lists, most of which will be very small. In this case, you might want to set the initial capacity very low, and/or trim the capacity whenever you're done populating a given list. (In this case, the optimization is less a matter of speed than of memory usage. But note that the list itself has memory overhead, as does the array it contains, so in this sort of situation it's likely to be better to redesign so that you have fewer lists.)
  • You're creating an array-list of a very large known size, and you want the time to add each element to be very small (perhaps because each time you add an element, you have to send some response to an external data-source). (The default geometric growth takes amortized constant time: every once in a while a massive penalty is incurred for one insertion, so the overall average performance is completely fine, but if you care about the latency of each individual insertion, that might not be good enough.)
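Both cases map onto methods that ArrayList already provides: the capacity constructor and trimToSize(). The following is only a minimal sketch; the class name CapacityDemo, its helper methods, and the sizes are made up for illustration.

```java
import java.util.ArrayList;

class CapacityDemo {
    // Case 1: many small lists. Start with a tiny backing array and
    // release any unused slots once the list is fully populated.
    static ArrayList<String> smallList(String... items) {
        ArrayList<String> list = new ArrayList<>(2); // low initial capacity
        for (String item : items) {
            list.add(item);
        }
        list.trimToSize(); // shrink the backing array to exactly size()
        return list;
    }

    // Case 2: one large list of known size. The backing array is allocated
    // once up front, so no add() call has to pay for a resize.
    static ArrayList<Integer> largeList(int knownSize) {
        ArrayList<Integer> list = new ArrayList<>(knownSize);
        for (int i = 0; i < knownSize; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(smallList("a", "b", "c"));
        System.out.println(largeList(1_000_000).size());
    }
}
```

If the expected size only becomes known after the list has been created, ensureCapacity(int) can be called on the existing list to the same effect.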

I have nothing substantial to add to ruakh's answer, but here's a quick test function. I keep a scrap project around for writing little tests like these. Adjust sourceSize to something representative of your data, and you can get a rough idea of the magnitude of the effect. As shown, I saw about a factor of 2 between them.

import java.util.ArrayList;
import java.util.Random;

public class ALTest {
    // Appends every byte of source to al and returns the elapsed wall-clock time in milliseconds.
    public static long fill(ArrayList<Byte> al, byte[] source) {
        long start = System.currentTimeMillis();
        for (byte b : source) {
            al.add(b);
        }
        return System.currentTimeMillis()-start;
    }
    public static void main(String[] args) {
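        int sourceSize = 1<<20; // elements added per fill (1 MB of source bytes)
        int smallIter = 50;     // fills summed into each printed timing
        int bigIter = 4;        // number of output rows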

        Random r = new Random();
        byte[] source = new byte[sourceSize];
        for (int i = 0;i<bigIter;i++) {
            r.nextBytes(source);
            {
                long time = 0;
                for (int j = 0;j<smallIter;j++) {
                    ArrayList<Byte> al = new ArrayList<Byte>(sourceSize);
                    time += fill(al,source);
                }
                System.out.print("With: "+time+"ms\t");
            }
            {
                long time = 0;
                for (int j = 0;j<smallIter;j++) {
                    ArrayList<Byte> al = new ArrayList<Byte>();
                    time += fill(al,source);
                }
                System.out.print("Without: "+time+"ms\t");
            }
            {
                long time = 0;
                for (int j = 0;j<smallIter;j++) {
                    ArrayList<Byte> al = new ArrayList<Byte>();
                    time += fill(al,source);
                }
                System.out.print("Without: "+time+"ms\t");
            }
            {
                long time = 0;
                for (int j = 0;j<smallIter;j++) {
                    ArrayList<Byte> al = new ArrayList<Byte>(sourceSize);
                    time += fill(al,source);
                }
                System.out.print("With: "+time+"ms");
            }
            System.out.println();
        }
    }
}

Output:

With: 401ms Without: 799ms  Without: 731ms  With: 347ms
With: 358ms Without: 744ms  Without: 749ms  With: 342ms
With: 348ms Without: 719ms  Without: 739ms  With: 347ms
With: 339ms Without: 734ms  Without: 774ms  With: 358ms

ArrayList internally uses a plain array to store its elements. If the number of elements exceeds the capacity of the underlying array, the array has to be resized. So if you know how many items your List will contain, you can tell ArrayList to allocate an array of the needed size up front, and the resize logic never has to run.
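To get a feel for how much resizing the capacity constructor avoids, here is a toy model of the growth policy. It assumes the behavior of current OpenJDK releases (a default capacity of 10, and each resize enlarging the array by about half); the ArrayList documentation does not promise these numbers. The class GrowthModel and its method are hypothetical and not part of any library.

```java
class GrowthModel {
    // Simulates appending n elements one at a time to an array that starts
    // with the given capacity and grows by roughly 50% whenever it is full.
    // Returns {number of resizes, total elements copied during resizes}.
    static long[] simulate(int n, int initialCapacity) {
        long capacity = initialCapacity;
        long resizes = 0;
        long copied = 0;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {
                copied += size; // a resize copies every existing element
                capacity += Math.max(1, capacity >> 1); // assumed growth rule
                resizes++;
            }
        }
        return new long[] {resizes, copied};
    }

    public static void main(String[] args) {
        int n = 1 << 20; // same element count as the benchmark above
        long[] grown = simulate(n, 10);
        long[] presized = simulate(n, n);
        System.out.println("default:  " + grown[0] + " resizes, "
                + grown[1] + " elements copied");
        System.out.println("presized: " + presized[0] + " resizes, "
                + presized[1] + " elements copied");
    }
}
```

Under this model the presized list never resizes. The default list resizes only a logarithmic number of times, and the total number of copied elements stays proportional to n, which is what amortized constant time per add means.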
