
Is it better to set the size of a Java Collection in the constructor?

Is it better to pass the size of a Collection to its constructor if I know the size at that point? Is the saving, in terms of expanding the Collection and allocating/re-allocating memory, noticeable?

What if I know the minimal size of the Collection but not the upper bound? Is it still worth creating it with at least the minimal size?

Different collections have different performance consequences here; for ArrayList the saving can be very noticeable.

import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        int max = 1000000;

        // Warm-up pass so the loop is already compiled before timing starts
        List<Integer> numbers = new ArrayList<>(5);
        for (int i = 0; i < max; i++) {
            numbers.add(i);
        }

        // Pre-allocated: the backing array never has to grow
        long start = System.currentTimeMillis();
        numbers = new ArrayList<>(max);
        for (int i = 0; i < max; i++) {
            numbers.add(i);
        }
        System.out.println("Preall: " + (System.currentTimeMillis() - start));

        // Tiny initial capacity: the backing array is re-allocated and copied repeatedly
        start = System.currentTimeMillis();
        numbers = new ArrayList<>(5);
        for (int i = 0; i < max; i++) {
            numbers.add(i);
        }
        System.out.println("Resizing: " + (System.currentTimeMillis() - start));
    }
}

Result (times in milliseconds):

Preall: 26
Resizing: 58

Running with max set to ten times that value (10000000) gives:

Preall: 510
Resizing: 935

So you can see that even at different sizes the ratio stays roughly the same.

This is pretty much a worst-case test, but filling a list one element at a time is very common, and you can see that there was roughly a 2x speed difference.
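As an illustration of how other collections differ: for hash-based collections such as HashSet, the constructor argument is a table capacity rather than an element count, and the table is rehashed once the element count exceeds capacity times load factor (0.75 by default). The sketch below is one way to presize under that rule; the class and the helper name `capacityFor` are hypothetical and not part of the JDK.

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetPresize {
    // Hypothetical helper: smallest capacity request that should avoid a rehash
    // for the expected element count, assuming the default load factor of 0.75.
    static int capacityFor(int expectedElements) {
        return (int) (expectedElements / 0.75f) + 1;
    }

    // Fills a presized set with the integers 0..n-1.
    static Set<Integer> fill(int n) {
        Set<Integer> set = new HashSet<>(capacityFor(n));
        for (int i = 0; i < n; i++) {
            set.add(i);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println("capacity requested for 1000 elements: " + capacityFor(1000));
        System.out.println("elements stored: " + fill(1000).size());
    }
}
```

Passing the plain element count (for example `new HashSet<>(1000)`) still works, but the set may rehash before it reaches that count, which is why the load factor is taken into account here.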

All collections are auto-expanding. Not knowing the bounds will not affect their functionality (until you run into other issues, such as using up all available memory); it may, however, affect their performance.

With some collections, most notably ArrayList, auto-expanding is expensive because the whole underlying array is copied. An ArrayList has a default capacity of 10 and grows by a fixed factor each time it fills up. The factor is an implementation detail: recent OpenJDK versions grow by about 50% rather than doubling. So if you know your ArrayList will contain 110 objects but do not give it a size, the following copies will happen on such a JDK:

Copy 10 --> 15
Copy 15 --> 22
Copy 22 --> 33
Copy 33 --> 49
Copy 49 --> 73
Copy 73 --> 109
Copy 109 --> 163

By telling the ArrayList up front that it will contain 110 items, you skip these copies.
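The number of copies can be counted with a small simulation. This is only a sketch: it does not inspect a real ArrayList, and both growth policies are assumptions made for illustration (a doubling policy, and the `oldCapacity + (oldCapacity >> 1)` step that recent OpenJDK versions are understood to use). The class and method names are hypothetical.

```java
import java.util.function.IntUnaryOperator;

public class GrowthSimulation {
    // Assumed policy 1: capacity doubles on every resize.
    public static final IntUnaryOperator DOUBLING = c -> c * 2;
    // Assumed policy 2: capacity grows by about 50%, as in recent OpenJDK.
    public static final IntUnaryOperator ONE_AND_A_HALF = c -> c + (c >> 1);

    // Counts how many times the backing array would be re-allocated and copied
    // while growing from initialCapacity until it can hold `elements` items.
    static int countCopies(int initialCapacity, int elements, IntUnaryOperator grow) {
        int capacity = initialCapacity;
        int copies = 0;
        while (capacity < elements) {
            // Always grow by at least one slot so tiny capacities cannot stall the loop.
            capacity = Math.max(grow.applyAsInt(capacity), capacity + 1);
            copies++;
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println("doubling, start 10:  " + countCopies(10, 110, DOUBLING));
        System.out.println("1.5x, start 10:      " + countCopies(10, 110, ONE_AND_A_HALF));
        System.out.println("presized to 110:     " + countCopies(110, 110, ONE_AND_A_HALF));
    }
}
```

Under these assumptions, reaching 110 elements from a capacity of 10 takes 4 copies with doubling and 7 with 1.5x growth, while a list presized to 110 needs none.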

An educated guess is better than nothing

Even if you're wrong, it doesn't matter. The collection will still auto-expand, and you will still avoid some copies. The only way you can decrease performance is if your guess is far too large, which leads to too much memory being allocated to the collection.
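A minimal sketch of both situations from the question, assuming the data arrives as shown (the class and method names are made up for illustration): an exact size taken from a source collection, and a known lower bound that turns out to be exceeded.

```java
import java.util.ArrayList;
import java.util.List;

public class PresizeExample {
    // Exact size known: the result will hold exactly source.size() elements,
    // so the backing array never needs to grow.
    static List<String> toUpper(List<String> source) {
        List<String> result = new ArrayList<>(source.size());
        for (String s : source) {
            result.add(s.toUpperCase());
        }
        return result;
    }

    // Only a lower bound known: start at the minimum and let the list
    // auto-expand if more elements than that arrive.
    static List<Integer> squares(int knownMinimum, int actualCount) {
        ArrayList<Integer> list = new ArrayList<>(knownMinimum);
        for (int i = 0; i < actualCount; i++) {
            list.add(i * i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(toUpper(java.util.Arrays.asList("a", "b", "c")));
        System.out.println(squares(5, 8));
    }
}
```

If a better estimate becomes available after construction, `ArrayList.ensureCapacity(int)` can be called to grow the backing array once instead of in several steps.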

OK, here's my JMH code:

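import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.MICROSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 3, time = 1)
@Fork(3)
public class Comparison
{
  // Number of elements added per invocation; changed between runs
  static final int size = 1_000;

  // @Benchmark is the current name of the annotation that pre-1.0 JMH
  // versions called @GenerateMicroBenchmark
  @Benchmark
  public List<?> testSpecifiedSize() {
    final ArrayList<Integer> l = new ArrayList<>(size);
    for (int i = 0; i < size; i++) l.add(1);
    return l; // returned so the JIT cannot eliminate the work
  }

  @Benchmark
  public List<?> testDefaultSize() {
    final ArrayList<Integer> l = new ArrayList<>();
    for (int i = 0; i < size; i++) l.add(1);
    return l;
  }
}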

My results for size = 10_000:

Benchmark             Mode Thr    Cnt  Sec         Mean   Mean error    Units
testDefaultSize       avgt   1      9    1       80.770        2.095  usec/op
testSpecifiedSize     avgt   1      9    1       50.060        1.078  usec/op

Results for size = 1_000:

Benchmark             Mode Thr    Cnt  Sec         Mean   Mean error    Units
testDefaultSize       avgt   1      9    1        6.208        0.131  usec/op
testSpecifiedSize     avgt   1      9    1        4.900        0.078  usec/op

My interpretation:

  • presizing has some edge over the default size;
  • the edge isn't that spectacular;
  • the absolute time spent on the task of adding to the list is quite insignificant.

My conclusion:

Add the initial size if that makes you feel warmer around the heart, but objectively speaking, your customer is highly unlikely to notice the difference.

In the rare cases when the size is well known (for example, when filling a known number of elements into a new collection), it may be set for performance reasons.

Most often it's better to omit it and use the default constructor instead, which leads to simpler and more understandable code.

For array-based collections, resizing is quite an expensive operation. That's why passing the exact size to ArrayList is a good idea.

If you set the capacity to a minimal size (MIN) and then add MIN + 1 elements to the collection, you still get a resize. The no-argument ArrayList() uses a default capacity of 10, so a minimum only gives you some advantage if MIN is noticeably larger than that. The best approach is still to create the ArrayList with the expected collection size.

Alternatively, you might prefer LinkedList, because it never has to resize when elements are added (although list.get(i) then has O(i) cost).

Content licensed under CC BY-SA 4.0; please credit the original source when republishing.

 