
Maximum size of HashSet, Vector, LinkedList

What is the maximum size of HashSet, Vector and LinkedList? I know that ArrayList can store more than 3277000 numbers.

However, the size of a list depends on the memory (heap) size. If it reaches the maximum, the JVM throws an OutOfMemoryError.

But I don't know the limit for the number of elements in HashSet, Vector and LinkedList.

There is no specified maximum size of these structures.

The practical size limit is probably somewhere in the region of Integer.MAX_VALUE (i.e. 2147483647, roughly 2 billion elements), as that's the maximum size of an array in Java.

  • A HashSet uses a HashMap internally, so it has the same maximum size as that.
    • A HashMap uses an array which always has a size that is a power of two, so it can be at most 2^30 = 1073741824 elements big (since the next power of two is bigger than Integer.MAX_VALUE).
    • Normally the number of elements is at most the number of buckets multiplied by the load factor (0.75 by default). However, when the HashMap stops resizing, it will still allow you to add elements, exploiting the fact that each bucket is managed via a linked list. Therefore the only limit for elements in a HashMap/HashSet is memory.
  • A Vector uses an array internally, which has a maximum size of exactly Integer.MAX_VALUE, so it can't support more than that many elements.
  • A LinkedList doesn't use an array as the underlying storage, so that doesn't limit the size. It uses a classical doubly linked list structure with no inherent limit, so its size is only bounded by the available memory. Note that a LinkedList will report the size wrongly if it is bigger than Integer.MAX_VALUE, because it uses an int field to store the size and the return type of size() is int as well.
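The power-of-two argument in the list above can be checked with a few lines of arithmetic. This is only a sketch: the class and method names (PowerOfTwoTable, largestPowerOfTwoLength, fitsInArrayLength) are made up for illustration and are not part of the JDK, and the check uses the language-level int limit on array lengths rather than any particular JVM's actual limit.

```java
// Illustrative sketch: names are hypothetical, not JDK API.
final class PowerOfTwoTable {

    /** Largest power of two that is still a valid int, and so a candidate array length. */
    static int largestPowerOfTwoLength() {
        return Integer.highestOneBit(Integer.MAX_VALUE); // same value as 1 << 30
    }

    /** True if the given length could be expressed as a Java array length (an int >= 0). */
    static boolean fitsInArrayLength(long length) {
        return length >= 0 && length <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int max = largestPowerOfTwoLength();
        System.out.println("largest power-of-two length: " + max);
        System.out.println("2^30 fits: " + fitsInArrayLength(1L << 30));
        System.out.println("2^31 fits: " + fitsInArrayLength(1L << 31));
    }
}
```

The next power of two after 2^30 is 2^31, which is one more than Integer.MAX_VALUE, so a table that only ever doubles has to stop at 2^30 slots.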

Note that the Collection API does define how a Collection with more than Integer.MAX_VALUE elements should behave. Most importantly, it states this in the size() documentation:

If this collection contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE.

Note that while HashMap, HashSet and LinkedList seem to support more than Integer.MAX_VALUE elements, none of them implements the size() method in this way (i.e. they simply let the internal size field overflow).
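The difference between the documented behaviour and a plain int counter can be sketched as follows. The names here (SizeReporting, saturatingSize, overflowingSize) are hypothetical helpers for illustration, and the element count is passed in as a long rather than measured from a real collection, since actually building one with more than 2^31 elements needs a very large heap.

```java
// Illustrative sketch: names are hypothetical, not JDK API.
final class SizeReporting {

    /** What the Collection.size() contract asks for: saturate at Integer.MAX_VALUE. */
    static int saturatingSize(long elementCount) {
        return (int) Math.min(elementCount, Integer.MAX_VALUE);
    }

    /**
     * What an int counter incremented once per added element ends up holding:
     * the narrowing cast keeps the low 32 bits, which matches repeated size++.
     */
    static int overflowingSize(long elementCount) {
        return (int) elementCount;
    }

    public static void main(String[] args) {
        long count = Integer.MAX_VALUE + 1L; // one element past the int range
        System.out.println("saturating:  " + saturatingSize(count));
        System.out.println("overflowing: " + overflowingSize(count));
    }
}
```

With one element more than Integer.MAX_VALUE, the saturating version still reports Integer.MAX_VALUE, while the overflowing counter reports Integer.MIN_VALUE.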

This leads me to believe that other operations also aren't well-defined in this condition.

So I'd say it's safe to use those general-purpose collections with up to Integer.MAX_VALUE elements. If you know that you'll need to store more than that, then you should switch to dedicated collection implementations that actually support this.

In all cases, you're likely to be limited by the JVM heap size rather than anything else. Eventually you'll always get down to arrays, so I very much doubt that any of them will manage more than 2^31 - 1 elements, but you're very, very likely to run out of heap before then anyway.

It very much depends on the implementation details.

A HashSet uses an array as its underlying store, which by default it attempts to grow when the collection is 75% full. This means it will fail if you try to add more than about 750,000,000 entries. (It cannot grow the array from 2^30 to 2^31 entries.)

Increasing the load factor increases the maximum size of the collection, e.g. a load factor of 10 allows 10 billion elements. (It is worth noting that HashSet is relatively inefficient past 100 million elements, as the distribution of the 32-bit hashcode starts to look less random and the number of collisions increases.)
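The load-factor arithmetic above can be reproduced with a small sketch. It assumes, as the text does, a table capped at 2^30 slots; LoadFactorThreshold and threshold are hypothetical names chosen for illustration. Using the exact value of 2^30 gives slightly larger figures than the rounded ones quoted above, which treat 2^30 as one billion.

```java
// Illustrative sketch: names are hypothetical, not JDK API.
final class LoadFactorThreshold {

    /** Assumed cap on the number of slots in the backing array (2^30, from the text). */
    static final int MAX_SLOTS = 1 << 30;

    /** Element count at which a full-size table would want to grow again. */
    static long threshold(double loadFactor) {
        return (long) (MAX_SLOTS * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println("load factor 0.75: " + threshold(0.75));
        System.out.println("load factor 10:   " + threshold(10.0));
    }
}
```

Under these assumptions the default load factor gives about 805 million elements and a load factor of 10 gives about 10.7 billion.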

A Vector doubles its capacity and starts at 10. This means it will fail to grow above approximately 1.34 billion. Changing the initial size to 2^n-1 gives you slightly more headroom.
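The 1.34 billion figure follows from repeated doubling, which can be modelled in a few lines. This is a simplified model of the reasoning above, not a copy of the Vector source: DoublingGrowth and maxCapacityByDoubling are hypothetical names, and real JDK versions may clamp the new capacity instead of giving up when doubling would overflow.

```java
// Illustrative sketch of pure doubling growth; names are hypothetical, not JDK API.
final class DoublingGrowth {

    /**
     * Largest capacity reachable from initialCapacity by doubling alone
     * without going past Integer.MAX_VALUE.
     */
    static long maxCapacityByDoubling(int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        long capacity = initialCapacity;
        while (capacity * 2 <= Integer.MAX_VALUE) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println("start 10:   " + maxCapacityByDoubling(10));
        System.out.println("start 1023: " + maxCapacityByDoubling(1023)); // 2^10 - 1
    }
}
```

Starting at 10, the last doubling that fits is 10 * 2^27 = 1,342,177,280. Starting at 2^10 - 1 = 1023, doubling reaches 2,145,386,496, which is much closer to Integer.MAX_VALUE.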

BTW: Use ArrayList rather than Vector if you can.

A LinkedList has no inherent limit and can grow beyond 2.1 billion. At this point size() could return Integer.MAX_VALUE; however, some functions such as toArray cannot put all the objects into an array, and will instead give you the first Integer.MAX_VALUE elements rather than throw an exception.

As @Joachim Sauer notes, the current OpenJDK could return an incorrect result for sizes above Integer.MAX_VALUE, e.g. it could be a negative number.

The maximum size depends on the memory settings of the JVM and, of course, the available system memory. The memory consumed per list entry also differs between platforms, so the easiest way might be to run simple tests.
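One possible shape for such a simple test is sketched below. It is a rough estimate only: PerEntryEstimate, usedHeap and estimateBytesPerEntry are hypothetical names, System.gc() is merely a hint to the JVM, and the result will vary between runs, JVMs and platforms.

```java
import java.util.LinkedList;
import java.util.List;

// Rough, illustrative measurement; names are hypothetical and results are approximate.
final class PerEntryEstimate {

    /** Heap currently in use, as reported by the Runtime. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Approximate heap growth per entry when filling a LinkedList with Integers. */
    static double estimateBytesPerEntry(int entries) {
        System.gc(); // only a hint; the JVM may ignore it
        long before = usedHeap();
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < entries; i++) {
            list.add(i);
        }
        long after = usedHeap();
        if (list.size() != entries) { // also keeps the list reachable until here
            throw new IllegalStateException("unexpected list size");
        }
        return (double) (after - before) / entries;
    }

    public static void main(String[] args) {
        System.out.println("max heap (bytes): " + Runtime.getRuntime().maxMemory());
        System.out.println("approx bytes per entry: " + estimateBytesPerEntry(1_000_000));
    }
}
```

Dividing the maximum heap by the estimated bytes per entry gives a ballpark for how many entries would fit before an OutOfMemoryError.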

As stated in other answers, an array cannot reach 2^31 entries. Other data types are either limited by this or will likely misreport their size() eventually. However, these theoretical limits cannot be reached on some systems:

On a 32-bit system, the number of bytes available never exceeds exactly 2^32, and that is assuming that you have no operating system taking up memory. A 32-bit pointer is 4 bytes. Anything which does not rely on arrays must include at least one pointer per entry: this means that the maximum number of entries is 2^32/4, or 2^30, for things that do not utilize arrays.

A plain array can achieve its theoretical limit, but only as a byte array; a short array of length 2^31-1 would use up about 2^32+38 bytes.
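The 32-bit figures above can be checked with long arithmetic. The sketch assumes a 40-byte array header, which is the overhead implied by the 2^32+38 figure; actual header sizes depend on the JVM. AddressSpace32, maxEntriesOnePointerEach and arrayBytes are hypothetical names for illustration.

```java
// Illustrative arithmetic; names are hypothetical and the header size is an assumption.
final class AddressSpace32 {

    static final long ADDRESSABLE_BYTES = 1L << 32;  // 32-bit address space
    static final long ASSUMED_HEADER_BYTES = 40;     // assumption implied by the text's figures

    /** Upper bound on entries when every entry needs at least one pointer. */
    static long maxEntriesOnePointerEach(int pointerBytes) {
        return ADDRESSABLE_BYTES / pointerBytes;
    }

    /** Approximate footprint of a primitive array under the assumed header size. */
    static long arrayBytes(long length, int elementBytes) {
        return ASSUMED_HEADER_BYTES + length * elementBytes;
    }

    public static void main(String[] args) {
        long maxLen = Integer.MAX_VALUE;
        System.out.println("entries with 4-byte pointers: " + maxEntriesOnePointerEach(4));
        System.out.println("byte[2^31-1] fits:  " + (arrayBytes(maxLen, 1) <= ADDRESSABLE_BYTES));
        System.out.println("short[2^31-1] fits: " + (arrayBytes(maxLen, 2) <= ADDRESSABLE_BYTES));
    }
}
```

Under these assumptions a maximal byte array needs a little over 2^31 bytes and fits, while a maximal short array needs 2^32 + 38 bytes and does not.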

Some Java VMs have introduced a new memory model that uses compressed pointers. By adjusting pointer alignment, more than 2^32 bytes may be referenced with 32-bit pointers: around four times as many. This is enough to cause a LinkedList size() to become negative, but not enough to allow it to wrap around to zero.

A 64-bit system has 64-bit pointers, making all pointers twice as big and non-array lists a good deal fatter. This also means that the maximum capacity supported jumps to exactly 2^64 bytes. This is enough for a 2D array to reach its theoretical maximum. byte[0x7fffffff][0x7fffffff] uses memory approximately equal to 40 + 40*(2^31-1) + (2^31-1)*(2^31-1) = 40 + 40*(2^31-1) + (2^62 - 2^32 + 1) bytes.
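The 2D array estimate can be evaluated directly, since the total still fits in a signed long. As in the formula above, the sketch assumes 40 bytes of overhead for the outer array and for each row, and ignores the outer array's own reference slots; TwoDArrayFootprint and bytes are hypothetical names.

```java
// Illustrative arithmetic; names are hypothetical and the 40-byte overhead is an assumption.
final class TwoDArrayFootprint {

    static final long ASSUMED_HEADER_BYTES = 40;

    /** Approximate bytes used by byte[rows][cols]: outer overhead plus one header and payload per row. */
    static long bytes(long rows, long cols) {
        long perRow = Math.addExact(ASSUMED_HEADER_BYTES, cols);
        // multiplyExact and addExact throw ArithmeticException instead of silently overflowing
        return Math.addExact(ASSUMED_HEADER_BYTES, Math.multiplyExact(rows, perRow));
    }

    public static void main(String[] args) {
        long n = Integer.MAX_VALUE; // 0x7fffffff
        long total = bytes(n, n);
        System.out.println("approximate bytes: " + total);
        System.out.println("fits in 2^64 bytes: " + (total > 0)); // a positive long is below 2^63
    }
}
```

The result is a little over 2^62 bytes, comfortably inside a 2^64-byte address space but far beyond any real machine's memory.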

