
Time complexity for Java HashMap resizing

I am wondering what the time complexity of Java HashMap resizing is when the load factor exceeds the threshold. As far as I understand, for HashMap the table size is always a power of 2 (an even number), so whenever we resize the table we don't necessarily need to rehash all the keys (correct me if I am wrong); all we need to do is allocate the additional space and copy over all the entries from the old table without rehashing them (I am not quite sure how the JVM deals with that internally), correct? Whereas for Hashtable, since it uses a prime number as the table size, we need to rehash all the entries whenever we resize the table. So my question is: does resizing a HashMap still take O(n) linear time?

Does it still take O(N) time for resizing a HashMap?

Basically, yes.

And a consequence is that an insertion operation that causes a resize will take O(N) time. But that happens on O(1/N) of all insertions, so (under certain assumptions) the average insertion time is O(1).
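To make that concrete, here is a minimal sketch (not the JDK's code; the capacity and load-factor values are just HashMap's documented defaults) that simulates doubling resizes over n insertions and counts the total number of entry copies the resizes perform:

    // A minimal sketch: simulate doubling resizes for n insertions and
    // count how many entry copies all the resizes perform in total.
    public class AmortizedResizeDemo {
        public static void main(String[] args) {
            final int n = 1_000_000;        // number of insertions (arbitrary for the demo)
            final float loadFactor = 0.75f; // HashMap's documented default
            int capacity = 16;              // HashMap's documented default initial capacity
            int size = 0;
            long copies = 0;                // entries moved during all resizes combined
            for (int i = 0; i < n; i++) {
                if (size + 1 > capacity * loadFactor) {
                    copies += size;         // a resize copies every existing entry: O(size)
                    capacity *= 2;          // table size stays a power of two
                }
                size++;
            }
            System.out.println("insertions=" + n
                    + ", copied entries=" + copies
                    + ", copies per insertion=" + (double) copies / n);
        }
    }

Because the table doubles each time, the total number of copies stays proportional to n, which is why the average cost per insertion works out to O(1).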

so could a good load factor affect this performance? Like, better and faster than O(N)?

Choice of load factor affects performance, but not complexity (see the sketch after the list below).

If we make normal assumptions about the hash function and key clustering, when the load factor is larger:

  • the average hash chain length is longer, but still O(1);
  • the frequency of resizes is reduced, but is still O(1/N);
  • the cost of a resize remains about the same, and the complexity is still O(N).
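For reference, the load factor is just a constructor argument; a minimal sketch (the capacity and load-factor values are chosen arbitrarily for illustration):

    import java.util.HashMap;
    import java.util.Map;

    public class LoadFactorDemo {
        public static void main(String[] args) {
            // Both maps have the same asymptotic behavior; only the constants differ.
            Map<String, Integer> fewerResizes  = new HashMap<>(16, 0.9f); // longer chains, rarer resizes
            Map<String, Integer> shorterChains = new HashMap<>(16, 0.5f); // shorter chains, more frequent resizes
            fewerResizes.put("key", 1);
            shorterChains.put("key", 1);
        }
    }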

... so whenever we resize the table we don't necessarily need to rehash all the keys (correct me if I am wrong).

Actually, you would need to rehash all of the keys. When you double the hash table size, the hash chains need to be split. To do this, you need to test which of the two chains the hash value for every key maps to. (Indeed, you would need to do the same if the hash table had an open organization too.)
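Here is a minimal sketch of that split (modeled on, not copied from, the JDK's power-of-two scheme; the method name is made up for illustration). A single bit of the hash decides which of the two chains an entry lands in, so no modulo by the new size is needed:

    public class ResizeSplitDemo {
        // When a power-of-two table doubles, an entry at index i either stays
        // at i or moves to i + oldCap; one bit of its hash decides which.
        static int newBucketIndex(int hash, int oldCap) {
            int oldIndex = hash & (oldCap - 1);              // index in the old table
            return (hash & oldCap) == 0 ? oldIndex           // "low" chain: index unchanged
                                        : oldIndex + oldCap; // "high" chain: shifted up by oldCap
        }

        public static void main(String[] args) {
            int oldCap = 16;
            // Three hashes sharing old bucket 3 get redistributed over buckets 3 and 19.
            for (int hash : new int[] {3, 19, 35}) {
                System.out.println("hash=" + hash
                        + " old=" + (hash & (oldCap - 1))
                        + " new=" + newBucketIndex(hash, oldCap));
            }
        }
    }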

However, in the current generation of HashMap implementations, the hashcode values are cached in the chained entry objects, so that the hashcode for a key never needs to be recomputed.
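The caching pattern looks roughly like the following (a simplified sketch; the real java.util.HashMap.Node follows the same idea but carries more machinery):

    // Sketch of a chained hash table entry that caches its key's hash.
    class Node<K, V> {
        final int hash;   // computed once at insertion; a resize reuses this
                          // cached value instead of calling key.hashCode()
        final K key;
        V value;
        Node<K, V> next;  // next entry in the same bucket's chain

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }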


One comment mentioned the degenerate case where all keys hash to the same hashcode. That can happen either due to a poorly designed hash function or a skewed distribution of keys.

This affects the performance of lookup, insertion and other operations, but it does not affect either the cost or the frequency of resizes.

When the table is resized, the entire contents of the original table must be copied to the new table, so it takes O(n) time to resize the table, where n is the number of elements in the original table. The amortized cost of any operation on a HashMap (assuming the uniform hashing assumption) is O(1), but yes, the worst-case cost of a single insertion operation is O(n).
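The amortization argument in one line: growing a doubling table to n elements costs, across all resizes combined, at most

    n/2 + n/4 + n/8 + ... + 1 < n

entry copies, so the total resize work over n insertions is O(n), i.e. O(1) per insertion on average.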
