
Queries regarding the implementation details of java.util.Hashtable

I have the following queries with regard to how java.util.Hashtable is implemented. These are low-level queries, not related to the usage of Hashtable, but only to how the designers chose to implement the data structure.

  • The Hashtable is created with a default size of 11 buckets. What is special about 11? Why not 10? Although I am inclined to think that this is just a magic number, I suspect it is not.
  • To compute the bucket number, why do we not directly use the hash code of the passed-in key object? In the implementation the bucket number is actually computed as (hashcode & 0x7FFFFFFF) % tableSize, where hashcode is the value returned for the input key and tableSize is 11 by default. Why are we rehashing the hash code itself? Couldn't it have been just hashcode % tableSize?
  • The contains(Object value) method searches for the presence of the value in the hashtable. For this it searches sequentially from the last bucket towards the first bucket. Is this just a style the developer adopted? The hashtable is just an array of linked lists, so intuitively I expected the search to move from the first bucket to the last, but found otherwise. I understand that functionally both are the same, but is there any other reason?
  • The maximum array size (used during the rehash) is set to Integer.MAX_VALUE - 8. What is the significance of 8 here?
  1. It appears to be an empirical value involving a tradeoff between wasted space and time-consuming rehash operations. From the Hashtable javadocs:

The initial capacity controls a tradeoff between wasted space and the need for rehash operations, which are time-consuming. No rehash operations will ever occur if the initial capacity is greater than the maximum number of entries the Hashtable will contain divided by its load factor. However, setting the initial capacity too high can waste space.
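To make that rule concrete, here is a minimal sizing sketch based only on the quote above; the entry count and key format are arbitrary choices for the example:

import java.util.Hashtable;

public class CapacitySizing {
    public static void main(String[] args) {
        int expectedEntries = 10_000;   // arbitrary figure for the example
        float loadFactor = 0.75f;       // Hashtable's default load factor

        // Per the javadoc: no rehash occurs if the initial capacity is greater
        // than (maximum number of entries / load factor).
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

        Hashtable<String, Integer> table = new Hashtable<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            table.put("key-" + i, i);   // should never trigger a rehash
        }
        System.out.println(table.size());
    }
}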

  2. The value is bitmasked with 0x7FFFFFFF to clear the sign bit that would make the value negative. This forces the value to be non-negative, so that the resulting index after the % operation will also be non-negative. This is necessary to produce a viable index into the internal bucket array.
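A minimal sketch of that index computation, using a key whose hashCode() happens to be negative to show why the mask matters (the key and the table size of 11 are just illustrative choices):

public class BucketIndexDemo {
    public static void main(String[] args) {
        int tableSize = 11;                          // Hashtable's default capacity
        int hash = "polygenelubricants".hashCode();  // a String with a negative hash code

        // In Java, % keeps the sign of the dividend, so a negative hash code
        // would yield a negative "index" and an ArrayIndexOutOfBoundsException.
        int naiveIndex = hash % tableSize;

        // Clearing the sign bit first forces a non-negative value.
        int maskedIndex = (hash & 0x7FFFFFFF) % tableSize;

        System.out.println("hash   = " + hash);         // negative
        System.out.println("naive  = " + naiveIndex);   // negative, unusable as an index
        System.out.println("masked = " + maskedIndex);  // always in [0, tableSize)
    }
}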

  3. It's possible that this was done to increase performance slightly. This article claims that looping backwards does exactly that.

The results show there is not much difference between forward and reverse looping over 1 million items of data. However, when the data grows huge, reverse looping is slightly faster than forward looping, by around 15%.

I don't know if that's really true, but that may have been the motivation.

  4. The source code I have includes a Javadoc comment on the private constant used for the maximum array size.
/**
 * The maximum size of array to allocate.
 * Some VMs reserve some header words in an array.
 * Attempts to allocate larger arrays may result in
 * OutOfMemoryError: Requested array size exceeds VM limit
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

I don't know how valid this is now, but this was an attempt to avoid unexpected OutOfMemoryErrors.
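For illustration, the failure that comment guards against can be provoked directly; note that the exact OutOfMemoryError message is VM- and settings-dependent, so the comments below only describe typical HotSpot behaviour:

public class ArrayLimitDemo {
    public static void main(String[] args) {
        try {
            // Requesting the theoretical maximum length usually fails, because
            // the VM reserves some header words per array; on HotSpot the message
            // is typically "Requested array size exceeds VM limit", while other
            // VMs or heap settings may report a plain heap-space error instead.
            Object[] huge = new Object[Integer.MAX_VALUE];
            System.out.println(huge.length);  // not normally reached
        } catch (OutOfMemoryError e) {
            System.out.println("OutOfMemoryError: " + e.getMessage());
        }
    }
}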

#1: AndreyS answered in the comments with a link to: Why initialCapacity of Hashtable is 11 while the DEFAULT_INITIAL_CAPACITY in HashMap is 16 and requires a power of 2
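One common illustration (not taken from the linked answer) of why a prime bucket count such as 11 is forgiving: if many keys share a common factor in their hash codes, a composite table size clusters them into a few buckets, while a prime size spreads them across all buckets. The hash codes below are synthetic multiples of 4, chosen purely for the demonstration:

import java.util.Arrays;

public class PrimeBucketDemo {
    public static void main(String[] args) {
        int[] hashes = new int[20];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = i * 4;        // synthetic hash codes all sharing the factor 4
        }
        printBuckets(hashes, 8);      // composite size: only buckets 0 and 4 are used
        printBuckets(hashes, 11);     // prime size: the keys spread across all 11 buckets
    }

    static void printBuckets(int[] hashes, int tableSize) {
        int[] counts = new int[tableSize];
        for (int h : hashes) {
            counts[(h & 0x7FFFFFFF) % tableSize]++;
        }
        System.out.println("size " + tableSize + ": " + Arrays.toString(counts));
    }
}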

#2: To make sure the number is non-negative before computing the modulus. Otherwise the outcome may be negative and we would index out of bounds.

#3: I have to guess that when looping in reverse you only evaluate the length once and compare against a constant (0), whereas in a regular loop you compare against a variable. I don't know if that's what they had in mind, but it can be a consideration.
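For reference, the two loop shapes being compared look roughly like the sketch below; whether the reversed form is measurably faster under a modern JIT is questionable, so treat this as an illustration of the coding style rather than a performance claim:

import java.util.Arrays;

public class LoopDirectionDemo {
    static int forwardSum(int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {       // loop test reads data.length (in source form)
            sum += data[i];
        }
        return sum;
    }

    static int reverseSum(int[] data) {
        int sum = 0;
        for (int i = data.length - 1; i >= 0; i--) {  // length read once; test compares against the constant 0
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);
        System.out.println(forwardSum(data) + " " + reverseSum(data));
    }
}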

#4: To avoid integer overflow in rehash():

// Decompiled from Hashtable.rehash(); 2147483639 == Integer.MAX_VALUE - 8 == MAX_ARRAY_SIZE
int i = table.length;            // old capacity
Entry[] arrayOfEntry1 = table;   // old bucket array
int j = (i << 1) + 1;            // new capacity = 2 * old + 1 (may overflow to a negative value)
if (j - 2147483639 > 0)          // overflow-conscious check against MAX_ARRAY_SIZE
{
  if (i == 2147483639) {
    return;                      // already at the maximum size; keep the current table
  }
  j = 2147483639;                // clamp the new capacity to MAX_ARRAY_SIZE
}
