
Why does HashMap resize in case of a collision or in the worst case?

I am asking this question with respect to Java versions up to 1.7 only. I am using reflection to find out the current capacity of a HashMap. In the program below I put 12 unique Person objects into a single bucket of the HashMap (all using the same hash code). Then I put a 13th unique Person into either the same or a different bucket (using the same or a different hash code). In both cases, after adding this 13th element, the HashMap resizes to 32 buckets. I understand that due to the load factor of 0.75 and the initial capacity of 16, the HashMap doubles its size on the 13th element. But there are still empty buckets available, and only 2 buckets are used for these 13 elements.
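To spell out the arithmetic:

threshold = initial capacity x load factor = 16 x 0.75 = 12
13 entries > 12  ->  the table doubles from 16 to 32 buckets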

My questions are:

1) Is my understanding correct? Am I making any mistake? Is this the expected behavior of HashMap?

2) If all this is correct, then even though there are 11 or 12 free buckets, why does the HashMap need to double on the 13th element in this case? Isn't resizing the HashMap extra overhead and costly? What is the need to double the HashMap here, when the 13th element could be put into any available bucket according to its hash code?

import java.lang.reflect.Field;
import java.util.HashMap;

public class HashMapTest {
    public static void main(String[] args) throws NoSuchFieldException,
            SecurityException, IllegalArgumentException, IllegalAccessException {
        HashMap<Person, String> hm = new HashMap<Person, String>();
        for (int i = 1; i <= 12; i++) {
            // 12 entries go into the same bucket (linked list) because they share a hash code
            hm.put(new Person(), "1");
        }
        System.out.println("Number of Buckets in HashMap : "+bucketCount(hm));
        System.out.println("Number of Entry in HashMap :  " + hm.size());
        System.out.println("**********************************");
        // 13th element goes into a different bucket (different hash code)
        hm.put(new Person(2), "2");
        System.out.println("Number of Buckets in HashMap : "+bucketCount(hm));
        System.out.println("Number of Entry in HashMap :  " + hm.size());
    }
    public static int bucketCount(HashMap<Person, String> h)
            throws NoSuchFieldException, SecurityException,
            IllegalArgumentException, IllegalAccessException {
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(h);
        return table == null ? 0 : table.length;
    }
}

class Person {
    int age = 0;
    Person() {
    }
    Person(int a) {
        age = a;
    }
    @Override
    public boolean equals(Object obj) {
        // always false, so every Person instance counts as a unique key
        return false;
    }
    @Override
    public int hashCode() {
        // 0 for the no-arg constructor, 1 otherwise: the first 12 entries
        // share one bucket, and the 13th can land in a different one
        if (age != 0) {
            return 1;
        } else {
            return age;
        }
    }
}

OUTPUT

Number of Buckets in HashMap : 16
Number of Entry in HashMap :  12
**********************************
Number of Buckets in HashMap : 32
Number of Entry in HashMap :  13
  1. Yes, and this is the expected behavior.
  2. The HashMap doesn't care about how many buckets are actually used. It only knows that the load factor has been reached and that the probability of collisions is therefore becoming too high, so the map should be resized. Even though many collisions have already happened, resizing the map could actually fix that. Not in your case, since you chose identical hashCodes on purpose, but in a more realistic case the hashCodes should have a much better distribution. HashMap can't do anything to make itself efficient if you deliberately choose bad hashCodes, and there is no point in adding complexity to handle an extreme case that should never happen and that HashMap wouldn't be able to fix anyway.

Yes, the behavior you observe is the expected behavior.

The implementation of HashMap expects you to use a reasonable hashCode for the keys. It assumes that your hashCode distributes the keys as evenly as possible among the available buckets. If you fail to do that (as you did in your example, where all the keys have the same hashCode), you will get bad performance.
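For illustration only (this PersonKey class is not part of the question's code), a key with a reasonably distributed hashCode could look like this:

class PersonKey {
    private final int age;

    PersonKey(int age) {
        this.age = age;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof PersonKey && ((PersonKey) obj).age == this.age;
    }

    @Override
    public int hashCode() {
        // distinct ages map to distinct hash codes, so the keys
        // spread over the buckets instead of piling up in one
        return age;
    }
}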

Under the assumption of an even distribution, it makes sense for the HashMap to double its size once you pass the load factor. It doesn't check how many buckets are actually empty (since it has no way of knowing whether new entries will be assigned to empty buckets or to occupied ones). It just checks the average number of entries per bucket. Once that number exceeds the load factor, the number of buckets is doubled.
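As a rough sketch (a simplified paraphrase of the idea, not the exact JDK 1.7 source, whose details vary between update releases), the decision boils down to comparing the size against a precomputed threshold:

// simplified sketch of the resize decision in a JDK-7-style HashMap
int capacity = 16;                              // current number of buckets
float loadFactor = 0.75f;
int threshold = (int) (capacity * loadFactor);  // 12

int size = 13;                                  // entry count after the 13th put
if (size > threshold) {
    capacity = capacity * 2;                    // 32 buckets
    threshold = (int) (capacity * loadFactor);  // new threshold: 24
}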

There is one more subtle aspect here: when the internal array is resized (it goes from 16 to 32), all the entries are also "touched". Let me explain:

When there are 16 buckets (the internal array has size 16), only the last 4 bits of the hash decide which bucket an entry goes to; think %, but internally it is actually (n - 1) & hash, where n is the number of buckets.

When the internal array grows, one more bit is taken into account to decide where an entry goes: there used to be 4 bits, now there are 5; that means all the entries are re-hashed and may potentially move to different buckets now. That is why the resizing happens: to disperse the entries.
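A small self-contained sketch of that indexing (the hash value here is made up for illustration; real keys also go through HashMap's supplemental hash function first):

public class BucketIndexDemo {
    public static void main(String[] args) {
        int hash = 0b10101;        // 21, an arbitrary example hash
        // (n - 1) masks the low bits of the hash:
        // 4 bits with 16 buckets, 5 bits with 32 buckets
        System.out.println("16 buckets -> index " + ((16 - 1) & hash)); // 5
        System.out.println("32 buckets -> index " + ((32 - 1) & hash)); // 21
    }
}

So the same entry can end up in a different bucket after the resize, which is exactly the dispersing described above.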

If you really want to fill all the "gaps" first, you can specify a load factor of 1 instead of the default 0.75; but that has implications, as documented on the HashMap constructors.
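For example (just using the two-argument constructor mentioned above):

// 16 initial buckets; with loadFactor 1.0 the resize threshold is 16 * 1.0 = 16,
// so the table is not doubled as early as with the default 0.75
HashMap<Person, String> hm = new HashMap<Person, String>(16, 1.0f);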
