
Internal implementation of java.util.HashMap and HashSet

I have been trying to understand the internal implementation of java.util.HashMap and java.util.HashSet.

Here are the questions that have been on my mind for a while:

  1. What is the importance of the @Override public int hashCode() in a HashMap/HashSet? Where is this hash code used internally?
  2. I have generally seen the key of a HashMap be a String, like myMap<String,Object>. Can I map the values against someObject (instead of String), like myMap<someObject, Object>? Which contracts do I need to obey for this to work?

Thanks in advance!

EDIT:

  1. Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table? And when we call myMap.get(someKey), is Java internally calling someKey.hashCode() to get the number to look up in the hash table for the resulting value?

Answer: Yes.

EDIT 2:

  1. In a java.util.HashSet, where does the key for the hash table come from? Is it from the object that we are adding, e.g. with mySet.add(myObject), is myObject.hashCode() going to decide where it is placed in the hash table (as we don't give keys in a HashSet)?

Answer: The object added becomes the key. The value is a dummy!

The answer to question 2 is easy: yes, you can use any Object you like. Maps with String keys are widely used because they are the typical data structure for naming services. But in general, you can map any two types, like Map<Car,Vendor> or Map<Student,Course>.

As for the hashCode() method, it's as answered before: whenever you override equals(), you have to override hashCode() to obey the contract. On the other hand, if you're happy with the standard implementation of equals(), then you shouldn't touch hashCode() either (a careless override gains nothing and can produce identical hash codes for unequal objects, which degrades hash table performance).

Practical sidenote: Eclipse (and probably other IDEs as well) can auto-generate a pair of equals() and hashCode() implementations for your class, based on the class members.

Edit

For your additional question: yes, exactly. Look at the source code for HashMap.get(Object key); it calls key.hashCode() to calculate the position (bin) in the internal hash table and returns the value at that position (if there is one).
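One way to see this without reading the JDK source is to count the calls. The sketch below uses a hypothetical key class, CountingKey (not part of the question), whose hashCode() increments a counter. The exact number of calls is an implementation detail of the JDK in use, so the sketch only reports it rather than promising a value.

```java
import java.util.HashMap;
import java.util.Map;

class HashCodeCallDemo {
    // Returns how many times hashCode() was invoked by a single get().
    static int callsDuringGet() {
        Map<CountingKey, String> map = new HashMap<>();
        map.put(new CountingKey("k1"), "v1");
        int before = CountingKey.hashCodeCalls;
        map.get(new CountingKey("k1"));
        return CountingKey.hashCodeCalls - before;
    }

    public static void main(String[] args) {
        System.out.println("hashCode() calls during get: " + callsDuringGet());
    }
}

// Hypothetical key type that records every hashCode() invocation.
final class CountingKey {
    static int hashCodeCalls = 0;
    private final String id;

    CountingKey(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CountingKey && ((CountingKey) o).id.equals(id);
    }

    @Override
    public int hashCode() {
        hashCodeCalls++; // the map calls this to locate the bin
        return id.hashCode();
    }
}
```

The lookup uses a second, equal CountingKey instance on purpose: the map finds the bin through hashCode() and then confirms the match with equals().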

But be careful with 'handmade' hashCode/equals methods: if you use an object as a key, make sure that its hash code doesn't change afterwards, otherwise you won't find the mapped values anymore. In other words, the fields you use to calculate equals and hashCode should be final (or 'unchangeable' after creation of the object).

Assume we have a contact with a String name and a String phonenumber, and we use both fields to calculate equals() and hashCode(). Now we create "John Doe" with his mobile phone number and map him to his favorite Donut shop. hashCode() is used to calculate the index (bin) in the hash table, and that's where the donut shop is stored.

Now we learn that he has a new phone number and we change the phone number field of the John Doe object. This results in a new hash code. And this hash code resolves to a new hash table index, which usually isn't the position where John Doe's favorite Donut shop was stored.

The problem is clear: in this case we wanted to map "John Doe" to the Donut shop, and not "John Doe with a specific phone number". So we have to be careful with auto-generated equals/hashCode to make sure they're what we really want, because they might use unwanted fields, introducing trouble with HashMaps and HashSets.
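The John Doe scenario can be reproduced in a few lines. This is a minimal sketch: the Contact class, its field names, and the phone numbers are made up for illustration, and Objects.hash assumes Java 7 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableKeyDemo {
    // Returns what the map finds after the key object has been mutated.
    static String lookupAfterMutation() {
        Map<Contact, String> favoriteShop = new HashMap<>();
        Contact john = new Contact("John Doe", "555-0100");
        favoriteShop.put(john, "Donut shop");

        john.phoneNumber = "555-0199"; // hash code changes while the key is stored
        return favoriteShop.get(john);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // prints null
    }
}

// Hypothetical contact whose phone number is mutable on purpose.
class Contact {
    final String name;
    String phoneNumber;

    Contact(String name, String phoneNumber) {
        this.name = name;
        this.phoneNumber = phoneNumber;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Contact)) return false;
        Contact other = (Contact) o;
        return name.equals(other.name) && phoneNumber.equals(other.phoneNumber);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, phoneNumber); // both fields feed the hash
    }
}
```

The entry is still in the map (its size stays 1), but it was filed under the old hash code, so neither the mutated object nor a fresh Contact with the old number can reach it.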

Edit 2

If you add an object to a HashSet, the object is the key for the internal hash table; the value is set but unused (it is just a static instance of Object). Here's the implementation from OpenJDK 6 (b17):

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
private transient HashMap<E,Object> map;

public boolean add(E e) {
  return map.put(e, PRESENT)==null;
}
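To try the same delegation idea in isolation, here is a minimal sketch of a set that wraps a HashMap. MapBackedSet is a hypothetical, simplified class written for this illustration; it is not the JDK's HashSet and omits iteration, removal, and serialization.

```java
import java.util.HashMap;

class MapBackedSetDemo {
    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("a"));      // true: "a" was not present yet
        System.out.println(set.add("a"));      // false: the key already existed
        System.out.println(set.contains("a")); // true
    }
}

// Hypothetical simplified set: every element is stored as a map key.
class MapBackedSet<E> {
    // Shared dummy value, mirroring the PRESENT constant in the listing above.
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    boolean add(E e) {
        // put() returns the previous value, so null means the key is new.
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }
}
```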

What is the importance of the @Override public int hashCode() in a HashMap/HashSet?

This allows an instance of the map to produce a useful hash code that depends on the content of the map. Two maps with the same content will produce the same hash code. If the content is different, the hash code will usually be different as well.

Where is this hash code used internally?

Never. The map's own hashCode() only exists so that you can use a map as a key in another map.
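Both statements in this answer can be checked with a short sketch. The class and method names are made up for illustration; the behaviour relied on (equal maps have equal hash codes) is part of the documented java.util.Map contract.

```java
import java.util.HashMap;
import java.util.Map;

class MapAsKeyDemo {
    // Two maps with the same entries, inserted in a different order.
    static boolean sameContentSameHash() {
        Map<String, Integer> a = new HashMap<>();
        a.put("x", 1);
        a.put("y", 2);
        Map<String, Integer> b = new HashMap<>();
        b.put("y", 2);
        b.put("x", 1);
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    // Uses a map as the key of another map and looks it up with an equal copy.
    static String lookupByEqualMap() {
        Map<String, Integer> key = new HashMap<>();
        key.put("x", 1);
        Map<Map<String, Integer>, String> outer = new HashMap<>();
        outer.put(key, "found");

        Map<String, Integer> probe = new HashMap<>();
        probe.put("x", 1);
        return outer.get(probe);
    }

    public static void main(String[] args) {
        System.out.println(sameContentSameHash()); // true
        System.out.println(lookupByEqualMap());    // found
    }
}
```

A map used as a key should not be modified afterwards, since that changes its hash code.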

Can I map the values against someObject (instead of String), like myMap<someObject, Object>?

Yes, but someObject must be a class, not an object (your name suggests that you want to pass in an object; it should be SomeObject to make it clear you're referring to the type).

Which contracts do I need to obey for this to work?

The class must implement hashCode() and equals().

[EDIT]

Are we saying that the hash code of the key (check!) is the actual thing against which the value is mapped in the hash table?

Yes.

Yes. You can use any object as the key in a HashMap. To do so, these are the steps you have to follow:

  1. Override equals.

  2. Override hashCode.
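The two steps can be sketched with a small key class. Car, its fields, and the vendor name are hypothetical names chosen for this illustration, and Objects.equals/Objects.hash assume Java 7 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CustomKeyDemo {
    // Looks up a vendor using a new Car instance that is equal to the stored key.
    static String vendorOf(String make, String model) {
        Map<Car, String> vendors = new HashMap<>();
        vendors.put(new Car("Acme", "Roadster"), "Acme Dealership");
        return vendors.get(new Car(make, model));
    }

    public static void main(String[] args) {
        System.out.println(vendorOf("Acme", "Roadster")); // Acme Dealership
        System.out.println(vendorOf("Acme", "Pickup"));   // null
    }
}

// Hypothetical key class: equality and hash code use the same two fields.
final class Car {
    private final String make;
    private final String model;

    Car(String make, String model) {
        this.make = make;
        this.model = model;
    }

    @Override
    public boolean equals(Object o) {              // step 1
        if (this == o) return true;
        if (!(o instanceof Car)) return false;
        Car other = (Car) o;
        return Objects.equals(make, other.make) && Objects.equals(model, other.model);
    }

    @Override
    public int hashCode() {                        // step 2
        return Objects.hash(make, model);
    }
}
```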

The contracts for both methods are clearly stated in the documentation of java.lang.Object: http://java.sun.com/javase/6/docs/api/java/lang/Object.html

And yes, the hashCode() method is used internally by HashMap, so returning a proper value is important for performance.

Here is the put() method from HashMap, which shows where the key's hashCode() is used:

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

It is clear from the above code that the hashCode of each key is not just used for the hashCode() of the map, but also for finding the bucket in which to place the key/value pair. That is why hashCode() is related to the performance of the HashMap.
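The performance point can be illustrated with a deliberately bad key. In this sketch the hypothetical class ConstantHashKey returns the same hash code for every instance: this obeys the contract, so all lookups still succeed, but every entry ends up in one bucket and the map can only tell keys apart through equals().

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    // Stores n keys, then counts how many of them can be found again.
    static int countFound(int n) {
        Map<ConstantHashKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new ConstantHashKey(i), i);
        }
        int found = 0;
        for (int i = 0; i < n; i++) {
            Integer value = map.get(new ConstantHashKey(i));
            if (value != null && value == i) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(countFound(1000)); // 1000: correct, just not fast
    }
}

// Hypothetical key with a legal but useless hash code.
final class ConstantHashKey {
    private final int id;

    ConstantHashKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHashKey && ((ConstantHashKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return 1; // every key collides into the same bucket
    }
}
```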

Hashing containers like HashMap and HashSet provide fast access to the elements stored in them by splitting their contents into "buckets".

For example, the list of numbers 1, 2, 3, 4, 5, 6, 7, 8 stored in a List would look (conceptually) in memory something like: [1, 2, 3, 4, 5, 6, 7, 8].

Storing the same set of numbers in a Set would look more like this: [1, 2] [3, 4] [5, 6] [7, 8]. In this example the contents have been split into 4 buckets.

Now imagine you want to find the value 6 in both the List and the Set. With a list you would have to start at the beginning and check each value until you get to 6; this takes 6 steps. With a set you find the correct bucket, then check each of the items in that bucket (only 2 in our example), making this a 3-step process. The value of this approach increases dramatically the more data you have.

But wait, how did we know which bucket to look in? That is where the hashCode method comes in. To determine the bucket in which to look for an item, Java hashing containers call hashCode and then apply some function to the result. This function tries to balance the number of buckets and the number of items per bucket for the fastest lookup possible.

During lookup, once the correct bucket has been found, each item in that bucket is compared one at a time, as in a list. That is why, when you override hashCode, you must also override equals. So if an object of any type has both an equals and a hashCode method, it can be used as a key in a Map or an entry in a Set. There is a contract that must be followed to implement these methods correctly; the canonical text on this is from Josh Bloch's great book Effective Java, Item 8: Always override hashCode when you override equals.
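The bucket idea can be sketched as a toy container. BucketSet is a hypothetical class written for this explanation, not how the JDK lays out its tables: it uses a fixed number of buckets, picks the bucket with hashCode() modulo the bucket count, and does not support null elements. The elements will not necessarily land in the pairs shown in the conceptual picture above.

```java
import java.util.ArrayList;
import java.util.List;

class BucketSetDemo {
    public static void main(String[] args) {
        BucketSet<Integer> set = new BucketSet<>(4);
        for (int i = 1; i <= 8; i++) {
            set.add(i);
        }
        System.out.println(set.contains(6)); // true
        System.out.println(set.contains(9)); // false
    }
}

// Hypothetical toy hash set with a fixed number of buckets.
class BucketSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    BucketSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // hashCode() decides the bucket; floorMod keeps the index non-negative.
    private List<E> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    boolean add(E e) {
        List<E> bucket = bucketFor(e);
        if (bucket.contains(e)) { // equals() is used inside the bucket
            return false;
        }
        bucket.add(e);
        return true;
    }

    boolean contains(Object o) {
        return bucketFor(o).contains(o);
    }
}
```

Only one bucket is scanned per lookup, which is where the speed-up over a plain list comes from.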

  1. Any Object in Java has a hashCode() method; HashMap and HashSet are no exceptions. This hash code is used if you insert the hash map/set into another hash map/set.
  2. Any class type can be used as the key in a HashMap/HashSet. This requires that the hashCode() method returns equal values for equal objects, and that the equals() method is implemented according to its contract (reflexive, transitive, symmetric). The default implementations from Object already obey these contracts, but you may want to override them if you want value equality instead of reference equality.
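The difference between reference equality and value equality shows up as soon as a class without overrides is used as a key. In this sketch, Label is a hypothetical class that inherits equals() and hashCode() from Object.

```java
import java.util.HashMap;
import java.util.Map;

class IdentityKeyDemo {
    // Looks the value up with the very same instance that was stored.
    static boolean foundBySameInstance() {
        Map<Label, String> map = new HashMap<>();
        Label key = new Label("a");
        map.put(key, "value");
        return map.containsKey(key);
    }

    // Looks the value up with a different instance that has the same content.
    static boolean foundByLookalike() {
        Map<Label, String> map = new HashMap<>();
        map.put(new Label("a"), "value");
        return map.containsKey(new Label("a"));
    }

    public static void main(String[] args) {
        System.out.println(foundBySameInstance()); // true
        System.out.println(foundByLookalike());    // false
    }
}

// Hypothetical class that keeps Object's default equals() and hashCode().
class Label {
    final String text;

    Label(String text) {
        this.text = text;
    }
}
```

The defaults are contract-compliant, which is why the first lookup works; they simply treat every instance as its own key.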

There is an intricate relationship between equals(), hashCode() and hash tables in general in Java (and .NET too, for that matter). To quote from the documentation:

public int hashCode()

Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
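The second and third points of the contract can be observed with plain strings. The sketch below relies on the documented String.hashCode() formula, under which "Aa" and "BB" happen to share the hash code 2112; the class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class HashContractDemo {
    // Equal objects must produce the same hash code.
    static boolean equalObjectsShareHash() {
        String a = new String("hello");
        String b = new String("hello"); // distinct instance, equal content
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    // Unequal objects are allowed to produce the same hash code.
    static boolean unequalObjectsMayCollide() {
        return !"Aa".equals("BB") && "Aa".hashCode() == "BB".hashCode();
    }

    // A collision costs time, not correctness: both strings stay in the set.
    static int sizeWithCollidingElements() {
        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(equalObjectsShareHash());     // true
        System.out.println(unequalObjectsMayCollide());  // true
        System.out.println(sizeWithCollidingElements()); // 2
    }
}
```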

The line

@Override public int hashCode()

just tells you that the hashCode() method is overridden. This is usually a sign that it's safe to use the type as a key in a HashMap.

And yes, you can easily use any object which obeys the contract for equals() and hashCode() as a key in a HashMap.
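The annotation also protects against a common slip: a method spelled hashcode() (lowercase c) is a brand-new method, not an override, and with @Override in front of it the compiler would reject it. The sketch below leaves the annotation off the misspelled method so that it compiles, then uses reflection to show which class actually supplies hashCode(). MisspelledKey and CorrectKey are hypothetical classes.

```java
class OverrideTypoDemo {
    // Reports which class declares the public hashCode() that instances of 'type' use.
    static Class<?> declaringClassOfHashCode(Class<?> type) {
        try {
            return type.getMethod("hashCode").getDeclaringClass();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e); // cannot happen: Object declares it
        }
    }

    public static void main(String[] args) {
        System.out.println(declaringClassOfHashCode(MisspelledKey.class).getName());
        System.out.println(declaringClassOfHashCode(CorrectKey.class).getName());
    }
}

// Typo: lowercase 'c'. Adding @Override here would be a compile error.
class MisspelledKey {
    public int hashcode() {
        return 42;
    }
}

// Correct spelling: this really overrides Object.hashCode().
class CorrectKey {
    @Override
    public int hashCode() {
        return 42;
    }
}
```

For MisspelledKey the declaring class is java.lang.Object, which means a HashMap would silently keep using the default hash code.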

Aaron Digulla is absolutely correct. An interesting additional note that people don't seem to realise is that the key object's hashCode() result is not used verbatim. It is, in fact, rehashed by the HashMap, i.e. it calls hash(someKey.hashCode()), where hash() is an internal hashing method.

To see this, have a look at the source: http://kickjava.com/src/java/util/HashMap.java.htm

The reason for this is that some people implement hashCode() poorly, and the hash() function gives a better hash distribution. It's basically done for performance reasons.
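A small experiment shows what the rehash buys. The hash() and indexFor() bodies below are written from memory of the OpenJDK 6 HashMap that the put() listing comes from, so treat the exact shift constants as an assumption; newer JDK versions use a different spreading step. The two sample hash codes differ only in their high bits.

```java
class RehashDemo {
    // Supplemental hash in the style of the OpenJDK 6 HashMap (constants assumed).
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Table length is a power of two, so masking keeps only the low bits.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int first = 0x10000;
        int second = 0x20000;
        int tableLength = 16;

        // Without rehashing, both codes fall into bucket 0.
        System.out.println(indexFor(first, tableLength) + " " + indexFor(second, tableLength));

        // With rehashing, the high bits influence the bucket index.
        System.out.println(indexFor(hash(first), tableLength) + " "
                + indexFor(hash(second), tableLength));
    }
}
```

Because indexFor() only looks at the low bits, a hashCode() whose values differ mainly in the high bits would pile everything into a few buckets without the extra mixing.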

In answer to question 2: although any class can be used as the key in a HashMap, the best practice is to use immutable classes as keys. Or at the least, if your hashCode and equals implementations depend on some of the attributes of your class, then you should take care not to provide methods that alter these attributes.
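With an immutable key, a change to the key data becomes an explicit re-keying step: remove the entry under the old key and store it again under a new one. The sketch below is one way to do that; PhoneContact and its withPhoneNumber method are hypothetical names, and Objects.hash assumes Java 7 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class RekeyDemo {
    static String shopAfterPhoneChange() {
        Map<PhoneContact, String> shops = new HashMap<>();
        PhoneContact old = new PhoneContact("John Doe", "555-0100");
        shops.put(old, "Donut shop");

        // The key cannot change in place, so move the value to a new key.
        PhoneContact updated = old.withPhoneNumber("555-0199");
        shops.put(updated, shops.remove(old));

        return shops.get(new PhoneContact("John Doe", "555-0199"));
    }

    public static void main(String[] args) {
        System.out.println(shopAfterPhoneChange()); // Donut shop
    }
}

// Hypothetical immutable key: final class, final fields, no setters.
final class PhoneContact {
    private final String name;
    private final String phoneNumber;

    PhoneContact(String name, String phoneNumber) {
        this.name = name;
        this.phoneNumber = phoneNumber;
    }

    // Returns a modified copy instead of mutating this instance.
    PhoneContact withPhoneNumber(String newNumber) {
        return new PhoneContact(name, newNumber);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PhoneContact)) return false;
        PhoneContact other = (PhoneContact) o;
        return name.equals(other.name) && phoneNumber.equals(other.phoneNumber);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, phoneNumber);
    }
}
```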

The hashCode method for collection classes like HashSet, Hashtable, HashMap etc.: hashCode returns an integer for the object, and it is supported for the purpose of hashing. The default implementation in Object typically converts the internal address of the object into an integer. The hashCode method should be overridden in every class that overrides the equals method. The three general contracts for the hashCode method are:

  • If two objects are equal according to the equals method, then calling hashCode on both objects should produce the same integer value.

  • If it is called several times on a single object, it should return the same integer value each time, as long as nothing used in equals comparisons changes.

  • If two objects are unequal according to the equals method, calling hashCode on both objects is not required to produce distinct values.
