
How to efficiently implement hashCode() for a singly linked list node in Java?

Eclipse generates the hashCode() function for a singly linked list's Node class in the following way:

class Node{
    int val;
    Node next;

    public Node(int val){
        this.val = val;
        next = null;
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((next == null) ? 0 : next.hashCode());
        result = prime * result + val;
        return result;
    }
}

Now hashCode() for a node depends on the hash codes of the nodes that follow it.

So every call of hashCode() takes time linear in the number of nodes that follow the node. Thus using a HashSet<Node> will become unfeasible.

One way to get around this is to cache the value of the hash code in a variable (call it hash) so that it is computed only once. But even in this case, the cached hash becomes invalid as soon as the val of any later node is changed. And it would again take linear time to update the cached hash codes of the nodes that precede the changed node, since their hashes depend on it.
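The caching idea and its weakness can be sketched roughly as follows. This is only an illustration: CachedNode, hash, hashValid and setVal are hypothetical names, and the hash formula is copied from the generated method above.

```java
// Hypothetical variant of the Node class above; all names are illustrative.
class CachedNode {
    private int val;
    CachedNode next;
    private int hash;          // cached hash code
    private boolean hashValid; // false until the hash has been computed

    CachedNode(int val) {
        this.val = val;
    }

    void setVal(int val) {
        this.val = val;
        // Only this node's cache is reset. A singly linked node cannot
        // reach its predecessors, so their cached hashes stay stale.
        hashValid = false;
    }

    @Override
    public int hashCode() {
        if (!hashValid) {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((next == null) ? 0 : next.hashCode());
            result = prime * result + val;
            hash = result;
            hashValid = true;
        }
        return hash;
    }

    public static void main(String[] args) {
        CachedNode head = new CachedNode(1);
        head.next = new CachedNode(2);
        int before = head.hashCode();
        head.next.setVal(99);
        // The head still returns its old cached value after the change.
        System.out.println(before == head.hashCode()); // true
    }
}
```

Repeated calls are cheap once the cache is filled, but keeping the cache correct would need back-pointers or a walk from the head of the list on every change.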

So what are some good ways of implementing hashing for such a linked list Node?

My first thought upon reading your question was: what does LinkedList do? Digging into the source, we see that there is no hashCode() or equals() defined on the inner LinkedList.Node class (link to source).

Why does this make sense? Well, nodes are normally internal data structures, visible only to the list itself. They are not going to be placed into collections or any other data structure where equality comparisons and hash codes are needed. No external code has access to them.

You say in your question:

Thus using a HashSet<Node> will become unfeasible.

But I would argue that you have no need to place your nodes in such a data structure. By definition, your nodes link to each other and require no additional classes to facilitate that relationship. And unless you plan to expose this class outside your list (which isn't necessary), they will never end up in a HashSet.

I would propose you follow the LinkedList.Node model and avoid creating these methods on your nodes. The outer list can base its hash code and equality on the values stored in the nodes (but not on the nodes themselves), which is how LinkedList does it: see AbstractList (link to source).
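A minimal sketch of that model, assuming a list of int values: the class name IntList and its add method are made up for illustration, and the hash formula is the one the List.hashCode() contract specifies.

```java
import java.util.Arrays;

// Hypothetical list class; the name IntList and its methods are illustrative only.
class IntList {
    // The node stays private and defines neither hashCode() nor equals().
    private static final class Node {
        final int val;
        Node next;
        Node(int val) { this.val = val; }
    }

    private Node head;
    private Node tail;

    void add(int val) {
        Node n = new Node(val);
        if (head == null) { head = n; } else { tail.next = n; }
        tail = n;
    }

    // Hash over the stored values: h = 31 * h + elementHash for each element.
    @Override
    public int hashCode() {
        int h = 1;
        for (Node n = head; n != null; n = n.next) {
            h = 31 * h + Integer.hashCode(n.val);
        }
        return h;
    }

    // Two lists are equal when they hold the same values in the same order.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IntList)) return false;
        Node a = head;
        Node b = ((IntList) o).head;
        while (a != null && b != null) {
            if (a.val != b.val) return false;
            a = a.next;
            b = b.next;
        }
        return a == null && b == null;
    }

    public static void main(String[] args) {
        IntList x = new IntList(); x.add(1); x.add(2); x.add(3);
        IntList y = new IntList(); y.add(1); y.add(2); y.add(3);
        System.out.println(x.equals(y)); // true
        System.out.println(x.hashCode() == Arrays.asList(1, 2, 3).hashCode()); // true
    }
}
```

Hashing the list is still linear in its length, but it happens once per list rather than once per node, and the nodes never need to be hashed at all.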

Source links are to the OpenJDK source, but in this case they are identical to the source supplied with Oracle JDKs.

You have to ask yourself what quality of hashing is valuable for you. The only restriction is that another list with the same numbers in the same order must have the same hash. That is satisfied by returning a constant, by using only the first number, or by limiting the hash to the first 5 numbers. How many numbers make sense for you depends on the structure of your data. If, for example, you always store consecutive, ascending numbers starting from 1 and the lists differ only in length, that will be hard to optimize. If the values are completely random over the entire range of int, the first number will do the job well. How many numbers deliver the best ratio for you is best found out by measuring, I'd say.
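As a rough sketch of the "limit to 5 numbers" option: the class PrefixHashNode, the constant MAX_HASHED and the helper of(...) are hypothetical names, and 5 is just the figure mentioned above, not a recommendation.

```java
// Hypothetical node class; all names here are illustrative.
class PrefixHashNode {
    static final int MAX_HASHED = 5; // assumed limit; tune it by measuring

    int val;
    PrefixHashNode next;

    PrefixHashNode(int val, PrefixHashNode next) {
        this.val = val;
        this.next = next;
    }

    // Hashes at most MAX_HASHED values starting at this node,
    // so each call does a bounded amount of work.
    @Override
    public int hashCode() {
        int h = 1;
        int seen = 0;
        for (PrefixHashNode n = this; n != null && seen < MAX_HASHED; n = n.next) {
            h = 31 * h + n.val;
            seen++;
        }
        return h;
    }

    // equals() still compares the whole chain, so equal chains always
    // share a hash, while unequal chains may collide.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PrefixHashNode)) return false;
        PrefixHashNode a = this;
        PrefixHashNode b = (PrefixHashNode) o;
        while (a != null && b != null) {
            if (a.val != b.val) return false;
            a = a.next;
            b = b.next;
        }
        return a == null && b == null;
    }

    // Builds a chain from the given values and returns its first node.
    static PrefixHashNode of(int... vals) {
        PrefixHashNode head = null;
        for (int i = vals.length - 1; i >= 0; i--) {
            head = new PrefixHashNode(vals[i], head);
        }
        return head;
    }

    public static void main(String[] args) {
        PrefixHashNode a = of(1, 2, 3, 4, 5, 6);
        PrefixHashNode b = of(1, 2, 3, 4, 5, 7);
        System.out.println(a.hashCode() == b.hashCode()); // true: they differ only after the 5th value
        System.out.println(a.equals(b)); // false
    }
}
```

Chains that share their first five values collide, which is exactly the trade-off described above: cheaper hashing in exchange for more collisions on data with long common prefixes.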

In the end, what you need is a good ratio between collisions (objects put into the same bucket) and calculation time. Generated implementations typically try to maximize the calculation time, leaving the human developer with the pleasure of much room for improvement. ;-)

As for changing a contained value: java.util.HashSet (or rather the HashMap it holds) calculates its own hash from yours and caches it. So an object contained in a HashSet can't be found again once it has changed so much that its hash code has changed.
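A small demonstration of that effect, using a made-up MutableKey class whose hash code is simply its value (both class names below are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable element; its hash code is just its value.
class MutableKey {
    int val;

    MutableKey(int val) {
        this.val = val;
    }

    @Override
    public int hashCode() {
        return val;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).val == val;
    }
}

class StaleHashDemo {
    // Reports whether the set still finds the key after it was changed in place.
    static boolean foundAfterMutation() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);   // stored under the hash derived from val == 1
        key.val = 2;    // hash code changes while the object sits in the set
        return set.contains(key);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
    }
}
```

The object is still in the set, but lookups use the new hash code and miss it. The usual way out is to remove the element before changing it and add it back afterwards.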
