
JVM optimisation of hashCode() on List

Imagine a simple case:

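import java.util.ArrayList;
import java.util.List;

class B {
    public final String text;

    public B(String text) {
        this.text = text;
    }
}

class A {
    private List<B> bs = new ArrayList<B>();

    // Linear search: returns the first B whose text matches, or null.
    public B getB(String text) {
        for (B b : bs) {
            if (b.text.equals(text)) {
                return b;
            }
        }
        return null;
    }

    // getter/setter for bs omitted
}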

Imagine that for each instance of A, the List<B> is large and we need to call getB(String) often. However, assume that the list can also change (elements added or removed, or the field even being reassigned).

At this stage, the average complexity of getB(String) is O(n). To improve on that, I was wondering whether we could use some clever caching.

Imagine we cache the List<B> in a Map<String, B> where the key is B.text. That would improve performance, but it won't work if the list is changed (element added or deleted) or reassigned (A.bs points to a new reference).

To get around that, I thought that, along with the Map<String, B>, we could store a hash of the list bs. When we call the getB(String) method, we compute the hash of the list bs. If the hash hasn't changed, we fetch the result from the map; if it has, we reload the map.

The problem is that computing the hash of a java.util.List goes through all the elements of the list and computes their hashes, which is at least O(n).
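For illustration only, the idea described above could be sketched roughly as follows. The class name HashCheckedA and the fields cache, cachedHash and built are hypothetical and not part of the original code. Note that the call to bs.hashCode() on every lookup already visits every element, so the sketch does not remove the O(n) cost.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class B {
    public final String text;

    B(String text) {
        this.text = text;
    }
}

// Hypothetical sketch of the hash-checked cache from the question.
class HashCheckedA {
    private List<B> bs = new ArrayList<>();
    private final Map<String, B> cache = new HashMap<>();
    private int cachedHash;        // list hash at the time the cache was built
    private boolean built = false; // false until the cache has been filled once

    public List<B> getBs() { return bs; }

    public void setBs(List<B> bs) { this.bs = bs; }

    public B getB(String text) {
        int currentHash = bs.hashCode(); // O(n): walks the whole list
        if (!built || currentHash != cachedHash) {
            cache.clear();
            for (B b : bs) {
                // Keep the first element per text, like the original loop.
                cache.putIfAbsent(b.text, b);
            }
            cachedHash = currentHash;
            built = true;
        }
        return cache.get(text);
    }
}
```

An equal hash does not prove that the list is unchanged, so this check can also miss a modification.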

Question

What I'd like to know is whether the JVM will be faster at computing the hash of the List than at going through my loop in the getB(String) method. Maybe that depends on the implementation of the hash for B. If so, what kind of things could work? In a nutshell, I'd like to know whether this is stupid or could bring some performance improvement.

Without actually explaining why, you seem to believe that it is essential to keep the list structure as well. The only sensible reason for this is that you need the order of the collection to stay consistent. If you switch to a "plain" map, the order of the values is no longer guaranteed, e.g. they are not kept in the order in which you added the items to the map.

If you need both to keep the order (list behaviour) and to access individual items using a key, you can use a LinkedHashMap, which essentially joins the behaviour of a LinkedList and a HashMap. Even though LinkedHashMap.values() returns a collection and not a list, that collection iterates in the order in which the items were inserted.
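A minimal sketch of this suggestion, assuming each B.text is unique (a second B with the same text would replace the first, unlike in a list). The class OrderedA and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class B {
    public final String text;

    B(String text) {
        this.text = text;
    }
}

// Hypothetical replacement for A that keeps insertion order and O(1) lookup.
class OrderedA {
    private final Map<String, B> bs = new LinkedHashMap<>();

    public void addB(B b) {
        bs.put(b.text, b);
    }

    public B removeB(String text) {
        return bs.remove(text);
    }

    public B getB(String text) {
        return bs.get(text); // hash lookup instead of a linear scan
    }

    // Snapshot of the values in insertion order.
    public List<B> getAllBs() {
        return new ArrayList<>(bs.values());
    }
}
```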

Another issue with your question is that you cannot use the list's hash code to safely detect changes. If the hash code has changed, you can indeed be sure that the list has changed as well. But if two hash codes are identical, you still cannot be sure that the lists are actually identical. E.g. if the hash code implementation is based on strings, the hash codes for "1a" and "2B" are identical.
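The collision mentioned above can be checked directly. The following small program (the class name HashCollisionDemo is arbitrary) compares the two strings and two one-element lists built from them:

```java
import java.util.Arrays;
import java.util.List;

class HashCollisionDemo {
    public static void main(String[] args) {
        List<String> first = Arrays.asList("1a");
        List<String> second = Arrays.asList("2B");

        // '1' * 31 + 'a' = 49 * 31 + 97 = 1616
        // '2' * 31 + 'B' = 50 * 31 + 66 = 1616
        System.out.println("1a".hashCode() == "2B".hashCode());   // prints true

        // List.hashCode() is built from the element hash codes, so it collides too.
        System.out.println(first.hashCode() == second.hashCode()); // prints true

        // The lists are nevertheless different.
        System.out.println(first.equals(second));                  // prints false
    }
}
```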

If so what kind of things could work?

Simply put: don't let anything else mutate your list without you knowing about it. I suspect you currently have something like:

public List<B> getAllBs() {
    return bs;
}

... and a similar setter. If you stop doing that, and instead just have appropriate mutation methods, then you can make sure that your code is the only code to mutate the list... which means you can either remember that your map is "dirty" or just mutate the map at the same time that you mutate the list.
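As a rough sketch of the "dirty" variant, assuming callers only need addB and removeB as mutation methods (both names, the class name GuardedA and the dirty and byText fields are invented here):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class B {
    public final String text;

    B(String text) {
        this.text = text;
    }
}

// Hypothetical version of A that owns every mutation of the list.
class GuardedA {
    private final List<B> bs = new ArrayList<>();
    private final Map<String, B> byText = new HashMap<>();
    private boolean dirty = false; // true when byText may be out of date

    public void addB(B b) {
        bs.add(b);
        dirty = true;
    }

    public boolean removeB(B b) {
        boolean removed = bs.remove(b);
        if (removed) {
            dirty = true;
        }
        return removed;
    }

    public B getB(String text) {
        if (dirty) {
            // Rebuild once after a batch of changes: O(n) here, O(1) afterwards.
            byText.clear();
            for (B b : bs) {
                byText.putIfAbsent(b.text, b); // first match wins, like the loop
            }
            dirty = false;
        }
        return byText.get(text);
    }

    // Read-only view, so callers cannot change the list behind our back.
    public List<B> getAllBs() {
        return Collections.unmodifiableList(bs);
    }
}
```

Rebuilding lazily keeps the mutation methods trivial; updating the map inside each mutation method instead would avoid the occasional O(n) rebuild.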

You could implement your own class IndexedBArrayList which extends ArrayList<B>.

Then you add this functionality to it:

  • A private HashMap<String, B> index
  • All mutator methods of ArrayList are overridden to keep this index hash map updated in addition to calling the corresponding super-method.
  • A new public B getByString(String) method which uses the hash map
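A partial sketch of such a class, assuming each B.text is unique within the list. Only add, set, remove and clear are overridden here; the remaining mutators (addAll, removeAll, retainAll, removeIf, replaceAll, iterators, subList views and so on) would need the same treatment and are left out for brevity, so the index can go stale if they are used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

class B {
    public final String text;

    B(String text) {
        this.text = text;
    }
}

class IndexedBArrayList extends ArrayList<B> {
    // Assumption: texts are unique, so one entry per text is enough.
    private final Map<String, B> index = new HashMap<>();

    @Override
    public boolean add(B b) {
        boolean added = super.add(b);
        index.put(b.text, b);
        return added;
    }

    @Override
    public void add(int position, B b) {
        super.add(position, b); // may throw; update the index only on success
        index.put(b.text, b);
    }

    @Override
    public B set(int position, B b) {
        B previous = super.set(position, b);
        index.remove(previous.text);
        index.put(b.text, b);
        return previous;
    }

    @Override
    public B remove(int position) {
        B removed = super.remove(position);
        index.remove(removed.text);
        return removed;
    }

    @Override
    public boolean remove(Object o) {
        boolean removed = super.remove(o);
        if (removed) {
            // B does not override equals, so a removed object is a B from the list.
            index.remove(((B) o).text);
        }
        return removed;
    }

    @Override
    public void clear() {
        super.clear();
        index.clear();
    }

    // O(1) lookup through the index instead of a linear scan.
    public B getByString(String text) {
        return index.get(text);
    }
}
```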

From your description it does not seem that you need a List<B>.
Replace the List with a HashMap. If you need to search for Bs, the best data structure is the hash map and not the list.


 