Is Java's "String" hashCode function thread-safe if its cache setter does not use locks?

Here is the code from Java's String hashCode function:

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

As you can see, it checks whether the cached hash ("private int hash") is 0 and, if so, computes and stores it. The constructor does not always set this value (otherwise the check would be pointless).

So although it would be quite hard to reproduce in practice, it looks like there could be a race condition on this hash, right?

I mean, once you put the string in a HashMap, for example, it would be safe, unless you first sent it off to another thread. But if the string were visible to two threads and both added it to a hash map simultaneously, the HashMap code could read a partially written "hash" value and return it.

Theoretically one can write code that causes multiple simultaneous threads to read the 0-valued hash and enter the calculation part. That would be "wasteful" but safe, since the function operates on immutable characters and each thread would calculate exactly the same hash value.
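The scenario described above can be sketched as follows. This is only an illustration under stated assumptions, not the JDK source: `LazyHashText` and `BenignRaceDemo` are hypothetical names, and the class merely mimics the lazy caching pattern over an immutable char array. Several threads call hashCode() at the same moment; some of them may repeat the computation, but they should all agree on the result.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for String: immutable characters plus a lazily cached hash.
final class LazyHashText {
    private final char[] value;
    private int hash; // 0 means "not computed yet"; deliberately not volatile

    LazyHashText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash; // single read of the shared field
        if (h == 0 && value.length > 0) {
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h; // unsynchronized write; every thread stores the same value
        }
        return h;
    }
}

class BenignRaceDemo {
    // Calls hashCode() from several threads at once and collects the distinct results.
    static Set<Integer> hashFromManyThreads(LazyHashText text, int threads)
            throws InterruptedException {
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // line all workers up before hashing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(text.hashCode());
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            t.join();
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Integer> seen = hashFromManyThreads(new LazyHashText("foo"), 8);
        System.out.println(seen); // a single value, equal to "foo".hashCode()
    }
}
```

A run like this cannot prove the absence of a race; it only shows that redundant computations converge on the same value.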

To summarize the results from the comments section: reads and writes of an int are atomic in the Java VM.

The associated documentation can be found under "Atomic Access" on the Oracle website.

https://docs.oracle.com/javase/tutorial/essential/concurrency/atomic.html

I don't work with Java, but that does not seem to be a "race condition". I would call it "lazily compute the hash, then cache the value".

So there is no point in computing the hash in the constructor if no one ever calls the hashCode method. But once the first caller invokes hashCode, the value is computed and will never change, so it can be cached for later calls.

LATER EDIT:

Now I see your point. That is the purpose of the local variable. Not using a "thread safe" mechanism is a "performance decision": creating a lock, acquiring it, and releasing it all come at a cost, while the case you are talking about (two threads calling hashCode at the same time) is pretty hard to hit in a real-life scenario, and even then the result will be the same hash value.
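To illustrate why the local variable matters, here is a small sketch contrasting two variants. `CachedHash` and its method names are hypothetical and exist only for this comparison. The first variant reads the shared field twice; without synchronization, the Java Memory Model treats those reads independently, so the second read is, in principle, allowed to see 0 even after the first saw a nonzero value. The second variant reads the field once and returns only the local copy.

```java
// Hypothetical class for illustration; not the JDK's String.
final class CachedHash {
    private final char[] value;
    private int hash; // 0 means "not computed yet"

    CachedHash(String s) {
        this.value = s.toCharArray();
    }

    // Same polynomial as in the question's code.
    private int compute() {
        int h = 0;
        for (char c : value) {
            h = 31 * h + c;
        }
        return h;
    }

    // Fragile variant: two unsynchronized reads of the shared field.
    int hashCodeTwoReads() {
        if (hash == 0) {      // first read
            hash = compute();
        }
        return hash;          // second read: not guaranteed to agree with the first
    }

    // Variant following the pattern in the question: one read, then only the local.
    int hashCodeOneRead() {
        int h = hash;         // the only read of the shared field
        if (h == 0) {
            h = compute();
            hash = h;
        }
        return h;             // never depends on a later read of the field
    }
}
```

On a single thread both variants return the same value; the difference only concerns what the memory model permits when several threads are involved, and it is unlikely to be observable on common JVMs.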

Reads and writes of the "hash" field are not properly synchronized according to the Java Memory Model, but the code is safe nevertheless.

If multiple threads write to "hash", the immutability of a String object implies that every calculation yields the same result. Call this result x. Any thread is then guaranteed to observe either 0 or x, because reads and writes of an int are atomic on all VMs. If a thread observes 0, it simply recalculates the hash code, which is guaranteed to yield x. The only effect is that it stores x again when another thread has already done so concurrently, or when that thread's write is not yet visible to it.

In this sense, the outcome is deterministic. At the same time, threads are not required to synchronize in order to share this instance. Assume that you have some key "foo" that all of your threads use throughout your application. Because Java shares string constants, this one string would be shared among all of your threads, which would then have to synchronize merely to spare themselves the trouble of recomputing the hash code. Computing the hash code is, however, a very cheap operation, whereas synchronization is comparatively expensive. Since correctness is preserved either way, this optimization makes sense.
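For contrast, a lock-based variant might look like the sketch below. `SynchronizedHashText` is a hypothetical class, shown only to make the trade-off concrete: every call acquires the object's monitor, even long after the value has been cached, which is the cost the unsynchronized pattern avoids.

```java
// Hypothetical lock-based alternative, shown only for comparison.
final class SynchronizedHashText {
    private final char[] value;
    private int hash;         // guarded by "this"
    private boolean computed; // guarded by "this"; lets a genuine hash of 0 be cached too

    SynchronizedHashText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public synchronized int hashCode() {
        if (!computed) {
            int h = 0;
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;
            computed = true;
        }
        return hash; // safe to re-read here: the monitor orders all accesses
    }
}
```

Whether the locking overhead is measurable depends on the JVM and the workload; the sketch makes no claim about actual timings.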
