Is Java's “String” hashcode function thread-safe if its cache setter does not use locks?

Here is the code of Java's String hashCode function:

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

As you can see, it checks whether the cached hash ("private int hash") is 0 and, if so, computes and stores it. The constructor does not always set this value (otherwise the check would be unnecessary).

So although it would be quite hard to reproduce in practice, it looks like one could have a race condition on this hash, right?

I mean, once you put the string in a HashMap, for example, it would be safe, unless you first sent it off to another thread. But if the same string were held by two threads and simultaneously added to a HashMap, couldn't the HashMap read a partially written "hash" value and use it?

Theoretically, one can write code in which multiple threads simultaneously read the zero-valued hash and all enter the calculation branch. That would be "wasteful" but safe, since the function operates on immutable characters and each thread would calculate exactly the same hash value.
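The scenario above can be sketched as follows. This is only an illustration, not the JDK's code: LazyHashText and hashFromManyThreads are hypothetical names, and the class merely imitates the unsynchronized caching pattern of String. Several threads are released at roughly the same moment, so more than one of them may see the zero-valued field and recompute.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical class imitating String: immutable data plus a lazily cached hash.
final class LazyHashText {
    private final char[] value; // never modified after construction
    private int hash;           // 0 means "not computed yet"; deliberately not volatile

    LazyHashText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash; // single read of the shared field
        if (h == 0 && value.length > 0) {
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h; // racy write, but every thread writes the same value
        }
        return h;
    }

    // Releases `threads` threads at roughly the same time and collects their results.
    static int[] hashFromManyThreads(LazyHashText text, int threads) {
        int[] results = new int[threads];
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                results[slot] = text.hashCode();
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) {
                t.join(); // join makes each worker's result visible to the caller
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return results;
    }
}
```

Every entry of the returned array should equal the hash that String itself reports for the same characters, regardless of how many threads happened to do the computation.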

To summarize the results from the comments section: reads and writes of an int are atomic in the Java VM.

The associated documentation can be found under "Atomic Access" in the Java tutorial on the Oracle website.

https://docs.oracle.com/javase/tutorial/essential/concurrency/atomic.html

I don't work with Java, but that does not seem to be a "race condition". I would describe it as "lazily compute the hash, and after that, cache the value".

So there is no point in computing the hash in the constructor if no one ever calls the hashCode method. Once the first caller invokes hashCode, the value is computed and never changes, so it can be cached for further calls.

LATER EDIT:

Now I see your point. That is the purpose of the local variable: the shared field is read once and the method works on that copy. Not using a "thread-safe" mechanism is a performance decision, since acquiring and releasing a lock comes with a cost, while the case you are talking about (two threads calling hashCode at the same time) is hard to hit in real-life scenarios, and the result is the same hash value anyway.
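To make the role of the local variable concrete, here is a minimal sketch with hypothetical names (ReadOnceHash, hashReadOnce, hashReadTwice). The first method follows the read-once pattern from the question. The second reads the field twice without synchronization; as far as I understand the Java Memory Model, that second read is allowed, at least in principle, to observe 0 even after the first read saw a non-zero value, which is why it is shown only as the variant to avoid.

```java
// Hypothetical class contrasting one read of the shared field with two reads.
final class ReadOnceHash {
    private final char[] value;
    private int hash; // 0 means "not computed yet"

    ReadOnceHash(String s) {
        this.value = s.toCharArray();
    }

    private int compute() {
        int h = 0;
        for (char c : value) {
            h = 31 * h + c;
        }
        return h;
    }

    // Pattern from the question: read the field once, then work on the local copy.
    int hashReadOnce() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            h = compute();
            hash = h;
        }
        return h; // returns the local copy, never a second read of the field
    }

    // Variant to avoid: the return statement is a second unsynchronized read.
    int hashReadTwice() {
        if (hash == 0 && value.length > 0) {
            hash = compute();
        }
        return hash; // in principle, under contention this read may still observe 0
    }
}
```

In a single thread both methods return the same value; the difference only matters when several threads race on the field.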

Reading and writing the hash field is not properly synchronized according to the Java Memory Model, but it is safe nevertheless.

If multiple threads write to the hash field, the immutability of a String object implies that every calculation yields the same result. Call this result x. Any thread is then guaranteed to observe either 0 or x, because reads and writes of an int are atomic on all VMs, so a partially written value can never be seen. If a thread observes 0, it simply recalculates the hash code, which is guaranteed to yield x, so at worst it rewrites the value that another thread has already written concurrently or that is not yet visible outside that thread's local cache.

In this sense, the outcome is deterministic. At the same time, threads are not required to synchronize in order to share this instance. Assume that you have some key "foo" used by all of your threads throughout your application. Because Java interns string constants, this one string object would be shared among all of your threads, which would then have to synchronize merely to save themselves the trouble of recomputing the hash code. Computing the hash code is, however, a very cheap operation, whereas synchronization is comparatively expensive. Since correctness is guaranteed either way, this optimization makes sense.
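For contrast, this is roughly what the locked alternative argued against above could look like. It is a sketch only, with a hypothetical class name (SynchronizedHashText); it is not how String is implemented. Every call, including every later read of the already cached value, has to acquire the monitor.

```java
// Hypothetical locked variant: same cached hash, but every call takes the monitor.
final class SynchronizedHashText {
    private final char[] value;
    private int hash; // 0 means "not computed yet"; guarded by the object's monitor

    SynchronizedHashText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public synchronized int hashCode() {
        if (hash == 0 && value.length > 0) {
            int h = 0;
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;
        }
        return hash; // safe to re-read here because the lock is held
    }
}
```

The result is the same as with the unsynchronized version; the lock only prevents duplicate computation, which is the trade-off the answer describes.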
