简体   繁体   中英

Is iteration via Collections.synchronizedSet(…).forEach() guaranteed to be thread safe?

As we know, iterating over a concurrent collection is not thread safe by default, so one cannot use:

Set<E> set = Collections.synchronizedSet(new HashSet<>());
//fill with data
for (E e : set) {
    process(e);
}

This happens as data may be added during iteration, because there is no exclusive lock on set .

This is describe in the javadoc of Collections.synchronizedSet :

public static Set synchronizedSet(Set s)

Returns a synchronized (thread-safe) set backed by the specified set. In order to guarantee serial access, it is critical that all access to the backing set is accomplished through the returned set.

It is imperative that the user manually synchronize on the returned set when iterating over it:

Set s = Collections.synchronizedSet(new HashSet());
...
synchronized (s) { Iterator i = s.iterator(); // Must be in the synchronized block while (i.hasNext()) foo(i.next()); }

Failure to follow this advice may result in non-deterministic behavior.

However , this does not apply to Set.forEach , which inherits the default method forEach from Iterable.forEach .

Now I looked into the source code, and here we can see that we have the following structure:

  1. We ask for a Collections.synchronizedSet() .
  2. We get one:

     public static <T> Set<T> synchronizedSet(Set<T> s) { return new SynchronizedSet<>(s); } ... static class SynchronizedSet<E> extends SynchronizedCollection<E> implements Set<E> { private static final long serialVersionUID = 487447009682186044L; SynchronizedSet(Set<E> s) { super(s); } SynchronizedSet(Set<E> s, Object mutex) { super(s, mutex); } public boolean equals(Object o) { if (this == o) return true; synchronized (mutex) {return c.equals(o);} } public int hashCode() { synchronized (mutex) {return c.hashCode();} } }
  3. It extends SynchronizedCollection , which has the following interesting methods next to the obvious ones:

     // Override default methods in Collection @Override public void forEach(Consumer<? super E> consumer) { synchronized (mutex) {c.forEach(consumer);} } @Override public boolean removeIf(Predicate<? super E> filter) { synchronized (mutex) {return c.removeIf(filter);} } @Override public Spliterator<E> spliterator() { return c.spliterator(); // Must be manually synched by user! } @Override public Stream<E> stream() { return c.stream(); // Must be manually synched by user! } @Override public Stream<E> parallelStream() { return c.parallelStream(); // Must be manually synched by user! }

The mutex used here is the same object as to which all operations of Collections.synchronizedSet lock to.

Now we can, judging by the implementation say that it is thread safe to use Collections.synchronizedSet(...).forEach(...) , but is it also thread safe by specification ?

(Confusingly enough, Collections.synchronizedSet(...).stream().forEach(...) is not thread safe by implementation, and the verdict of the specification seems to be unknown aswell.)

As you wrote, judging by implementation, forEach() is thread-safe for the collections provided with JDK (see disclaimer below) as it requires monitor of mutex to be acquired to proceed.

Is it also thread safe by specification?

My opinion - no, and here is an explanation. Collections.synchronizedXXX() javadoc, rewritten in short words, says - "all methods are thread-safe except for those used for iterating over it".

My other, although very subjective argument is what yshavit wrote - unless told/read that, consider API/class/whatever not thread-safe.

Now, let's take a closer look at the javadocs. I guess I may state that method forEach() is used to iterate over it, so, following the advice from javadoc, we should consider it not thread-safe, although it is opposite to reality (implementation).

Anyway, I agree with yshavit 's statement that the documentation should be updated as this is most likely a documentation, not implementation flaw. But, no one can say for sure except for JDK developers, see concerns below.

The last point I'd like to mention within this discussion - we can assume that custom collection can be wrapped with Collections.synchronizedXXX() , and the implementation of forEach() of this collection can be... can be anything. The collection might perform asynchronous processing of elements within the forEach() method, spawn a thread for each element... it is bounded only by author's imagination, and synchronized(mutex) wrap cannot guarantee thread-safety for such cases . That particular issue might be the reason not to declare forEach() method as thread-safe..

It's worth to have a look at the documentation of Collections.synchronizedCollection rather than Collections.synchronizedSet() as that documentation has been cleaned up already:

It is imperative that the user manually synchronize on the returned collection when traversing it via Iterator , Spliterator or Stream : …

I think, this makes it pretty clear that there is a distinction between the iteration via an object other than the synchronized Collection itself and using its forEach method. But even with the old wording you can draw the conclusion that there is such a distinction:

It is imperative that the user manually synchronize on the returned set when iterating over it :…

(emphasis by me)

Compare to the documentation for Iterable.forEach :

Performs the given action for each element of the Iterable until all elements have been processed or the action throws an exception.

While it is clear to the developer that there must be an (internal) iteration happening to achieve this, this iteration is an implementation detail. From the given specification's wording it's just a (meta-)action for performing an action to each element.

When using that method, the user is not iterating over the elements and hence not responsible for the synchronization mentioned in the Collections.synchronized… documentation.

However, that's a bit subtle and it's good that the documentation of synchronizedCollection lists the cases for manual synchronization explicitly and I think the documentation of the other methods should be adapted as well.

As @Holger said, the doc clearly says user must manually synchronize collections returned by Collections.synchronizedXyz() when tranversing it via Iterator :

It is imperative that the user manually synchronize on the returned collection when traversing it via Iterator, Spliterator or Stream:

 Collection c = Collections.synchronizedCollection(myCollection); ... synchronized (c) { Iterator i = c.iterator(); // Must be in the synchronized block while (i.hasNext()) foo(i.next()); }

I want to explain a bit more about code.

Consider Collections.synchronizedList() method . It returns Collections.SynchronizedList class instance, which extends SynchronizedCollection which defines iterator() as follows:

public Iterator<E> iterator() {
    return c.iterator(); // Must be manually synched by user!
}

Compare this with other methods of SynchronizedCollections , for example:

public String toString() {
    synchronized (mutex) {return c.toString();}
}

Thus SynchronizedList inherits iterator() from SynchronizedCollection , which needs to be synched manually by user.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM