Is it thread-safe to synchronize only on add to HashSet?

Imagine a main thread that creates a HashSet and starts many worker threads, passing the HashSet to each of them.

Just like in the code below:

void main() {
  final Set<String> set = new HashSet<>();
  final ExecutorService threadExecutor = Executors.newFixedThreadPool(10);

  threadExecutor.submit(() -> doJob(set));
} 

void doJob(final Set<String> pSet) {
  // do some stuff
  final String x = ... // doesn't matter how we received the value.
  if (!pSet.contains(x)) {
    synchronized (pSet) {
      // double check to prevent multiple adds within different threads
      if (!pSet.contains(x)) {
        // do some exclusive work with x.
        pSet.add(x);
      }
    }
  }
  // do some stuff
}

I'm wondering whether it is thread-safe to synchronize only on the add method. Are there any possible issues if contains is not synchronized?

My intuition tells me this is fine: after leaving the synchronized block, changes made to the set should be visible to all threads. But the JMM can be counter-intuitive sometimes.

P.S. I don't think this is a duplicate of "How to lock multiple resources in java multithreading". Even though the answers to both could be similar, this question addresses a more specific case.

I'm wondering is it thread-safe to synchronize only on the add method? Are there any possible issues if contains is not synchronized as well?

Short answers: No and Yes.

There are two ways of explaining this:

The intuitive explanation

Java synchronization (in its various forms) guards against a number of things, including:

  • Two threads updating shared state at the same time.
  • One thread trying to read state while another is updating it.
  • Threads seeing stale values because memory caches have not been written to main memory.

In your example, synchronizing on add is sufficient to ensure that two threads cannot update the HashSet simultaneously, and that both calls will be operating on the most recent HashSet state.

However, if contains is not synchronized as well, a contains call could happen simultaneously with an add call. This could lead to the contains call seeing an intermediate state of the HashSet, leading to an incorrect result, or worse. This can also happen if the calls are not simultaneous, because changes may not be flushed to main memory immediately and/or the reading thread may not read from main memory.
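One way to close the gap described above is to hold the same lock for the contains call as for the add call. The following is a minimal sketch, not the asker's actual code: the class name SynchronizedSetDemo, the helper addIfAbsent, and the exclusiveRuns counter are all hypothetical names introduced for illustration, and the counter merely stands in for the "exclusive work with x".

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SynchronizedSetDemo {
    // Hypothetical counter: records how often the "exclusive work" ran.
    static final AtomicInteger exclusiveRuns = new AtomicInteger();

    // Every access to the plain HashSet, reads included, holds the same lock.
    static boolean addIfAbsent(Set<String> set, String x) {
        synchronized (set) {
            if (set.contains(x)) {
                return false;
            }
            exclusiveRuns.incrementAndGet(); // stand-in for the exclusive work with x
            set.add(x);
            return true;
        }
    }

    // Starts `workers` tasks that all try to add the same value.
    static int runWorkers(int workers, String value) {
        exclusiveRuns.set(0);
        Set<String> set = new HashSet<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> addIfAbsent(set, value));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return exclusiveRuns.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(10, "x")); // prints 1
    }
}
```

The price is that every lookup now contends for the lock, which is exactly what the unsynchronized first check in the question was trying to avoid.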

The Memory Model explanation

The JLS specifies the Java Memory Model, which sets out the conditions that must be fulfilled by a multi-threaded application to guarantee that one thread sees the memory updates made by another. The model is expressed in mathematical language and is not easy to understand, but the gist is that visibility is guaranteed if and only if there is a chain of happens-before relationships from the write to a subsequent read. If the write and read are in different threads, then synchronization between the threads is the primary source of these relationships. For example, in:

 // thread one
 synchronized (sharedLock) {
    sharedVariable = 42;
 }

 // thread two
 synchronized (sharedLock) {
     other = sharedVariable;
 }

Assuming that the thread one code runs before the thread two code, there is a happens-before relationship between thread one releasing the lock and thread two acquiring it. With this and the "program order" relations, we can build a chain from the write of 42 to the assignment to other. This is sufficient to guarantee that other will be assigned 42 (or possibly a later value of the variable) and NOT any value that sharedVariable held before 42 was written to it.

Without both synchronized blocks synchronizing on the same lock, the second thread could see a stale value of sharedVariable, i.e. some value written to it before 42 was assigned.
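The two fragments above can be turned into something runnable. This is only a sketch under stated assumptions: the class name HappensBeforeDemo and the method writeThenRead are hypothetical, and thread two polls in a loop because nothing else in the example forces thread one to run first. Once thread one has released sharedLock, the next locked read in thread two must observe 42.

```java
class HappensBeforeDemo {
    static final Object sharedLock = new Object();
    static int sharedVariable; // deliberately neither volatile nor atomic

    static int writeThenRead() {
        synchronized (sharedLock) {
            sharedVariable = 0; // reset so the demo can be run repeatedly
        }
        final int[] other = new int[1];

        // Thread two: reads under the lock until it observes the write.
        Thread threadTwo = new Thread(() -> {
            int seen;
            do {
                synchronized (sharedLock) {
                    seen = sharedVariable;
                }
                Thread.yield(); // give thread one a chance to take the lock
            } while (seen != 42);
            other[0] = seen;
        });

        // Thread one: writes under the same lock.
        Thread threadOne = new Thread(() -> {
            synchronized (sharedLock) {
                sharedVariable = 42;
            }
        });

        threadTwo.start();
        threadOne.start();
        try {
            threadOne.join();
            threadTwo.join(); // join() also makes other[0] visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return other[0];
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead()); // prints 42
    }
}
```

If the synchronized blocks were removed from the reading loop, the Java Memory Model would no longer require the loop ever to observe 42, even though on many JVMs it often would in practice.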

That code is thread-safe for the synchronized (pSet) { } part:

if (!pSet.contains(x)) {
  synchronized (pSet) {
    // Here you are sure to have the updated value of pSet
    if (!pSet.contains(x)) {
      // do some exclusive work with x.
      pSet.add(x);
    }
  }
}

because inside the synchronized statement on the pSet object:

  • one and only one thread may be in this block at a time.
  • inside it, pSet is also guaranteed to be seen in its up-to-date state, as left by the last thread that held the lock, thanks to the happens-before relationship established by the synchronized keyword.

So whatever value the first if (!pSet.contains(x)) check returned for a waiting thread, when that thread wakes up and enters the synchronized statement, it will see the last updated state of pSet. So if the same element was already added by a previous thread, the condition of the second if (!pSet.contains(x)) will evaluate to false.

But this code is not thread-safe for the first if (!pSet.contains(x)) statement, which could be executed while another thread is writing to the Set.
As a rule of thumb, a collection not designed to be thread-safe should not be used for concurrent write and read operations, because the internal state of the collection could be in an in-progress/inconsistent state when a read happens during a write.
While some non-thread-safe collection implementations tolerate such usage in practice, there is no guarantee that this will always be true.
So you should use a thread-safe Set implementation to make the whole thing thread-safe.
For example:

Set<String> pSet = ConcurrentHashMap.newKeySet();

This uses a ConcurrentHashMap under the hood, so there is no lock for reading and only a minimal lock for writing (only on the entry being modified, not the whole structure).
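With such a set, the check-then-add pair can collapse into a single add call, since add returns true only for the caller that actually inserted the element. The sketch below illustrates that idea; ConcurrentSetDemo, doJob's body, and the exclusiveRuns counter are hypothetical names used for illustration, not the original poster's code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentSetDemo {
    // Hypothetical counter: records how often the "exclusive work" ran.
    static final AtomicInteger exclusiveRuns = new AtomicInteger();

    static void doJob(Set<String> set, String x) {
        // add() is atomic here: exactly one caller gets true for a given x.
        if (set.add(x)) {
            exclusiveRuns.incrementAndGet(); // stand-in for the exclusive work with x
        }
    }

    // Starts `workers` tasks that all try to add the same value.
    static int runWorkers(int workers, String value) {
        exclusiveRuns.set(0);
        Set<String> set = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> doJob(set, value));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return exclusiveRuns.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(10, "x")); // prints 1
    }
}
```

Note one behavioural difference from the code in the question: here the element is already visible in the set while the exclusive work runs, whereas the original performs the work before calling add. If other threads must not see x until the work has finished, this variant is not a drop-in replacement.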

No.

You don't know what state the HashSet might be in while another thread is adding to it. There might be fundamental changes in progress, such as the splitting of buckets, so contains may return false during an add by another thread even though the element would be found in a single-threaded HashSet. In that case you would try to add the element a second time.

Even worse scenario: contains might get into an endless loop or throw an exception because of a temporarily invalid state of the HashSet in the memory used by the two threads at the same time.


 