
Thread Pool, Shared Data, Java Synchronization

Say, I have a data object:

class ValueRef { double value; }

Where each data object is stored in a master collection:

Collection<ValueRef> masterList = ...;

I also have a collection of jobs, where each job has a local collection of data objects (where each data object also appears in the masterList):

class Job implements Runnable {
    Collection<ValueRef> neededValues = ...;
    public void run() {
        double sum = 0;
        for (ValueRef x : neededValues) sum += x.value;
        System.out.println(sum);
    }
}

Use-case:

  1. for (ValueRef x: masterList) { x.value = Math.random(); }

  2. Populate a job queue with some jobs.

  3. Wake up a thread pool

  4. Wait until each job has been evaluated

Note: during job evaluation, all of the values are constant. The threads, however, may have evaluated other jobs in the past, and therefore retain cached values.
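
For reference, the overall workflow I have in mind looks roughly like this (only a sketch: the pool size, the number of jobs, the data sizes, and the use of an ExecutorService for steps 2-4 are placeholders, and the ValueRef/Job classes are repeated so it compiles on its own):

import java.util.*;
import java.util.concurrent.*;

class ValueRef { double value; }

class Job implements Runnable {
    final Collection<ValueRef> neededValues;
    Job(Collection<ValueRef> neededValues) { this.neededValues = neededValues; }
    public void run() {
        double sum = 0;
        for (ValueRef x : neededValues) sum += x.value;
        System.out.println(sum);
    }
}

class UseCaseSketch {
    public static void main(String[] args) throws Exception {
        List<ValueRef> masterList = new ArrayList<>();
        for (int i = 0; i < 1000; i++) masterList.add(new ValueRef());

        // 1. update all values in the controller thread
        for (ValueRef x : masterList) x.value = Math.random();

        // 2./3. populate the queue and let a fixed-size pool evaluate the jobs
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            futures.add(pool.submit(new Job(masterList.subList(i * 100, (i + 1) * 100))));
        }

        // 4. wait until every job has been evaluated
        for (Future<?> f : futures) f.get();
        pool.shutdown();
    }
}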

Question: what is the minimal amount of synchronization necessary to ensure each thread sees the latest values?

I understand synchronized from the monitor/lock perspective; I do not understand it from the cache/flush perspective (i.e. what the memory model guarantees on entry/exit of a synchronized block).

To me, it feels like I should need to synchronize once in the thread that updates the values to commit the new values to main memory, and once per worker thread, to flush the cache so the new values are read. But I'm unsure how best to do this.

My approach: create a global monitor: static Object guard = new Object(); Then synchronize on guard while updating the master list. Finally, before starting the thread pool, synchronize on guard in an empty block, once for each thread in the pool.

Does that really cause a full flush of any value read by that thread? Or just the values touched inside the synchronized block? In that case, instead of an empty block, maybe I should read each value once in a loop?
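
In code, the idea looks roughly like this (a sketch only; the class and method names are just for illustration, and ValueRef is the class defined above):

import java.util.Collection;

class GuardSketch {
    static final Object guard = new Object();

    // controller thread: update the values while holding the lock
    static void updateMaster(Collection<ValueRef> masterList) {
        synchronized (guard) {
            for (ValueRef x : masterList) x.value = Math.random();
        }
    }

    // each worker thread: acquire and release the same lock before reading
    static void flushBeforeReading() {
        synchronized (guard) { /* intentionally empty */ }
    }
}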

Thanks for your time.


Edit: I think my question boils down to this: once I exit a synchronized block, does every first read (after that point) go to main memory, regardless of what I synchronized on?

It doesn't matter that the threads of the thread pool have evaluated some jobs in the past.

The Javadoc of Executor says:

Memory consistency effects: Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread.

So, as long as you use a standard thread pool implementation and change the data before submitting the jobs, you shouldn't have to worry about memory visibility effects.
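
In other words, with a plain ExecutorService something like the following is already enough. This is only a sketch (the class and method names are made up; ValueRef and Job are the classes from the question), but it shows where the happens-before edge sits:

import java.util.Collection;
import java.util.concurrent.*;

class SubmitVisibility {
    // assumes the ValueRef and Job classes from the question
    static void runOnce(Collection<ValueRef> masterList, Job job, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        for (ValueRef x : masterList) {
            x.value = Math.random();           // plain writes, no locking needed
        }
        Future<?> done = executor.submit(job); // happens-before: the writes above
                                               // are visible inside job.run()
        done.get();                            // run()'s effects are visible after get()
    }
}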

What you are planning sounds sufficient. It depends on how you plan to "wake up the thread pool."

The Java Memory Model guarantees that all writes a thread performs before releasing a lock (i.e. before exiting a synchronized block) are visible to any thread that subsequently acquires the same lock.

So, if you are sure the worker threads are blocked in a wait() call (which must be inside a synchronized block) while you update the master list, then when they wake up and become runnable, the modifications made by the master thread will be visible to them.
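
A hand-rolled version of that arrangement might look like the sketch below (the lock object, flag, and method names are invented for illustration; ValueRef is the question's class):

import java.util.Collection;

class HandRolledSignal {
    private final Object lock = new Object();
    private boolean workReady = false;

    // worker side: park in wait() inside a synchronized block
    void awaitWork() throws InterruptedException {
        synchronized (lock) {
            while (!workReady) lock.wait();   // releases the lock while waiting
            workReady = false;
        }
        // ...evaluate the job; values written before notifyAll() are visible here
    }

    // controller side: update the data first, then signal under the same lock
    void publishWork(Collection<ValueRef> masterList) {
        for (ValueRef x : masterList) x.value = Math.random();
        synchronized (lock) {
            workReady = true;
            lock.notifyAll();                 // wake the workers
        }
    }
}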

I would encourage you, however, to apply the higher level concurrency utilities in the java.util.concurrent package. These will be more robust than your own solution, and are a good place to learn concurrency before delving deeper.


Just to clarify: it's almost impossible to control worker threads without using a synchronized block in which a check is made to see whether the worker has a task to perform. Thus, any changes made by the controller thread to the job happen-before the worker thread awakes. You require a synchronized block, or at least a volatile variable, to act as a memory barrier; however, I can't think how you'd create a thread pool without using one of these.

As an example of the advantages of using the java.util.concurrent package, consider this: you could use a synchronized block with a wait() call in it, or a busy-wait loop with a volatile variable. Because of the overhead of context switching between threads, a busy wait can actually perform better under certain conditions; it's not necessarily the horrible idea it might seem at first glance.
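
For illustration, a busy-wait hand-off built on a volatile flag might look like this sketch (the class and field names are made up):

class SpinWorker implements Runnable {
    private Runnable task;                       // plain field...
    private volatile boolean workReady = false;  // ...published by this volatile flag

    void handOff(Runnable job) {   // called by the controller thread
        task = job;                // ordinary write
        workReady = true;          // volatile write: makes 'task' visible as well
    }

    public void run() {            // worker thread
        while (!workReady) {       // busy-wait: volatile read on every pass
            Thread.onSpinWait();   // spin hint (Java 9+); optional
        }
        task.run();                // everything written before the flag flip is visible
    }
}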

If you use the Concurrency utilities (in this case, probably an ExecutorService), the best selection for your particular case can be made for you, factoring in the environment, the nature of the task, and the needs of other threads at a given time. Achieving that level of optimization yourself is a lot of needless work.

Why not make Collection<ValueRef> and ValueRef immutable, or at least not modify the values in the collection after you have published the reference to the collection? Then you will not have to worry about synchronization at all.

That is, when you want to change the values of the collection, create a new collection and put the new values in it. Once the values have been set, pass the new collection reference to the new Job objects.
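
A sketch of that idea (the class names here are only illustrative, not a drop-in replacement for the question's classes):

import java.util.*;

final class ImmutableValueRef {
    final double value;
    ImmutableValueRef(double value) { this.value = value; }
}

class ImmutableJob implements Runnable {
    private final List<ImmutableValueRef> neededValues;

    ImmutableJob(List<ImmutableValueRef> neededValues) {
        // defensive, unmodifiable copy: safe to publish to any thread
        this.neededValues = Collections.unmodifiableList(new ArrayList<>(neededValues));
    }

    public void run() {
        double sum = 0;
        for (ImmutableValueRef x : neededValues) sum += x.value;
        System.out.println(sum);
    }
}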

The only reason not to do this would be if the size of the collection is so large that it barely fits in memory and you cannot afford to have two copies, or the swapping of the collections would cause too much work for the garbage collector (prove that one of these is a problem before you use a mutable data structure for threaded code).
