
What is the most appropriate way to manage threads executing the same task?

I have a lot of data in a database (PostgreSQL) and need to process all of it. My program has threads that process this data, and each one follows this logic:

  1. Get a batch of data from the database
  2. Process it
  3. Save the results

I'm not sure what the best way to do this is. I have three ideas:

  • Create a manager class that runs in a loop, getting data from the database and holding a queue of objects to process. Create a process class that runs in a loop, getting the next object to process from the manager class.

  • Do the same as above, but without the manager class: the process class instances share the queue of objects among themselves and are also responsible for getting the data from the database.

  • A manager class that runs in a loop getting data from the database, but it creates the process class instances and hands each one its data, so a process class instance never requests anything from the manager. It is created, does its processing, and is destroyed, rather than running in a loop.

I don't know which is better, or whether there is another, more efficient solution.

You are describing the so-called manager-worker model. I think your first option is the best one.

The manager pushes data into a queue and multiple workers process it. You can use a thread pool for the workers. The workers wait on the queue; once work is pushed to the queue, one of the idle workers takes it immediately. When a worker is done, it can push its result into an outgoing queue, and yet another thread sends the data to the DB. Alternatively, each worker can save its results itself. That is up to you and depends on your task.

Use Executors and BlockingQueue for the implementation. Everything you need is in the java.util.concurrent package, and you can find a lot of tutorials and examples on the web showing how to use them.
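As a rough illustration of that setup, here is a minimal sketch using a bounded BlockingQueue and a fixed thread pool. The class name ManagerWorkerSketch, the run method, the STOP sentinel, and the use of toUpperCase() as the "processing" step are all made up for the example; the input list stands in for rows fetched from the database, and the saved collection stands in for writing results back.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ManagerWorkerSketch {
    // Hypothetical sentinel ("poison pill"): a unique object, compared by reference,
    // that tells a worker there is no more work.
    private static final String STOP = new String("STOP");

    // 'rows' stands in for data read from the database (assumption for this sketch).
    static List<String> run(List<String> rows, int workerCount) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100); // bounded: the manager blocks when full
        Queue<String> saved = new ConcurrentLinkedQueue<>();         // stands in for "save to DB"
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);

        for (int i = 0; i < workerCount; i++) {
            workers.execute(() -> {
                try {
                    while (true) {
                        String row = queue.take();      // blocks until work is available
                        if (row == STOP) {
                            break;                      // intentional reference comparison
                        }
                        saved.add(row.toUpperCase());   // placeholder for the real processing
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // Manager side: push all work, then one sentinel per worker.
            for (String row : rows) {
                queue.put(row);
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(STOP);
            }
            workers.shutdown();
            if (!workers.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            workers.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        List<String> result = new ArrayList<>(saved);
        Collections.sort(result); // workers finish in no particular order
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c", "d"), 2));
    }
}
```

The bounded queue gives you back-pressure: if the workers fall behind, the manager blocks in put() instead of loading the whole table into memory.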

Good luck.

While your first suggestion is good, I'd try to simplify it a bit:

"Create a manager class that runs in a loop getting data from database and holding a queue of objects to process. Create a process class that runs in a loop getting the object to process from the manager class."

I'd create a manager class that obtains a list of the data sets currently waiting to be processed. It then creates processor instances, each of which simply runs through the single data set it was given when it was created, and then exits.

The manager is responsible for the looping, that is, for iterating over the data sets it is aware of at a given time. I'd abstract that further and have a scheduled task create a manager periodically to process new data sets.

The reason for this is that it simplifies concurrent programming. The data set processor is only aware of a single set of data, and you can program it as if it is ignorant of concurrency. It gets a job, processes it, and it's done.

Likewise for the manager, it gets a set of data, processes it by creating processors, and it's done.
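A minimal sketch of that split, assuming each data set is just a list of integers and "processing" means summing it. BatchManagerSketch, DataSetProcessor, and processAll are hypothetical names chosen for the example. The processor contains no concurrency code at all; only the manager touches the thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchManagerSketch {
    // Processes exactly one data set and then finishes; it knows nothing about threads.
    static final class DataSetProcessor implements Callable<Integer> {
        private final List<Integer> dataSet;

        DataSetProcessor(List<Integer> dataSet) {
            this.dataSet = dataSet;
        }

        @Override
        public Integer call() {
            int sum = 0; // placeholder for the real processing
            for (int value : dataSet) {
                sum += value;
            }
            return sum;
        }
    }

    // The "manager": creates one processor per data set, runs them, collects results, and is done.
    static List<Integer> processAll(List<List<Integer>> dataSets, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<DataSetProcessor> tasks = new ArrayList<>();
            for (List<Integer> dataSet : dataSets) {
                tasks.add(new DataSetProcessor(dataSet));
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every task has completed and returns
            // the futures in the same order as the tasks.
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of(List.of(1, 2, 3), List.of(10, 20)), 2));
    }
}
```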

The last part of the puzzle is to ensure that no two managers, if you allow multiple instances, are assigned the same data sets. This is probably easiest to reason about if you create only a single thread pool to run managers in: if the scheduled time comes up and a manager is still running, you don't create a new one.
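One possible way to express the "skip this round if a manager is still running" rule is an AtomicBoolean guard, sketched below. SingleManagerGuard and tryRun are hypothetical names, and the printed message stands in for a real manager pass.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class SingleManagerGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    // Runs the manager pass only if no other pass is in progress.
    // Returns false (and does nothing) when a pass is already running.
    boolean tryRun(Runnable managerPass) {
        if (!running.compareAndSet(false, true)) {
            return false;
        }
        try {
            managerPass.run();
            return true;
        } finally {
            running.set(false); // always release, even if the pass throws
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SingleManagerGuard guard = new SingleManagerGuard();
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        // Every 100 ms, attempt a manager pass; an attempt is skipped if one is still running.
        scheduler.scheduleAtFixedRate(
                () -> guard.tryRun(() -> System.out.println("manager pass")),
                0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        scheduler.shutdownNow();
    }
}
```

Note that a single task scheduled with scheduleAtFixedRate is never executed concurrently with itself, so with only one schedule the executor already gives you this behaviour. The guard becomes useful if manager passes can be triggered from more than one place.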
