Threading/TPL etc

I have a requirement to process 16 million database records, and this is going to take me forever. I'm not threading-savvy, so I thought I'd ask here. My thinking is that I need to do the following, but I'm unsure how:

  1. Get my 16m records
  2. split these out into a number of "chunks"
  3. send each of these chunks for processing on its own thread

Does this sound right, and how would I split my workload (16m records) up?

Cheers if you can offer sound advice.

I suggest you use the well-known producer-consumer pattern in the following way:

  • a single thread (the producer) pulls records from the db, creates tasks (with a single or maybe multiple records to process) and puts them in a shared queue.
  • a group of threads (the consumers) pull tasks from the queue and process them in parallel.

A very simple way to implement this is using the ThreadPool class. It conveniently manages the queue and the workers for you. All you need to do is implement the producer and queue tasks via QueueUserWorkItem.
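As a rough sketch of the ThreadPool approach (not a definitive implementation): the names `ThreadPoolSketch`, `ProcessRecord` and `ProcessAll` are made up for illustration, and an in-memory `int[]` stands in for records that would really come from the database. One work item is queued per chunk, and a `CountdownEvent` lets the caller wait until every chunk has finished.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSketch
{
    // Hypothetical stand-in for the real per-record work.
    static int ProcessRecord(int record) => record * 2;

    // Queues one ThreadPool work item per chunk and waits for all of them.
    public static long ProcessAll(int[] records, int chunkSize)
    {
        long total = 0;
        int chunkCount = (records.Length + chunkSize - 1) / chunkSize;
        if (chunkCount == 0) return 0;

        using (var done = new CountdownEvent(chunkCount))
        {
            for (int start = 0; start < records.Length; start += chunkSize)
            {
                // Copy the bounds so each work item captures its own values.
                int from = start;
                int to = Math.Min(start + chunkSize, records.Length);

                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        long local = 0;
                        for (int i = from; i < to; i++)
                            local += ProcessRecord(records[i]);

                        // Shared state is only touched through an atomic add.
                        Interlocked.Add(ref total, local);
                    }
                    finally
                    {
                        done.Signal(); // always signal, even if processing throws
                    }
                });
            }
            done.Wait();
        }
        return total;
    }
}
```

Usage would look like `long sum = ThreadPoolSketch.ProcessAll(records, 10000);`. The chunk size is a guess; it would need tuning against the real per-record cost.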

Alternatively, if you want to use TPL constructs, you can implement the above mechanisms yourself using a combination of Tasks and perhaps a ConcurrentQueue.
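A minimal sketch of the Task-based variant follows. Note that it swaps in `BlockingCollection<T>` (which wraps a `ConcurrentQueue<T>` by default) rather than using a bare `ConcurrentQueue`, because it gives blocking and a bounded capacity for free. `ProducerConsumerSketch`, `ReadBatches` and `Run` are hypothetical names, and `ReadBatches` only simulates paging through the table.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ProducerConsumerSketch
{
    // Hypothetical stand-in for reading the table one page at a time.
    static IEnumerable<int[]> ReadBatches(int totalRecords, int batchSize)
    {
        for (int start = 0; start < totalRecords; start += batchSize)
            yield return Enumerable.Range(start, Math.Min(batchSize, totalRecords - start)).ToArray();
    }

    // Returns the number of records the consumers processed.
    public static long Run(int totalRecords, int batchSize, int consumerCount)
    {
        long processed = 0;

        // Bounded so the producer cannot run far ahead of the consumers.
        using (var queue = new BlockingCollection<int[]>(boundedCapacity: 100))
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (var batch in ReadBatches(totalRecords, batchSize))
                        queue.Add(batch); // blocks while the queue is full
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consumers' loops end
                }
            });

            var consumers = Enumerable.Range(0, consumerCount)
                .Select(_ => Task.Run(() =>
                {
                    foreach (var batch in queue.GetConsumingEnumerable())
                    {
                        // Real per-record work would go here.
                        Interlocked.Add(ref processed, batch.Length);
                    }
                }))
                .ToArray();

            Task.WaitAll(consumers);
            producer.Wait();
        }
        return processed;
    }
}
```

Because the producer streams batches into a bounded queue, all 16 million records never need to sit in memory at once.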

If you want to process a collection of items in parallel, that's exactly what Parallel.ForEach() is for. You just pass it an action that you want to perform for each item (possibly as a lambda), and it will take care of splitting your collection into chunks and executing the action on them.
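For illustration, a small sketch of `Parallel.ForEach` over index ranges. It assumes the records are already in an array; `ParallelForEachSketch`, `Transform` and `ProcessAll` are hypothetical names. `Partitioner.Create(0, n)` hands each worker a contiguous range of indices, which makes the chunking explicit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ParallelForEachSketch
{
    // Hypothetical stand-in for the real per-record work.
    static int Transform(int record) => record + 1;

    public static int[] ProcessAll(int[] records)
    {
        var results = new int[records.Length];
        if (records.Length == 0) return results;

        // Each range is a Tuple<int, int>: (inclusive start, exclusive end).
        Parallel.ForEach(Partitioner.Create(0, records.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                // Every index is written by exactly one thread, so no lock is needed.
                results[i] = Transform(records[i]);
            }
        });

        return results;
    }
}
```

Passing the array itself (`Parallel.ForEach(records, r => ...)`) also works; the range partitioner mainly helps when the per-item work is very small.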

But you have to be careful about what you put into that action. The code will be executed on multiple threads concurrently, so you shouldn't access any shared state in a way that is not thread-safe.
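To make the shared-state point concrete, here is a hedged sketch of one common way to accumulate a result safely: the `Parallel.ForEach` overload with thread-local state, merged at the end with `Interlocked.Add`. The names `SharedStateSketch` and `SafeSum` are made up for the example. A plain `total += record` inside the loop body would be a data race and could silently lose updates.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SharedStateSketch
{
    public static long SafeSum(int[] records)
    {
        long total = 0;

        Parallel.ForEach(
            records,
            () => 0L,                                    // per-thread running subtotal
            (record, loopState, local) => local + record, // touches only thread-local data
            local => Interlocked.Add(ref total, local));  // one atomic merge per thread

        return total;
    }
}
```

Keeping a per-thread subtotal means the shared variable is touched only a handful of times, instead of once per record.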
