
Completing threads and interacting with results at a different rate

I'd like to get some feedback on an approach for receiving data from multiple threads in a concurrent.futures.ThreadPoolExecutor and iterating over the results. In this scenario, the ThreadPoolExecutor appends each future's thread result to a buffer container, while a secondary, decoupled operation reads and withdraws from the same container.

Thread Manager Workflow

                    /|-> Thread 1 > results \
ThreadPoolExecutor --|-> Thread 2 > results --> Queue [1,2,3] (end) 
                    \|-> Thread 3 > results /
             

Now we have the results from the threads in a First-In-First-Out queue container, which needs to be thread-safe. At this point the above process is done and the results (str|int|bool|list|dict|any) sit in the container awaiting the next step: communicating the gathered results.

Communication Workflow

                                           /|-> Terminal Print
Queue [1,2,3] < Listener > Communicate --|-> Speech Engine Say 
                                           \|-> Write to Log / File

The Communicate class needs to be "listening" on the Queue for new entries and processing each as it comes in, at its own speed (the rate of speech of a text-to-speech module; this is the Producer-Consumer Problem), plus potentially any number of other outputs, so this really can't be invoked from the top down. If the Thread Manager calls the Communicate class directly, or lets each thread call it directly, we will hear stuttered speech, because each invocation of the speech engine overrides the previous one. Thus, we need to decouple the Thread Manager workflow from the Communicate workflow, have them write and read through an In/Out buffer or Queue, and introduce a "listener" concept.

I've found references to a structure like the following running as a daemon thread, but the polling loop makes me cringe and consumes too much CPU, so I still need a non-blocking approach. Here self.pipeline is a queue.Queue object:

    while True:
        try:
            if not self.pipeline.empty():            # poll for work
                task = self.pipeline.get(timeout=1)  # fetch the next item
                if task:
                    self.serve(task)
        except queue.Empty:
            continue  # the queue emptied between the empty() check and get()

Again, I'm in need of something other than a polling while loop for this...

As you write in the comments, this is the standard producer-consumer problem. One solution in Python is multithreading with the Queue class. The queue is thread-safe: it uses a mutex internally, which handles the waiting so you don't have to busy-wait.

Queue.get will eventually call wait on an internal condition variable. This blocks the calling thread, but instead of busy waiting, which burns CPU, the thread is put into a sleep state. The OS thread scheduler takes over from there and wakes the thread up when items are available (simplified).

So you can still have while True loops in multiple consumer threads that call queue.get on a shared queue. If items are available, the threads process them directly; if not, they go into a sleep state and free the CPU. The same goes for producer threads: they simply call Queue.put.
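A minimal sketch of that pattern with the standard library (the payloads, delays, and the SENTINEL marker are illustrative, not from the question): the consumer keeps its while True loop, but blocks in q.get() instead of polling:

    import queue
    import threading
    import time

    SENTINEL = None  # end-of-stream marker chosen for this sketch

    def producer(q, item, delay):
        # stand-in producer; real code would do I/O or computation here
        for _ in range(3):
            time.sleep(delay)
            q.put(item)

    def consumer(q):
        while True:
            task = q.get()   # blocks and sleeps until an item arrives: no busy waiting
            if task is SENTINEL:
                return
            print(task)      # stand-in for serving the task

    q = queue.Queue()
    listener = threading.Thread(target=consumer, args=(q,))
    listener.start()
    producers = [threading.Thread(target=producer, args=(q, i, 0.3 * i)) for i in (1, 2, 3)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()             # wait until all producers are done...
    q.put(SENTINEL)          # ...then tell the consumer to stop
    listener.join()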

However, there is one caveat in Python: the global interpreter lock, or GIL. It exists because CPython relies on a lot of C code and allows modules that bring in C extensions, and those are not always thread-safe. The GIL means that only one thread executes Python bytecode at a time, on one CPU core.

So, once an item is in the queue, only one consumer at a time will wake up and process it, and likewise only one producer runs at a time, unless the threads start waiting on some I/O, like reading from a socket. Because I/O completion is handled by the operating system and hardware rather than the interpreter, there is always some waiting time for I/O; during that time the waiting thread releases the GIL and other threads can do work.

Summed up, it only makes sense to have multiple consumer and producer threads if they also do some I/O work, such as reading and writing on a network socket or disk. This is called concurrency. If you want to use multiple CPU cores at the same time, you need to use multiprocessing in Python instead of threads, and it only makes sense to have more processes than cores if there is also some I/O work.

Example

I would suggest that you use multiprocessing rather than threading to ensure maximum parallelism. I am not sure whether you really need a process pool for what you are trying to do rather than 4 dedicated processes; it's a question of how "threads" 1 through 3 get the data they feed to the queue for processing by the 4th process. Are they implemented by a single, identical worker function to which "jobs" are submitted? If so, a process pool of 3 identical workers is what you want. But if these are 3 separate functions with their own processing logic, then you just want to create 3 Process instances. I am working on the second assumption, but a sketch of the pool variant follows first.
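This pool sketch is hypothetical: the worker function, the simulated delay, and the job list are stand-ins, and results flow back through Pool.imap_unordered instead of an explicit queue:

    import multiprocessing
    import time

    def worker(job):
        # hypothetical identical worker; 'job' stands in for real input
        time.sleep(job * 0.5)  # simulate some work
        return job

    if __name__ == '__main__':
        with multiprocessing.Pool(3) as pool:
            # imap_unordered yields each result as soon as any worker finishes,
            # so the consuming loop below runs at its own pace
            for result in pool.imap_unordered(worker, [1, 2, 3]):
                print(result)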

Since we are now in the realm of multiprocessing, I would suggest using a "managed" Queue instance created with the following code:

with multiprocessing.Manager() as manager:
    q = manager.Queue()

Access to such a queue is synchronized across processes. The following code is a rough idea of creating the processes and accessing the queue:

import multiprocessing
import time

class Communicate:
    def listen(self, q):
        while True:
            obj = q.get()    # blocks until an item arrives
            if obj is None:  # our signal to terminate
                return
            # do something with the object
            print(obj)

def process1(q):
    while True:
        time.sleep(1)
        q.put(1)

def process2(q):
    while True:
        time.sleep(.5)
        q.put(2)

def process3(q):
    while True:
        time.sleep(1.5)
        q.put(3)



if __name__ == '__main__':
    communicator = Communicate()
    with multiprocessing.Manager() as manager:
        # start the communicator (listener) process:
        q = manager.Queue()
        p = multiprocessing.Process(target=communicator.listen, args=(q,))
        p.start()
        # start the 3 producer processes:
        p1 = multiprocessing.Process(target=process1, args=(q,))
        p1.daemon = True
        p1.start()
        p2 = multiprocessing.Process(target=process2, args=(q,))
        p2.daemon = True
        p2.start()
        p3 = multiprocessing.Process(target=process3, args=(q,))
        p3.daemon = True
        p3.start()
        input('Hit Enter to terminate\n')
        q.put(None) # signal for termination
        p.join() # wait for the listener to drain the queue and finish
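
A note on the shutdown design: p1 through p3 are daemonic, so they are killed automatically when the main process exits, while the listener process is non-daemonic and is joined only after the None sentinel is queued, so it gets to drain every item that was put before the sentinel.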
     
