
Store the results of a multiprocessing queue in Python

I'm trying to store the results of multiple API requests using a multiprocessing queue, since the API can't handle more than 5 connections at once.

I found part of a solution in How to use multiprocessing with requests module?

import multiprocessing
import queue

def worker(input_queue, stop_event):
    while not stop_event.is_set():
        try:
            # Check if any request has arrived in the input queue. If not,
            # loop back and try again.
            request = input_queue.get(True, 1)
        except queue.Empty:
            continue
        print('Started working on:', request)
        api_request_function(request)  # make request using a function I wrote
        print('Stopped working on:', request)
        # Mark the item done only after processing it, so that
        # input_queue.join() in master() waits for the work to finish.
        input_queue.task_done()


def master(api_requests):
    input_queue = multiprocessing.JoinableQueue()
    stop_event = multiprocessing.Event()
    workers = []
    # Create workers.
    for i in range(3):
        p = multiprocessing.Process(target=worker,
                                    args=(input_queue, stop_event))
        workers.append(p)
        p.start()

    # Distribute work.
    for request in api_requests:
        input_queue.put(request)

    # Wait for the queue to be consumed.
    input_queue.join()
    # Ask the workers to quit.
    stop_event.set()

    # Wait for workers to quit.
    for w in workers:
        w.join()

    print('Done')

I've looked at the documentation for threading and pooling but I'm missing a step. The above runs and all requests get a 200 status code, which is great. But how do I store the results of the requests for later use?

Thanks for your help, Shan

I believe you have to make a Queue. The code can be a little tricky; you need to read up on the multiprocessing module. In general, with multiprocessing, all the variables are copied into each worker process, so you can't do something like appending to a global variable: the worker appends to its own copy while the original stays untouched (see the sketch after the example below). There are a few functions that already incorporate workers, queues, and return values automatically. Personally, I try to write my functions to work with mp.map, like below:

import multiprocessing

def worker(*args, **kwargs):
    # do stuff
    return 'thing'

# Pool(5) caps the number of worker processes at 5, matching the API limit.
output = multiprocessing.Pool(5).map(worker, [1, 2, 3, 4, 5])
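To make the copied-variables point concrete, here is a small demonstration (my own illustration, not from the original answer): each worker process gets its own copy of the module-level list, so appends made in the children never reach the parent.

import multiprocessing

results = []  # lives in the parent process

def worker(x):
    results.append(x)  # appends only to this child process's copy

if __name__ == '__main__':
    with multiprocessing.Pool(2) as pool:
        pool.map(worker, [1, 2, 3])
    print(results)  # prints [] -- the parent's list is untouched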
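Applied to the code in the question, the Queue approach looks roughly like this. It is a minimal sketch, assuming api_request_function returns a picklable value (e.g. the response body): each worker puts a (request, result) pair on a shared result queue, and master() collects exactly one result per request before shutting the workers down.

import multiprocessing
import queue

def worker(input_queue, result_queue, stop_event):
    while not stop_event.is_set():
        try:
            request = input_queue.get(True, 1)
        except queue.Empty:
            continue
        # Put the result on the output queue *before* task_done(), so that
        # master() can rely on one queued result per finished request.
        result_queue.put((request, api_request_function(request)))
        input_queue.task_done()

def master(api_requests):
    input_queue = multiprocessing.JoinableQueue()
    result_queue = multiprocessing.Queue()
    stop_event = multiprocessing.Event()
    workers = []
    for _ in range(3):
        p = multiprocessing.Process(target=worker,
                                    args=(input_queue, result_queue, stop_event))
        workers.append(p)
        p.start()

    n = 0
    for request in api_requests:
        input_queue.put(request)
        n += 1
    input_queue.join()  # returns once every request has been processed

    # Exactly one result was queued per request, so this cannot block forever.
    results = [result_queue.get() for _ in range(n)]

    stop_event.set()
    for w in workers:
        w.join()
    return results

Note that results arrive in completion order, not submission order, which is why each result is tagged with the request that produced it.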
