
Return values from exceptions in multiprocessing

I'm calling a function task(url, param1, param2) that returns either the result of an API call to url, or the url itself if the API call failed. My task looks something like this:

def task(url, param1, param2):
    try:
        ## return the API result on success
        return make_api_call(url, param1, param2)
    except ValueError as e:
        ## return the url so the caller knows which call failed
        print("val error")
        return url

Now I want to apply task to a list of 100 urls and run them in parallel with multiprocessing:

import multiprocessing as mp

def run_tasks(urls, param1, param2):
    jobs = []
    for url in urls:
        process = mp.Process(target=task, args=(url, param1, param2))
        jobs.append(process)

    ## collect urls from failed calls
    error_urls = []

    ## start processes
    for j in jobs:
        j.start()

    ## wait for processes to finish
    for j in jobs:
        j.join()

From the run_tasks above, how would I return a list of the urls that gave me a ValueError? I tried error_urls.append(j.join()), but this didn't work.

There are two methods to get results back from the processes.

Method 1. Use a list from Manager. You don't need a lock to synchronize between processes; the manager proxy handles that for you.

from multiprocessing import Process, Manager

def task(url, param1, param2, error_list):
    try:
        make_api_call(url, param1, param2)
    except ValueError as e:
        print("val error")
        ## the managed list proxy can be mutated safely from child processes
        error_list.append(url)

def run_tasks(urls, param1, param2):

    error_list = Manager().list()
    jobs = []

    for url in urls:
        process = Process(target=task, args=(url, param1, param2, error_list))
        jobs.append(process)

    ## start processes
    for j in jobs:
        j.start()

    ## wait for processes to finish
    for j in jobs:
        j.join()

    ## copy the managed list into a plain list and return it
    return list(error_list)
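
A minimal usage sketch (the urls and parameter values are placeholders). The __main__ guard matters because multiprocessing may re-import the module in child processes:

    if __name__ == "__main__":
        urls = ["https://example.com/a", "https://example.com/b"]  ## placeholder urls
        failed = run_tasks(urls, "param1", "param2")
        print("failed urls:", failed)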

Method 2. Use ProcessPoolExecutor from concurrent.futures. This method is easier to understand and needs less code.

from concurrent import futures

def task(url, param1, param2):
    try:
        make_api_call(url, param1, param2)
        ## implicitly returns None on success
    except ValueError as e:
        print("val error")
        return url

def run_tasks(urls, param1, param2):

    with futures.ProcessPoolExecutor() as executor:
        ## map yields the tasks' return values in submission order:
        ## None for successful calls, the url for failed ones
        results = executor.map(task, urls, [param1] * len(urls), [param2] * len(urls))
        error_list = [item for item in results if item is not None]

    return error_list
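
If you would rather handle results as they finish instead of in submission order, a sketch with executor.submit and futures.as_completed works too (run_tasks_as_completed is just an illustrative name):

    def run_tasks_as_completed(urls, param1, param2):
        error_list = []
        with futures.ProcessPoolExecutor() as executor:
            ## one future per url; collect failures as they complete
            fs = [executor.submit(task, url, param1, param2) for url in urls]
            for f in futures.as_completed(fs):
                url = f.result()
                if url is not None:
                    error_list.append(url)
        return error_list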

Finally, judging from the description of the question, this is an I/O-bound problem, so I recommend ThreadPoolExecutor: when a thread does an I/O operation, it releases the GIL and lets other threads run. For a CPU-bound problem, you'd better use ProcessPoolExecutor. And asyncio is another choice for concurrent programming in Python 3.
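
A minimal thread-based sketch; only the executor class changes from Method 2 (the max_workers value is an arbitrary assumption):

    def run_tasks_threaded(urls, param1, param2):
        with futures.ThreadPoolExecutor(max_workers=10) as executor:
            results = executor.map(task, urls, [param1] * len(urls), [param2] * len(urls))
            return [item for item in results if item is not None]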

Try shared memory, using multiprocessing.sharedctypes.Array(typecode_or_type, size_or_initializer, *, lock=True).

You can define this in run_tasks:

from ctypes import c_char_p
from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Array

lock = Lock()
## shared arrays are fixed-size, so allocate one slot per url
error_urls = Array(c_char_p, len(urls), lock=lock)

And change task to write into its own slot (the array has no append()):

def task(i, url, param1, param2, error_urls):
    try:
        make_api_call(url, param1, param2)
    except ValueError as e:
        print("val error")
        ## write into this url's slot; c_char_p elements take bytes
        error_urls[i] = url.encode()

As the documentation of Array() says:

The same as RawArray() except that depending on the value of lock a process-safe synchronization wrapper may be returned instead of a raw ctypes array.

So it is process-safe. See the multiprocessing.sharedctypes documentation for more about Array(), and the ctypes documentation for c_char_p.
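
One caveat, also from the docs: a c_char_p element stores a pointer, which refers to the address space of the process that set it, so a string written by a child may not be readable from the parent. A safer sketch under that constraint (my own wiring, not part of the answer above) keeps an integer flag per url in shared memory and maps indices back to url strings in the parent:

    from ctypes import c_int
    from multiprocessing import Process
    from multiprocessing.sharedctypes import Array

    def task(i, url, param1, param2, failed):
        try:
            make_api_call(url, param1, param2)
        except ValueError:
            print("val error")
            failed[i] = 1   ## mark this url's slot instead of storing the string

    def run_tasks(urls, param1, param2):
        failed = Array(c_int, len(urls), lock=True)   ## one zero-initialized flag per url
        jobs = [Process(target=task, args=(i, url, param1, param2, failed))
                for i, url in enumerate(urls)]
        for j in jobs:
            j.start()
        for j in jobs:
            j.join()
        ## translate flags back to urls in the parent
        return [url for i, url in enumerate(urls) if failed[i]]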
