I'm calling a function task(url, param1, param2) that returns either the result of an API call to url, or the url name if the API call did not work. My task looks something like:
def task(url, param1, param2):
    try:
        return make_api_call(url, param1, param2)
    except ValueError as e:
        print("val error")
        return url
Now I want to apply task to a list of 100 urls and start multiprocessing them as:
import multiprocessing as mp

def run_tasks(urls, param1, param2):
    jobs = []
    for i in range(len(urls)):
        process = mp.Process(target=task, args=(urls[i], param1, param2))
        jobs.append(process)
    ## catch error processes
    error_urls = []
    ## start processes
    for j in jobs:
        j.start()
    ## finish processes
    for j in jobs:
        j.join()
From the above run_tasks, how would I return a list of the urls that gave me a ValueError? I tried error_urls.append(j.join()), but this didn't work.
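For reference, Process.join() only waits for the child process to finish; it always returns None, so appending its return value can never collect the urls. A minimal check:

```python
import multiprocessing as mp

def task(url):
    return url  # this return value is lost; it never reaches the parent process

if __name__ == "__main__":
    p = mp.Process(target=task, args=("http://example.com",))
    p.start()
    result = p.join()  # join() only waits for the process to exit
    print(result)      # → None, so error_urls.append(j.join()) collects None
```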
There are two methods to get a result back from the processes.

Method 1. Use a list from Manager. You don't need a lock to synchronize between the processes.
from multiprocessing import Process, Manager

def task(url, param1, param2, error_list):
    try:
        make_api_call(url, param1, param2)
    except ValueError as e:
        print("val error")
        error_list.append(url)

def run_tasks(urls, param1, param2):
    error_list = Manager().list()
    jobs = []
    for i in range(len(urls)):
        process = Process(target=task, args=(urls[i], param1, param2, error_list))
        jobs.append(process)
    ## start processes
    for j in jobs:
        j.start()
    ## finish processes
    for j in jobs:
        j.join()
    return list(error_list)
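As a self-contained sketch (fake_api_call is a hypothetical stand-in for make_api_call that rejects urls containing "bad"):

```python
from multiprocessing import Manager, Process

def fake_api_call(url, param1, param2):
    # hypothetical stand-in for make_api_call: reject urls containing "bad"
    if "bad" in url:
        raise ValueError(url)

def task(url, param1, param2, error_list):
    try:
        fake_api_call(url, param1, param2)
    except ValueError:
        error_list.append(url)  # the Manager proxies this append back to the parent

def run_tasks(urls, param1, param2):
    with Manager() as manager:
        error_list = manager.list()
        jobs = [Process(target=task, args=(u, param1, param2, error_list))
                for u in urls]
        for j in jobs:
            j.start()
        for j in jobs:
            j.join()
        return list(error_list)  # copy out before the manager shuts down

if __name__ == "__main__":
    print(run_tasks(["http://ok.example", "http://bad.example"], 1, 2))
```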
Method 2. Use ProcessPoolExecutor from concurrent.futures. This method is easier to understand and needs less code.
from concurrent import futures

def task(url, param1, param2):
    try:
        make_api_call(url, param1, param2)
    except ValueError as e:
        print("val error")
        return url  # returns None on success

def run_tasks(urls, param1, param2):
    with futures.ProcessPoolExecutor() as executor:
        result = executor.map(task, urls, [param1] * len(urls), [param2] * len(urls))
        error_list = [item for item in result if item is not None]
    return error_list
At last, from the description of the question, this is an IO-bound problem, so I recommend using ThreadPoolExecutor. When a thread does an IO operation, it releases the GIL and lets other threads run. For a CPU-bound problem, you'd better use ProcessPoolExecutor. And asyncio is another choice for concurrent programming in Python 3.
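A ThreadPoolExecutor version only needs the executor swapped; a sketch, with a hypothetical fake_api_call standing in for make_api_call:

```python
from concurrent import futures

def fake_api_call(url, param1, param2):
    # hypothetical stand-in for make_api_call: reject urls containing "bad"
    if "bad" in url:
        raise ValueError(url)

def task(url, param1, param2):
    try:
        fake_api_call(url, param1, param2)
    except ValueError:
        return url  # returns None on success

def run_tasks(urls, param1, param2):
    # threads share memory and release the GIL during IO, which suits API calls
    with futures.ThreadPoolExecutor(max_workers=10) as executor:
        result = executor.map(task, urls, [param1] * len(urls), [param2] * len(urls))
        return [item for item in result if item is not None]
```

executor.map preserves input order, so the returned list lines up with the failing urls in their original order.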
Try shared memory, using multiprocessing.sharedctypes.Array(typecode_or_type, size_or_initializer, *, lock=True). You can define this in run_tasks:

from ctypes import c_char_p
from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Array

lock = Lock()
error_urls = Array(c_char_p, len(urls), lock=lock)

Note that a ctypes Array has a fixed size and no append(), so task has to write by index (and error_urls must be passed to it or be global):

def task(i, url, param1, param2):
    try:
        make_api_call(url, param1, param2)
    except ValueError as e:
        print("val error")
        error_urls[i] = url

As the doc of Array() says:

The same as RawArray() except that depending on the value of lock a process-safe synchronization wrapper may be returned instead of a raw ctypes array.

So it is process-safe. See the multiprocessing.sharedctypes documentation for more about Array(), and the ctypes documentation for c_char_p.
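One caveat: a c_char_p slot stores a pointer, which is only meaningful inside the process that set it, so the docs warn against storing pointers in shared memory. A more robust shared-memory sketch marks failures in a preallocated array of flags indexed by url position (fake_api_call is a hypothetical stand-in for make_api_call):

```python
from multiprocessing import Array, Process

def fake_api_call(url, param1, param2):
    # hypothetical stand-in for make_api_call: reject urls containing "bad"
    if "bad" in url:
        raise ValueError(url)

def task(i, url, param1, param2, failed):
    try:
        fake_api_call(url, param1, param2)
    except ValueError:
        failed[i] = 1  # Array's built-in lock makes this write process-safe

def run_tasks(urls, param1, param2):
    failed = Array('b', len(urls))  # shared, zero-initialized signed-char flags
    jobs = [Process(target=task, args=(i, u, param1, param2, failed))
            for i, u in enumerate(urls)]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    return [u for i, u in enumerate(urls) if failed[i]]
```

Each child only writes its own slot, and the parent reconstructs the url strings from its own copy of the list, so no string data ever has to cross the process boundary.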