Multiprocess Python 3
I have a script that loops over an array of numbers. Each number is passed to a function that calls an API. The API returns JSON data, which is then written to a CSV.
for label_number in label_array:
    call_api(domain, api_call_1, api_call_2, label_number, api_key)
The list can be up to 7000 elements long. Because the API takes a few seconds to respond, running the entire script can take hours. Multiprocessing seems the way to go here, but I can't quite work out how to apply it to the above loop. The documentation I am looking at is
https://docs.python.org/3.5/library/multiprocessing.html
I found a similar article, "Python Multiprocessing a for loop", but adapting it doesn't seem to work. I think I am getting it wrong when it comes to passing all the variables into the function.
Any help would be appreciated.
Multiprocessing could help, but this sounds more like a threading problem. The script spends most of its time waiting on I/O, so the calls should run concurrently and overlap their waits, which is what threading gives you. Better still, from Python 3.4 onwards you could use asyncio: https://docs.python.org/3.4/library/asyncio.html
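As a rough illustration of the threading suggestion, here is a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`. The `call_api` body is a stand-in that only sleeps (the real function is not shown in the question), and `fetch_all`, `max_workers=20`, and the example domain are assumptions made for the sketch. The fixed arguments are captured in a small closure, so only `label_number` varies per call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_api(domain, api_call_1, api_call_2, label_number, api_key):
    """Stand-in for the question's call_api; sleeps to mimic network latency."""
    time.sleep(0.05)
    return {"label": label_number}


def fetch_all(label_array, domain, api_call_1, api_call_2, api_key, max_workers=20):
    # Capture the arguments that never change; only label_number varies.
    def worker(label_number):
        return call_api(domain, api_call_1, api_call_2, label_number, api_key)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as label_array.
        return list(pool.map(worker, label_array))


if __name__ == "__main__":
    results = fetch_all(range(40), "https://example.invalid", "labels", "details", "dummy-key")
    print(len(results))
```

Because the threads spend their time waiting on the network rather than computing, the worker count can reasonably exceed the number of CPU cores. The right value depends on what the API tolerates.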
If you have Python 3.5, this will be useful: https://docs.python.org/3.5/library/asyncio-task.html#example-hello-world-coroutine
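For the asyncio route, a sketch along these lines may help. It assumes Python 3.7 or later for `asyncio.run`, and `call_api_async` here is a hypothetical coroutine that sleeps instead of doing real I/O. A real version would need an asynchronous HTTP client, which the question does not specify. The `limit` parameter and `fetch_all` name are likewise assumptions.

```python
import asyncio


async def call_api_async(domain, api_call_1, api_call_2, label_number, api_key):
    """Hypothetical coroutine version of call_api; sleeps instead of real I/O."""
    await asyncio.sleep(0.05)
    return {"label": label_number}


async def fetch_all(label_array, domain, api_call_1, api_call_2, api_key, limit=50):
    # Cap the number of requests in flight so the API is not flooded.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(label_number):
        async with semaphore:
            return await call_api_async(domain, api_call_1, api_call_2, label_number, api_key)

    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(bounded(n) for n in label_array))


if __name__ == "__main__":
    results = asyncio.run(
        fetch_all(range(200), "https://example.invalid", "labels", "details", "dummy-key")
    )
    print(len(results))
```

All requests run on a single thread here. The semaphore is the only thing limiting concurrency, so `limit` plays the role that the worker count plays in a thread pool.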
You can mix asyncio with multiprocessing to get the best result. I also use joblib.
import multiprocessing
from joblib import Parallel, delayed

def parallelProcess(i, num_workers):
    # Worker i handles every num_workers-th label, starting at index i.
    # (Testing index % i == 0 would divide by zero for i == 0 and overlap work.)
    for index, label_number in enumerate(label_array):
        if index % num_workers == i:
            call_api_async(domain, api_call_1, api_call_2, label_number, api_key)

if __name__ == "__main__":
    num_cores_to_use = multiprocessing.cpu_count()
    inputs = range(num_cores_to_use)
    Parallel(n_jobs=num_cores_to_use)(delayed(parallelProcess)(i, num_cores_to_use) for i in inputs)
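The intent of the modulo test in `parallelProcess` is a round-robin split: each worker takes every n-th label. As a small sketch of that idea on its own, the same partition can be written with slicing. `round_robin_chunks` is a hypothetical helper, not part of joblib, and the label data is a placeholder.

```python
def round_robin_chunks(items, num_workers):
    """Split items so that worker i receives items[i], items[i + n], ..."""
    items = list(items)
    return [items[i::num_workers] for i in range(num_workers)]


if __name__ == "__main__":
    label_array = list(range(10))  # placeholder data
    chunks = round_robin_chunks(label_array, 4)
    print(chunks)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Each chunk could then be handed to one worker, which avoids having every worker walk the full list just to skip most of it.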