Process a lot of data without waiting for a chunk to finish
I am confused by map, imap, apply_async, apply, Process, etc. from the Python multiprocessing package.
What I would like to do:
I have 100 simulation script files that need to be run through a simulation program. I would like Python to run as many of them in parallel as it can, and then, as soon as one finishes, grab a new script and run that one. I don't want any waiting.
Here is some demo code:
import multiprocessing as mp
import time

def run_sim(x):
    # run
    print("Running Sim: ", x)
    # artificially wait 5 s
    time.sleep(5)
    return x

def main():
    # x => my simulation files
    x = list(range(100))
    # run a parallel pool of processes
    pool = mp.Pool(mp.cpu_count() - 1)
    # get results
    result = pool.map(run_sim, x)
    print("Results: ", result)

if __name__ == "__main__":
    main()
However, I don't think that map is the correct way here, since I want the PC not to wait for a batch to be finished but to proceed immediately to the next simulation file. The code will run mp.cpu_count()-1 simulations at the same time and then wait for every one of them to finish before starting a new batch of size mp.cpu_count()-1. I don't want the code to wait; I want it to grab a new simulation file as soon as possible.
Do you have any advice on how to code this better?
Some clarifications:
I am reducing the pool to one less than the CPU count because I don't want to block the PC. I still need to do light work while the code is running.
It works correctly using map. The trouble is simply that you make every task sleep for 5 seconds, so they all finish at the same time.
Try this code to see the effect correctly:
import multiprocessing as mp
import time
import random

def run_sim(x):
    # run with a random duration so tasks finish at different times
    t = random.randint(3, 10)
    print("Running Sim: ", x, " - sleep ", t)
    time.sleep(t)
    return x

def main():
    # x => my simulation files
    x = list(range(100))
    # run a parallel pool of processes
    pool = mp.Pool(mp.cpu_count() - 1)
    # get results
    result = pool.map(run_sim, x)
    print("Results: ", result)

if __name__ == "__main__":
    main()
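If you also want to guarantee that each worker pulls exactly one file at a time (map may hand out tasks in chunks), and to see results the moment each simulation finishes, a minimal sketch using imap_unordered with chunksize=1 could look like this. The task count and sleep times here are placeholders, not the real simulation files:

```python
import multiprocessing as mp
import random
import time

def run_sim(x):
    # placeholder for a real simulation: sleep a random short time
    time.sleep(random.uniform(0.1, 0.5))
    return x

def main():
    x = list(range(10))  # stand-in for the 100 script files
    # leave one CPU free, as in the question
    with mp.Pool(max(1, mp.cpu_count() - 1)) as pool:
        # chunksize=1: each worker takes exactly one task and grabs
        # the next one the moment it finishes; results are yielded
        # in completion order, not submission order
        for res in pool.imap_unordered(run_sim, x, chunksize=1):
            print("Finished:", res)

if __name__ == "__main__":
    main()
```

With imap_unordered you can process each result as soon as it arrives; if you need the results in the original order instead, pool.imap(run_sim, x, chunksize=1) behaves the same way but yields in submission order.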