
Python multiprocessing doesn't finish all tasks

I have a lot of files that need to be processed by some software. They don't need to be processed in any particular order.
Let's say I have 12 files. I divided them into three lists and then tried to send these lists to different processes to be executed:

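import glob
import subprocess
from multiprocessing import Pool


def terminal_astrometry(command):
    # Run the external program and capture its stdout
    return subprocess.run(command, stdout=subprocess.PIPE)


def run2(files, *args):
    for ffit in files:
        # Assumed command: the trimmed post does not show how it is built
        command = ["solve-field", ffit]
        terminal_astrometry(command)


if __name__ == "__main__":
    # import all files (src_path is defined elsewhere in the original script)
    files = glob.glob(src_path + "*.fits")
    files_list = [files[0::3], files[1::3], files[2::3]]

    num_processors = 3  # Create a pool of processors
    pool = Pool(processes=num_processors)  # get them to work in parallel
    output = pool.map(run2, files_list)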
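
As a side note, the stride slicing above does place every file in exactly one list, so the split itself is unlikely to be what loses files. A minimal sketch that checks this with made-up file names, and shows the simpler alternative of handing the pool individual files (split_round_robin and process_one are hypothetical names, and process_one is only a placeholder for the real per-file work):

```python
from multiprocessing import Pool


def split_round_robin(files, n):
    """Split files into n lists by stride slicing, as in the question."""
    return [files[i::n] for i in range(n)]


def process_one(path):
    # Placeholder for the real per-file work (hypothetical)
    return path.upper()


if __name__ == "__main__":
    files = [f"img{i:02d}.fits" for i in range(12)]  # made-up names
    chunks = split_round_robin(files, 3)
    # Every file lands in exactly one chunk
    assert sorted(f for chunk in chunks for f in chunk) == sorted(files)

    # Alternative: map over individual files and let the pool spread the load
    with Pool(processes=3) as pool:
        results = pool.map(process_one, files)
    print(len(results))  # one result per input file
```

Mapping over individual files also means one slow file does not hold up the rest of a pre-built list.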

The problem is that sometimes the program doesn't process all of these files, i.e. 11 files do get processed but one does not. Other times, 9 finish but 3 are skipped. Sometimes it does finish all tasks (processes all of the files).

Essentially, in the run2() function I am calling the particular software that I want to run in parallel (Astrometry.net) on every file that run2() receives.

EDIT2: I trimmed the run2() function because it contains a lot of calculations (statistics) that are not relevant to the problem here (at least I think so) and posted it here.

Your symptoms sound like a race condition. However, pool.map blocks the main process until all tasks have finished, so the code will not progress past that line until then. Therefore, I think the problem may be within the run2 function. Could you post its code?
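
Since pool.map only returns once every task has completed, one way to find out which files fail is to have the worker return a per-file status instead of discarding the subprocess result. A rough sketch, assuming a stand-in command (the Python interpreter itself) in place of the real Astrometry.net call; build_command and run_file are hypothetical names:

```python
import subprocess
import sys
from multiprocessing import Pool


def build_command(path):
    # Stand-in command (assumption): exits non-zero for names containing "bad"
    code = "import sys; sys.exit(1 if 'bad' in sys.argv[1] else 0)"
    return [sys.executable, "-c", code, path]


def run_file(path):
    """Run the external command for one file and report how it ended."""
    result = subprocess.run(build_command(path),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return path, result.returncode


if __name__ == "__main__":
    files = ["a.fits", "bad.fits", "c.fits"]  # made-up names
    with Pool(processes=3) as pool:
        statuses = pool.map(run_file, files)  # blocks until all are done
    failed = [path for path, code in statuses if code != 0]
    print("failed:", failed)
```

Capturing stderr alongside the return code usually shows why the external program gave up on a particular file.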

Edit: I previously had the following text in the answer too; the question has since been edited:

You are calling run2 twice for each file - once asynchronously with the pool, and once in the main process. Depending on the logic within this function, this could be the cause of the odd behaviour you're seeing.

The software that I'm calling inside the run2() function is causing the problem. It tries to write stdout to the same file, which causes it to not complete all the tasks.
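
If an external program writes to a fixed file name in its working directory, one possible workaround is to run each invocation in its own temporary directory, so that parallel runs cannot overwrite each other's output. The sketch below uses a stand-in command that always writes out.txt; run_isolated is a hypothetical name, and whether Astrometry.net offers its own options for choosing output locations is not covered here:

```python
import os
import subprocess
import sys
import tempfile
from multiprocessing import Pool


def run_isolated(path):
    """Run a stand-in command in a private directory and return its output."""
    # Stand-in (assumption): a program that always writes to "out.txt".
    # With a real program, pass absolute input paths because cwd changes.
    code = "import sys; open('out.txt', 'w').write(sys.argv[1])"
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run([sys.executable, "-c", code, path],
                       cwd=workdir, check=True)
        with open(os.path.join(workdir, "out.txt")) as fh:
            return fh.read()


if __name__ == "__main__":
    files = ["a.fits", "b.fits", "c.fits"]  # made-up names
    with Pool(processes=3) as pool:
        outputs = pool.map(run_isolated, files)
    assert outputs == files  # each run saw only its own out.txt
```

Any results that need to be kept would have to be copied out of the temporary directory before it is cleaned up.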

