
Python Multiprocessing pool.map unresponsive with too many worker processes

First question on Stack Overflow, so please bear with me. I am looking to calculate the variance of group ratings (long numpy arrays). The program runs fine without parallel processing, but since each group's calculation can run independently and there are 32 groups, I am looking to use multiprocessing to speed things up. This works OK for small numbers of groups (< 10), but beyond that the program will often just seem to stop running, with no error messages, at an unspecified number of groups (usually between 20 and 30), although less frequently it will run all the way through. The arrays are quite large (21451 x 11462 user-item ratings), so I am wondering if the problem is caused by not having enough memory, although no error messages are printed.
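For scale (assuming the ratings are stored as 8-byte floats, numpy's default), a single copy of that matrix is about 2 GB, so every worker that receives its own copy multiplies that footprint:

# assuming float64 (8-byte) ratings: size of one copy of the matrix, in GB
print(21451 * 11462 * 8 / 1e9)  # ~1.97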

import numpy as np
from functools import partial
import multiprocessing

def variance_parallel(extra_matrices, group_num):
    # do some variance calculation
    # print confirmation that we have entered the function, and the group number
    return single_group_var

def variance(extra_matrices, num_groups):
    variance_partial = partial(variance_parallel, extra_matrices)
    for g in range(num_groups):
        group_var = pool.map(variance_partial, range(g))
    return group_var

num_cores = multiprocessing.cpu_count() - 1
pool = multiprocessing.Pool(processes=num_cores)
variance(extra_matrices, num_groups)

Running the above code shows the program progressively building up the set of groups it is checking variance for ([0], [0,1], [0,1,2], ...) before eventually printing nothing.

Thanks in advance for any help, and apologies if my formatting / question is a bit off!

  • Multiple processes do not share data
  • Data sent to processes needs to be copied

Since the arrays are large, the issue is very likely to do with copying those large arrays to the worker processes. Furthermore, in Python's multiprocessing, sending data to processes is done by serialisation, which is (a) CPU intensive and (b) takes extra memory in and of itself.
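A minimal sketch of the first point (the mutate helper here is illustrative, not from the question): a child process modifies its own copy of an array, and the parent's copy is unchanged.

import numpy as np
from multiprocessing import Process

def mutate(arr):
    arr[0] = 99  # changes only the child process's copy

if __name__ == "__main__":
    a = np.zeros(3)
    p = Process(target=mutate, args=(a,))
    p.start()
    p.join()
    print(a)  # [0. 0. 0.] -- the parent's array is untouched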

In short, multiprocessing is not a good fit for your use case. Since numpy is a native code extension (its number-crunching releases the GIL) and is thread safe, it is best to use threading instead of multiprocessing. With threading, the worker threads can share data via their parent process's address space, which does away with having to copy it.

That should stop the program from running out of memory.

However, for threads to share an address space, the data they share needs to be bound to an object, as in a Python class.

Something like the below - untested, as the code sample is incomplete.

import numpy as np
from threading import Thread
from multiprocessing import cpu_count

class Variance(Thread):

    def __init__(self, extra_matrices, group_num):
        Thread.__init__(self)
        # Shared via the parent process's address space -- no copy is made
        self.extra_matrices = extra_matrices
        self.group_num = group_num
        self.output = None

    def run(self):
        # do some variance calculation
        # print confirmation that we have entered the function, and the group number
        self.output = single_group_var

num_cores = cpu_count() - 1
results = []
# Work through the groups in batches of num_cores threads at a time
for batch_start in range(0, num_groups, num_cores):
    batch = range(batch_start, min(batch_start + num_cores, num_groups))
    workers = [Variance(extra_matrices, g) for g in batch]
    # Start threads
    for worker in workers:
        worker.start()
    # Wait for completion
    for worker in workers:
        worker.join()
    results.extend([w.output for w in workers])
print(results)
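If you would rather keep the pool.map shape from the question, the standard library also provides a thread-backed pool with the same interface, multiprocessing.pool.ThreadPool; a sketch, reusing the question's variance_parallel and assuming extra_matrices and num_groups are defined:

from functools import partial
from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool

pool = ThreadPool(processes=cpu_count() - 1)
variance_partial = partial(variance_parallel, extra_matrices)
# Threads share extra_matrices rather than pickling a copy per task
group_vars = pool.map(variance_partial, range(num_groups))
pool.close()
pool.join()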
