Multiprocessing Pool - most workers are loaded but still idle
In a Python 2.7 script, I wrote a first piece of multiprocessing code to process a big chunk of a numpy array. It is basically a projection-ray frameblock between an image plane and a Cartesian (world) plane. That part, called pool1, works fine.
Further in the script, I attempt to reproduce the multiprocessing code to project a lot of images with this projection-ray frameblock.
It seems that only 4 to 6 workers are actually working, even though all of them have been created and filled with data. pool2 creates the workers and they slowly grow in memory usage, but only up to 6 of them use any CPU power.
Notes:
Arguments info:
A simplified version of the code looks like this:
def georef(paramsGeoRef):
    # Pseudo workflow
    """
    - unpack arguments: Frameclock, A1, A2, B1, B2, fileName, D1, D2, D3, P1, P2 <== paramsGeoRef
    - load the tif image
    - energy conversion with a function and P1, P2
    - proportional projection of the image with Frameclock, A1, A2
    - energy conversion with a function and P1, P2
    - figure creation
    - geotiff creation
    - export figure, geotiff and numpy file
    """
    return None
if __name__ == '__main__':
    paramsGeoRef = []
    for im in imgfiles:
        paramsGeoRef.append([Frameclock, A1, A2, B1, B2, fileName, D1, D2, D3, P1, P2])
    if flag_parallel:
        cpus = multiprocessing.cpu_count() - 1
        pool2 = multiprocessing.Pool(processes=cpus)
        pool2.map(georef, paramsGeoRef)
        pool2.close()
        pool2.join()
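One thing worth checking here: Pool.map pickles each element of paramsGeoRef and pipes it to a worker process, so if Frameclock, A1 and A2 are large arrays, every task starts with megabytes of IPC before any computation happens. A minimal sketch of measuring that per-task cost (the array shapes and filler values are invented for illustration):

```python
import pickle

import numpy as np

# Stand-ins for Frameclock, A1, A2; shapes are made up for illustration.
# Pool.map serializes an argument list like this one and sends it
# through a pipe to a worker before that worker can start computing.
frameclock = np.zeros((1000, 1000))   # ~8 MB of float64 each
a1 = np.zeros((1000, 1000))
a2 = np.zeros((1000, 1000))
args = [frameclock, a1, a2, 'B1', 'B2', 'file.tif', 1, 2, 3, 0.5, 0.5]

payload = pickle.dumps(args, protocol=pickle.HIGHEST_PROTOCOL)
print('bytes piped per task:', len(payload))  # roughly 24 MB here
```

If the pickled payload dwarfs the actual computation time, workers look idle because they spend most of their life receiving data.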
I tried different approaches, such as:

Unpacking the arguments beforehand:
def star_georef(paramsGeoRef):
    return georef(*paramsGeoRef)

def georef(Frameclock, A1, A2, B1, B2, fileName, D1, D2, D3, P1, P2):
    # Pseudo workflow...
    return None
Using another map type:

pool2.imap_unordered(georef, paramsGeoRef)
What is wrong? Why does this method work for crunching the numpy array, but not for this purpose? Do I need to handle a chunksize?

Maybe I need to feed the workers with a job generator as soon as they become available?
Following Martineau's advice, I saved the Frameclock, A1 and A2 arguments with numpy in .npy format, then loaded the .npy files inside the parallelized function, such as:
def georef(paramsGeoRef):
    # Pseudo workflow
    """
    - unpack arguments: Frameclock, A1, A2, B1, B2, fileName, D1, D2, D3, P1, P2 <== paramsGeoRef
    - load Frameclock from its .npy
    - load A1 from its .npy
    - load A2 from its .npy
    - load the tif image
    - energy conversion with a function and P1, P2
    - proportional projection of the image with Frameclock, A1, A2
    - energy conversion with a function and P1, P2
    - figure creation
    - geotiff creation
    - export figure, geotiff and numpy file
    """
    return None
Even with the extra saving and loading, this is a drastic efficiency gain: all workers work.
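For anyone hitting the same wall, the trick can be sketched as follows: save each big array once in the parent, pass only a short path string through the pool, and np.load it inside the worker (optionally with mmap_mode='r' to avoid copying it into memory). File names and array sizes below are invented:

```python
import os
import tempfile

import numpy as np

# Save the heavy array once in the parent process.
tmpdir = tempfile.mkdtemp()
frameclock_path = os.path.join(tmpdir, 'frameclock.npy')
np.save(frameclock_path, np.arange(1000000, dtype=np.float64))

def georef_light(npy_path):
    # Each task now carries only a path; the worker reloads the array
    # itself, so the pool's pipe never sees the big payload.
    frameclock = np.load(npy_path, mmap_mode='r')
    return float(frameclock.sum())

print(georef_light(frameclock_path))
```

Only the path crosses the process boundary, which is why the workers stop stalling on argument delivery.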