
multiprocessing.Pool Scaling

I am wondering why my CPU load is so low even though I do not get a high processing rate:

import time
from multiprocessing import Pool
import numpy as np
from skimage.transform import AffineTransform, SimilarityTransform, warp

center_shift = 256 / 2
tf_center = SimilarityTransform(translation=-center_shift)
tf_uncenter = SimilarityTransform(translation=center_shift)


def sample_gen_random_i():
    for i in range(10000000000000):
        x = np.random.rand(256, 256, 4)
        y = [0]

        yield x, y


def augment(sample):
    x, y = sample
    rotation = 2 * np.pi * np.random.random_sample()
    translation = 5 * np.random.random_sample(), 5 * np.random.random_sample()
    scale_factor = np.random.random_sample() * 0.2 + 0.9
    scale = scale_factor, scale_factor

    tf_augment = AffineTransform(scale=scale, rotation=rotation, translation=translation)
    tf = tf_center + tf_augment + tf_uncenter

    warped_x = warp(x, tf)

    return warped_x, y


def augment_parallel_sample_gen(samples):
    p = Pool(4)

    for sample in p.imap_unordered(augment, samples, chunksize=10):
        yield sample

    p.close()
    p.join()


def augment_sample_gen(samples):
    for sample in samples:
        yield augment(sample)



# This is slow and the single cpu core has 100% load
print('Single Thread --> Slow')
samples = sample_gen_random_i()
augmented = augment_sample_gen(samples)

start = time.time()
for i, sample in enumerate(augmented):
    print(str(i) + '|' + str(i / (time.time() - start))[:6] + ' samples / second', end='\r')
    if i >= 2000:
        print(str(i) + '|' + str(i / (time.time() - start))[:6] + ' samples / second')
        break

# This is slow and there is only light load on the cpu cores
print('Multithreaded --> Slow')
samples = sample_gen_random_i()
augmented = augment_parallel_sample_gen(samples)

start = time.time()
for i, sample in enumerate(augmented):
    print(str(i) + '|' + str(i / (time.time() - start))[:6] + ' samples / second', end='\r')
    if i >= 2000:
        print(str(i) + '|' + str(i / (time.time() - start))[:6] + ' samples / second')
        break

I am using multiprocessing.Pool's imap_unordered, but I think there is some overhead. I can reach about 500 samples/s with no augmentation and no multiprocessing, about 150 with augmentation but no multiprocessing, and about 170 with both augmentation and multiprocessing, so I suspect there must be something wrong with my approach. The code should be executable and self-explanatory! :)
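One way to see why multiprocessing barely helps here (a quick check of my own, not from the original post) is to measure how many bytes one sample occupies when pickled, since `Pool` pickles every argument sent to a worker and every result sent back. A 256x256x4 float64 array is about 2 MB, so at ~170 samples/s the pipeline is pushing several hundred MB/s of pickled data through the process boundary in each direction:

```python
import pickle

import numpy as np

# One sample exactly as the generator above produces it.
x = np.random.rand(256, 256, 4)  # float64: 256 * 256 * 4 * 8 bytes of data
y = [0]

# Pool serializes (x, y) to send it to a worker, and the worker serializes
# the warped image to send it back, so this cost is paid twice per sample.
payload = len(pickle.dumps((x, y), protocol=pickle.HIGHEST_PROTOCOL))
print(payload / 1e6, 'MB per direction per sample')  # roughly 2.1 MB
```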

The problem seems to be that with

return warped_x, y

passing the images to the worker processes and passing the whole transformed image back to the main process seems to be the bottleneck. If I only return, for example, the first pixel

return x[0, 0, 0], y

and move sample creation onto the child processes

def augment(y):
    x = np.random.rand(256, 256, 4)
    rotation = 2 * np.pi * np.random.random_sample()
    ...

the speed will scale up nearly linearly with the number of cores...
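Putting the two changes together, the restructured pipeline might look like this (a sketch under my own function names, not the original poster's code): the worker creates the random image itself, so almost nothing is pickled on the way in, and it returns only a small summary value instead of the full warped array, so almost nothing is pickled on the way out:

```python
import numpy as np
from multiprocessing import Pool
from skimage.transform import AffineTransform, SimilarityTransform, warp

center_shift = 256 / 2
tf_center = SimilarityTransform(translation=-center_shift)
tf_uncenter = SimilarityTransform(translation=center_shift)


def make_and_augment(y):
    # Sample creation now happens inside the child process, so the only
    # thing pickled on the way in is the tiny label y.
    x = np.random.rand(256, 256, 4)
    rotation = 2 * np.pi * np.random.random_sample()
    translation = 5 * np.random.random_sample(), 5 * np.random.random_sample()
    scale_factor = np.random.random_sample() * 0.2 + 0.9
    tf_augment = AffineTransform(scale=(scale_factor, scale_factor),
                                 rotation=rotation, translation=translation)
    warped_x = warp(x, tf_center + tf_augment + tf_uncenter)
    # Return something small (here: the mean pixel value as a stand-in);
    # shipping the full ~2 MB image back is what made the original IPC-bound.
    return warped_x.mean(), y


def augmented_label_gen(labels, processes=4):
    # Same imap_unordered pattern as the original, but only labels go in
    # and only scalars come out.
    with Pool(processes) as p:
        yield from p.imap_unordered(make_and_augment, labels, chunksize=10)
```

If the full warped image is actually needed in the main process, this trick does not apply directly, and something like shared memory would be required to avoid the copy.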

Maybe threads will work better than processes (?)
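Trying threads is cheap to test: `multiprocessing.pool.ThreadPool` has the same `imap_unordered` API as `Pool`, but since threads share memory, nothing is pickled at all. Whether it actually scales depends on how much of `warp` runs with the GIL released, which I have not verified, so this is only an experiment to run, not a guaranteed fix (the wrapper name is mine):

```python
from multiprocessing.pool import ThreadPool


def parallel_gen_threaded(func, samples, threads=4):
    # Drop-in replacement for the Pool-based generator: same imap_unordered
    # call, but workers are threads, so arguments and results are passed by
    # reference instead of being pickled across a process boundary.
    with ThreadPool(threads) as p:
        yield from p.imap_unordered(func, samples, chunksize=10)
```

Usage is identical to the process version, e.g. `parallel_gen_threaded(augment, samples)`.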

