Python multiprocessing using pool.map with list

I am working on Python code that uses multiprocessing. Here is the code:

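import logging
import multiprocessing
import os

# Assumption: the original snippet used `logger` without defining it,
# so a basic logger is configured here to make the code runnable.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def square(n):
    #logger.info("Worker process id for {0}: {1}".format(n, os.getpid()))
    logger.info("Evaluating square of the number {0}".format(n))
    # os.getpid() here is the PID of the worker process running this task
    print('process id of {0}: {1}'.format(n, os.getpid()))
    return n * n

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    # creating a pool object with 4 worker processes
    p = multiprocessing.Pool(4)

    # map list to target function
    result = p.map(square, mylist)

    # stop accepting work and wait for the workers to exit
    p.close()
    p.join()

    print(result)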

My server has 4 CPU cores. If I pass 4 to Pool, only a single process is used. Shouldn't it start 4 separate processes?

If I set the value to 8 in the Pool object, this is the output I get:

process id of 1: 25872

process id of 2: 8132

process id of 3: 1672

process id of 4: 27000

process id of 6: 25872

process id of 5: 20964

process id of 9: 25872

process id of 8: 1672

process id of 7: 8132

process id of 10: 27000

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

This used 5 separate processes (25872, 8132, 1672, 27000, 20964) even though there are only 4 CPU cores.

  1. I don't understand why the pool used only 1 process when the value is 4 and 5 separate processes when the value is 8.

  2. Can a pool object be instantiated with a value greater than the number of CPU cores?

  3. What is the optimal value to use when instantiating the pool object if the list contains a million records?

I have been through the official Python documentation, but I couldn't find this information. Please help.

Let's answer these one by one.

  1. I don't understand why the pool used only 1 process when the value is 4 and 5 separate processes when the value is 8.

The pool starts as many worker processes as you ask for: Pool(4) starts 4 workers and Pool(8) starts 8. Do not confuse the number of cores with the number of processes; the two are completely independent. The main Python process that creates the pool also counts, so Pool(4) gives you 5 Python processes in total: the main one plus the 4 workers it starts. The PIDs you print, however, come only from the workers that actually ran square. If you see that only a few of the processes are being used, it probably means they finish the tasks so quickly that the other workers are never needed.
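The point above can be checked with a small experiment. This is only a sketch: the helper names `report_pid` and `distinct_pids` are made up for illustration, and the artificial `time.sleep` delay is an assumption used to simulate slower tasks. It counts how many worker processes exist and how many of them actually received work.

```python
import multiprocessing
import os
import time


def report_pid(delay):
    """Sleep for `delay` seconds, then return the PID of the worker that ran this task."""
    time.sleep(delay)
    return os.getpid()


def distinct_pids(pool_size, n_tasks, delay):
    """Return (number of live worker processes, set of PIDs that ran at least one task)."""
    with multiprocessing.Pool(pool_size) as pool:
        # The workers are started as soon as the pool is created.
        alive = len(multiprocessing.active_children())
        pids = set(pool.map(report_pid, [delay] * n_tasks))
    return alive, pids


if __name__ == "__main__":
    # delay=0.0 mimics the trivial square() tasks; delay=0.2 mimics heavier work.
    for delay in (0.0, 0.2):
        alive, pids = distinct_pids(pool_size=4, n_tasks=10, delay=delay)
        print("delay={0}: {1} workers alive, {2} did work".format(delay, alive, len(pids)))
```

With instant tasks the number of workers that did work can be anywhere from 1 to the pool size and may change between runs; with slower tasks the work tends to spread across all workers. The main process never appears among the reported PIDs, because it only dispatches tasks.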

  2. Can a pool object be instantiated with a value greater than the number of CPU cores?

Yes, you can ask for any number of processes you want (although the OS may impose some limit). But note that for CPU-bound work, using more processes than cores just oversubscribes your CPU: the extra processes compete for the same cores instead of adding throughput. More on this in the next answer.
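One way to see the effect of oversubscription on your own machine is to time the same CPU-bound workload with different pool sizes. The sketch below is illustrative only: `burn` and `time_pool` are hypothetical helpers, and the workload size is an arbitrary assumption. Timings depend heavily on the machine, so no expected numbers are given.

```python
import multiprocessing
import os
import time


def burn(n):
    """A purely CPU-bound task: sum of squares below n."""
    return sum(i * i for i in range(n))


def time_pool(pool_size, tasks):
    """Run `tasks` through a pool of `pool_size` workers; return (elapsed seconds, results)."""
    start = time.perf_counter()
    with multiprocessing.Pool(pool_size) as pool:
        results = pool.map(burn, tasks)
    return time.perf_counter() - start, results


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    tasks = [200_000] * 64  # arbitrary workload chosen for the demonstration
    for size in (cores, 2 * cores, 4 * cores):
        elapsed, _ = time_pool(size, tasks)
        print("pool size {0}: {1:.2f}s".format(size, elapsed))
```

Pool sizes larger than the core count are accepted without error; whether they help or hurt is something to measure rather than assume.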

  3. What is the optimal value to use when instantiating the pool object if the list contains a million records?

Usually the "optimal" setup is the one where your pool keeps all the cores of your CPU fully busy. So, if you have 4 cores, 4 processes is the best option to start with. It is not always exactly right, but it is a good first approximation.
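For a long list, a reasonable starting point might look like the sketch below. It leaves the pool size at its default, which Pool() takes from the CPU count reported by the OS. The `chunksize` argument of pool.map is not discussed above; it is added here as an assumption about what helps with a million small items, since it sends items to workers in batches instead of one at a time. The helper `square_all` and the value 10_000 are illustrative, not recommendations.

```python
import multiprocessing


def square(n):
    return n * n


def square_all(numbers, chunksize):
    """Square every number using a pool sized to the machine's CPU count."""
    # Pool() with no argument picks the number of workers from the OS-reported CPU count.
    with multiprocessing.Pool() as pool:
        # chunksize batches the items so each worker receives many numbers per task.
        return pool.map(square, numbers, chunksize=chunksize)


if __name__ == "__main__":
    result = square_all(range(1_000_000), chunksize=10_000)
    print(len(result), result[:5])
```

The length of the list mostly affects how the work is batched, not how many processes are worth starting; the number of cores remains the natural starting value for the pool size.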

One last note,

I have been through the official Python documentation, but I couldn't find this information.

This is not really Python-specific; it is general behavior in computer science.

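Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.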

 
© 2020-2024 STACKOOM.COM