
Using 100% of all cores with the multiprocessing module

I have two pieces of code that I'm using to learn about multiprocessing in Python 3.1. My goal is to use 100% of all the available processors. However, the code snippets here only reach 30% - 50% on all processors.

Is there any way to 'force' Python to use all 100%? Is the OS (Windows 7, 64-bit) limiting Python's access to the processors? While the code snippets below are running, I open Task Manager and watch the processor usage spike, but it never reaches and maintains 100%. In addition to that, I can see multiple python.exe processes created and destroyed along the way. How do these processes relate to processors? For example, if I spawn 4 processes, each process isn't using its own core. Instead, what are the processes using? Are they sharing all cores? And if so, is it the OS that is forcing the processes to share the cores?

code snippet 1

import multiprocessing

def worker():
    #worker function
    print('Worker')
    x = 0
    while x < 1000:
        print(x)
        x += 1
    return

if __name__ == '__main__':
    jobs = []
    for i in range(50):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

code snippet 2

from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    print('worker ', i)
    x = 0
    while x < 1000:
        print(x)
        x += 1
    l.release()

if __name__ == '__main__': 
    lock = Lock()
    for num in range(50):
        Process(target=f, args=(lock, num)).start()

To use 100% of all cores, do not create and destroy new processes.

Create a few processes per core and link them with a pipeline.

At the OS level, all pipelined processes run concurrently.

The less you write (and the more you delegate to the OS), the more likely you are to use as many resources as possible.

python p1.py | python p2.py | python p3.py | python p4.py ...

will make maximal use of your CPU.
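As a rough illustration of what one stage in such a pipeline might look like (the stage name and the transform function below are placeholders, not part of the original answer), each script reads records from stdin and writes results to stdout, so the shell runs every stage as its own concurrently scheduled process:

# hypothetical p2.py: one stage of the shell pipeline
import sys

def transform(value):
    # placeholder for this stage's real CPU-bound work
    return value * value

if __name__ == '__main__':
    for line in sys.stdin:
        line = line.strip()
        if line:
            # pass the result on to the next stage via stdout
            print(transform(int(line)))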

You can use psutil to pin each process spawned by multiprocessing to a specific CPU:

import multiprocessing as mp
import psutil


def spawn():
    procs = list()
    n_cpus = psutil.cpu_count()
    for cpu in range(n_cpus):
        affinity = [cpu]
        d = dict(affinity=affinity)
        p = mp.Process(target=run_child, kwargs=d)
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
        print('joined')

def run_child(affinity):
    proc = psutil.Process()  # get self pid
    print('PID: {pid}'.format(pid=proc.pid))
    aff = proc.cpu_affinity()
    print('Affinity before: {aff}'.format(aff=aff))
    proc.cpu_affinity(affinity)
    aff = proc.cpu_affinity()
    print('Affinity after: {aff}'.format(aff=aff))


if __name__ == '__main__':
    spawn()

Note: As commented, psutil.Process.cpu_affinity is not available on macOS.
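On Linux, a rough sketch of the same pinning idea without psutil could use the standard library's os.sched_getaffinity / os.sched_setaffinity (these functions are not available on Windows or macOS); this is an alternative sketch added here, not part of the original answer:

import multiprocessing as mp
import os

def run_child(cpu):
    print('PID:', os.getpid())
    print('Affinity before:', os.sched_getaffinity(0))
    os.sched_setaffinity(0, {cpu})  # pin this process to a single core
    print('Affinity after:', os.sched_getaffinity(0))

if __name__ == '__main__':
    procs = [mp.Process(target=run_child, args=(cpu,)) for cpu in range(os.cpu_count())]
    for p in procs:
        p.start()
    for p in procs:
        p.join()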

Minimal example in pure Python:

def f(x):
    while 1:
        # ---bonus: gradually use up RAM---
        x += 10000  # linear growth; use exponential for faster ending: x *= 1.01
        y = list(range(int(x))) 
        # ---------------------------------
        pass  # infinite loop, use up CPU

if __name__ == '__main__':  # name guard to avoid recursive fork on Windows
    import multiprocessing as mp
    n = mp.cpu_count() * 32  # multiply guard against counting only active cores
    with mp.Pool(n) as p:
        p.map(f, range(n))

Usage: to warm up on a cold day (but feel free to change the loop to something less pointless).

Warning: to exit, don't pull the plug or hold the power button; use Ctrl-C instead.

Regarding code snippet 1: How many cores/processors do you have on your test machine? It isn't doing you any good to run 50 of these processes if you only have 2 CPU cores. In fact, you're forcing the OS to spend more time context switching, moving processes on and off the CPU, than doing actual work.

Try reducing the number of spawned processes to the number of cores. So "for i in range(50):" should become something like:

import os

# assuming you're on Windows:
for i in range(int(os.environ["NUMBER_OF_PROCESSORS"])):
    ...
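A more portable sketch of the same idea (an addition here, not from the original answer) uses multiprocessing.cpu_count(), which works on Windows and Unix alike, so the loop no longer depends on a Windows-specific environment variable:

import multiprocessing

def worker():
    # same busy loop as code snippet 1
    for x in range(1000):
        print(x)

if __name__ == '__main__':
    jobs = []
    for i in range(multiprocessing.cpu_count()):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()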

Regarding code snippet 2: You're using a multiprocessing.Lock, which can only be held by a single process at a time, so you're completely limiting all the parallelism in this version of the program. You've serialized things so that processes 1 through 50 start and a random process (say process 7) acquires the lock. Processes 1-6 and 8-50 all sit on the line:

l.acquire()

While they sit there, they are just waiting for the lock to be released. Depending on the implementation of the Lock primitive, they are probably not using any CPU; they're just sitting there consuming system resources like RAM while doing no useful work with the CPU. Process 7 counts and prints to 1000 and then releases the lock. The OS is then free to schedule one of the remaining 49 processes at random. Whichever one it wakes up first will acquire the lock next and run, while the remaining 48 wait on the lock. This continues for the whole program.

Basically, code snippet 2 is an example of what makes concurrency hard: you have to manage access by lots of processes or threads to some shared resource. In this particular case, though, there really is no reason for these processes to wait on each other, as the sketch below shows.
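Here is a sketch (mine, not the original poster's) of how snippet 2 could be rewritten so the lock is only held for the short header print and the counting loops actually run in parallel:

from multiprocessing import Process, Lock

def f(l, i):
    with l:  # brief critical section: just the header line
        print('worker ', i)
    for x in range(1000):  # no lock held here, so the loops run concurrently
        print(x)

if __name__ == '__main__':
    lock = Lock()
    procs = [Process(target=f, args=(lock, num)) for num in range(50)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()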

So of these two, snippet 1 is closer to efficiently utilizing the CPU. I think properly tuning the number of processes to match the number of cores will yield a much improved result.
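One way to do that tuning, sketched here rather than taken from the original answer, is to keep the 50 units of work but hand them to a Pool sized to the core count instead of launching 50 processes at once:

import multiprocessing as mp

def worker(i):
    print('worker ', i)
    for x in range(1000):
        print(x)

if __name__ == '__main__':
    # the pool reuses cpu_count() worker processes for all 50 tasks
    with mp.Pool(mp.cpu_count()) as pool:
        pool.map(worker, range(50))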

I'd recommend using the Joblib library; it's a good library for multiprocessing, used in many ML applications, in sklearn, etc.

from joblib import Parallel, delayed

Parallel(n_jobs=-1, prefer="processes", verbose=6)(
    delayed(function_name)(parameter1, parameter2, ...)
    for parameter1, parameter2, ... in object
)

Here n_jobs is the number of concurrent jobs. Set n_jobs=-1 if you want to use all available cores on the machine that you're running your code on.

More details on the parameters here: https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html

In your case, a possible implementation would be:

def worker(i):
    print('worker ', i)
    x = 0
    while x < 1000:
        print(x)
        x += 1

Parallel(n_jobs=-1, prefer="processes", verbose=6)(
    delayed(worker)(num)
    for num in range(50)
)
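Note that on Windows (as in the question), the Parallel(...) call, like any other process-spawning code, should sit under an if __name__ == '__main__': guard to avoid recursive spawning of worker processes.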

To answer your question(s):

Is there any way to 'force' Python to use all 100%?

Not that I've heard of.

Is the OS (Windows 7, 64-bit) limiting Python's access to the processors?

Yes and no. Yes: if Python took 100%, Windows would freeze. No: you can grant Python admin privileges, which will result in a lockup.

How do these processes relate to processors?

They don't; technically, at the OS level those Python "processes" are threads which are processed by the OS handler as they need handling.

Instead, what are the processes using? Are they sharing all cores? And if so, is it the OS that is forcing the processes to share the cores?

They are sharing all cores. Unless you start a single Python instance with its affinity set to a certain core (on a multicore system), your processes will be split across whichever core is free. So yes, the OS is forcing the core sharing by default (or Python is, technically).

If you are interested in Python core affinity, check out the affinity package for Python.
