
How to utilize all cores with python multiprocessing

I have been fiddling with Python's multiprocessing features for upwards of an hour now, trying to parallelize a rather complex graph-traversal function using Process and Manager:

import networkx as nx
import csv
import time 
from operator import itemgetter
import os
import multiprocessing as mp

cutoff = 1

exclusionlist = ["cpd:C00024"]

DG = nx.read_gml("KeggComplete.gml", relabel = True)

for exclusion in exclusionlist:
    DG.remove_node(exclusion)

# checks if the 'memorizedPaths' directory exists, and if not, creates it
fn = os.path.join(os.path.dirname(__file__), 'memorizedPaths' + str(cutoff+1))
if not os.path.exists(fn):
    os.makedirs(fn)

manager = mp.Manager()
memorizedPaths = manager.dict()
filepaths = manager.dict()
degreelist = sorted(DG.degree_iter(),key=itemgetter(1),reverse=True)

def _all_simple_paths_graph(item, DG, cutoff, memorizedPaths, filepaths):
    source = item[0]
    uniqueTreePaths = []
    if cutoff < 1:
        return
    visited = [source]
    stack = [iter(DG[source])]
    while stack:
        children = stack[-1]
        child = next(children, None)
        if child is None:
            stack.pop()
            visited.pop()
        elif child in memorizedPaths:
            for path in memorizedPaths[child]:
                newPath = (tuple(visited) + tuple(path))
                if (len(newPath) <= cutoff) and (len(set(visited) & set(path)) == 0):
                    uniqueTreePaths.append(newPath)
            continue
        elif len(visited) < cutoff:
            if child not in visited:
                visited.append(child)
                stack.append(iter(DG[child]))
                if visited not in uniqueTreePaths:
                    uniqueTreePaths.append(tuple(visited))
        else: #len(visited) == cutoff:
            if (visited not in uniqueTreePaths) and (child not in visited):
                uniqueTreePaths.append(tuple(visited + [child]))
            stack.pop()
            visited.pop()
    #writes the absolute path of the node path file into the hash table
    filepaths[source] = str(fn) + "/" + str(source) +"path.txt"
    with open (filepaths[source], "wb") as csvfile2:
        writer = csv.writer(csvfile2, delimiter=' ', quotechar='|')
        for path in uniqueTreePaths:
            writer.writerow(path)
    memorizedPaths[source] = uniqueTreePaths

############################################################################

start = time.clock()
if __name__ == '__main__':
    for item in degreelist:
        test = mp.Process(target=_all_simple_paths_graph, args=(item, DG, cutoff, memorizedPaths, filepaths))
        test.start()
        test.join()
end = time.clock()
print (end-start)

Currently - through luck and magic - it works (sort of). My problem is that I'm only using 12 of my 24 cores.

Can someone explain why this might be the case? Perhaps my code isn't the best multiprocessing solution, or is it a feature of my architecture [Intel Xeon CPU E5-2640 @ 2.50GHz x18 running on Ubuntu 13.04 x64]?

EDIT:

I managed to get:

p = mp.Pool()
for item in degreelist:
    p.apply_async(_all_simple_paths_graph, args=(item, DG, cutoff, memorizedPaths, filepaths))
p.close()
p.join()

working. However, it's VERY SLOW! So I assume I'm using the wrong function for the job. Hopefully this helps clarify exactly what I'm trying to accomplish!

EDIT2: .map attempt:

from functools import partial

partialfunc = partial(_all_simple_paths_graph, DG=DG, cutoff=cutoff, memorizedPaths=memorizedPaths, filepaths=filepaths)
p = mp.Pool()
for item in processList:
    # note: this re-submits the whole xrange once per element of processList,
    # and passes indices to the worker rather than the items themselves
    processVar = p.map(partialfunc, xrange(len(processList)))
p.close()
p.join()

It works, but it's slower than single-core. Time to optimize!

Too much piling up here to address in comments, so, where mp is multiprocessing:

mp.cpu_count() should return the number of processors. But test it. Some platforms are funky, and this info isn't always easy to get. Python does the best it can.

If you start 24 processes, they'll do exactly what you tell them to do ;-) Looks like mp.Pool() would be most convenient for you. You pass the number of processes you want to create to its constructor. mp.Pool(processes=None) will use mp.cpu_count() for the number of processors.
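
For instance, a quick sanity check along those lines might look like this (a minimal sketch, just to illustrate the defaults):

import multiprocessing as mp

print(mp.cpu_count())        # how many CPUs Python thinks this machine has

# processes=None (the default) sizes the pool to mp.cpu_count()
p = mp.Pool(processes=None)
p.close()
p.join()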

Then you can use, for example, .imap_unordered(...) on your Pool instance to spread your degreelist across processes. Or maybe some other Pool method would work better for you - experiment.

If you can't bash the problem into Pool's view of the world, you could instead create an mp.Queue as a work queue, .put()'ing nodes (or slices of nodes, to reduce overhead) to work on in the main program, and write the workers to .get() work items off that queue. Ask if you need examples (a bare-bones sketch follows below). Note that you need to put sentinel values (one per process) on the queue, after all the "real" work items, so that worker processes can test for the sentinel to know when they're done.
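
A bare-bones sketch of that queue pattern, with a hypothetical process_node() standing in for the real per-node work:

import multiprocessing as mp

SENTINEL = None  # marker meaning "no more work"

def process_node(node):
    # hypothetical stand-in for the real per-node computation
    return node

def worker(queue):
    while True:
        node = queue.get()
        if node is SENTINEL:      # hit the sentinel: this worker is done
            break
        process_node(node)

if __name__ == "__main__":
    nprocs = mp.cpu_count()
    queue = mp.Queue()
    workers = [mp.Process(target=worker, args=(queue,)) for _ in range(nprocs)]
    for w in workers:
        w.start()
    for node in range(1000):      # stand-in for the real list of nodes
        queue.put(node)
    for _ in workers:             # one sentinel per worker process
        queue.put(SENTINEL)
    for w in workers:
        w.join()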

FYI, I like queues because they're more explicit. Many others like Pools better because they're more magical ;-)

Pool Example

Here's an executable prototype for you. This shows one way to use imap_unordered with Pool and chunksize that doesn't require changing any function signatures. Of course you'll have to plug in your real code ;-) Note that the init_worker approach allows passing "most of" the arguments only once per processor, not once for every item in your degreelist. Cutting the amount of inter-process communication can be crucial for speed.

import multiprocessing as mp

def init_worker(mps, fps, cut):
    global memorizedPaths, filepaths, cutoff
    global DG

    print "process initializing", mp.current_process()
    memorizedPaths, filepaths, cutoff = mps, fps, cut
    DG = 1##nx.read_gml("KeggComplete.gml", relabel = True)

def work(item):
    _all_simple_paths_graph(DG, cutoff, item, memorizedPaths, filepaths)

def _all_simple_paths_graph(DG, cutoff, item, memorizedPaths, filepaths):
    pass # print "doing " + str(item)

if __name__ == "__main__":
    m = mp.Manager()
    memorizedPaths = m.dict()
    filepaths = m.dict()
    cutoff = 1 ##
    # use all available CPUs
    p = mp.Pool(initializer=init_worker, initargs=(memorizedPaths,
                                                   filepaths,
                                                   cutoff))
    degreelist = range(100000) ##
    for _ in p.imap_unordered(work, degreelist, chunksize=500):
        pass
    p.close()
    p.join()

I strongly advise running this exactly as-is, so you can see that it's blazing fast. Then add things to it a bit at a time, to see how that affects the time. For example, just adding

   memorizedPaths[item] = item

to _all_simple_paths_graph() slows it down enormously. Why? Because the dict gets bigger and bigger with each addition, and this process-safe dict has to be synchronized (under the covers) among all the processes. The unit of synchronization is "the entire dict" - there's no internal structure the mp machinery can exploit to do incremental updates to the shared dict.

If you can't afford this expense, then you can't use a Manager.dict() for this. Opportunities for cleverness abound ;-)
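
One such possibility, purely as a sketch (not something prescribed above): skip the shared dict entirely, have each worker compute and return its paths, and merge them into an ordinary dict in the main process as imap_unordered yields results. The work_return() helper here is hypothetical:

import multiprocessing as mp

def work_return(item):
    # hypothetical: compute this node's paths locally and hand them back,
    # instead of writing them into a shared Manager.dict()
    paths = [(item,)]             # stand-in for the real path computation
    return item, paths

if __name__ == "__main__":
    memorizedPaths = {}           # plain dict, lives only in the main process
    p = mp.Pool()
    for source, paths in p.imap_unordered(work_return, range(100000), chunksize=500):
        memorizedPaths[source] = paths   # merging here needs no cross-process locking
    p.close()
    p.join()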
