
Profiling Python multiprocessing Pool

I've got some code that uses a Pool from Python's multiprocessing module. Performance isn't what I expect, so I wanted to profile the code to figure out what's happening. The problem I'm having is that the profiling output gets overwritten for each job, so I can't accumulate a sensible amount of stats.

For example, with:

import multiprocessing as mp
import cProfile
import time
import random

def work(i):
    x = random.random()
    time.sleep(x)
    return (i,x)

# trampoline so runctx can both profile work() and capture its return value
def work_(args):
    out = [None]
    cProfile.runctx('out[0] = work(args)', globals(), locals(),
                    'profile-%s.out' % mp.current_process().name)
    return out[0]

pool = mp.Pool(10)

for i in pool.imap_unordered(work_, range(100)):
    print(i)

I only get stats on the "last" job, which may not be the most computationally demanding one. I presume I need to store the stats somewhere and then only write them out when the pool is being cleaned up.
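One way to confirm the overwriting is to load one of the per-worker files; a minimal check, assuming the default worker process names (like the ForkPoolWorker-5 seen in the output further down):

import pstats

# each per-worker file only contains the stats of the last job that
# worker profiled, because every runctx call rewrote the same file
pstats.Stats("profile-ForkPoolWorker-1.out").print_stats()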

My solution involves holding onto a profile object for longer and only writing it out at the "end". Hooking into the Pool teardown is described better elsewhere, but it involves using a Finalize object to execute dump_stats() explicitly at the appropriate time.

This also allows me to tidy up the awkward work_ trampoline needed with the runctx I was using before.

import multiprocessing as mp
import cProfile
import time
import random

def work(i):
    # enable profiling (refers to the global object below)
    prof.enable()
    x = random.random()
    time.sleep(x)
    # disable so we don't profile the Pool
    prof.disable()
    return (i,x)

# Initialise a per-worker profile object and make sure it gets written during Pool teardown
def _poolinit():
    global prof
    prof = cProfile.Profile()
    def fin():
        prof.dump_stats('profile-%s.out' % mp.current_process().pid)

    mp.util.Finalize(None, fin, exitpriority=1)

# create our pool
pool = mp.Pool(10, _poolinit)

for i in pool.imap_unordered(work, range(100)):
    print(i)

Loading the output shows that multiple invocations were indeed recorded:

>>> p = pstats.Stats("profile-ForkPoolWorker-5.out")
>>> p.sort_stats("time").print_stats(10)
Fri Sep 11 12:11:58 2015    profile-ForkPoolWorker-5.out

         30 function calls in 4.684 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10    4.684    0.468    4.684    0.468 {built-in method sleep}
       10    0.000    0.000    0.000    0.000 {method 'random' of '_random.Random' objects}
       10    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
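Since this produces one dump per worker, it can also be useful to aggregate them into a single view; a minimal sketch, assuming the profile-*.out naming used above (the glob pattern is my addition):

import glob
import pstats

# pstats.Stats accepts several dump files and sums their timings,
# giving one aggregate view across all of the pool's workers
p = pstats.Stats(*glob.glob("profile-*.out"))
p.sort_stats("time").print_stats(10)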

According to the docs, you can use the pid attribute to get a unique name for each output file:

from datetime import datetime

cProfile.runctx('out[0] = work(args)', globals(), locals(),
                'profile-%s-%s.out' % (mp.current_process().pid,
                                       datetime.now().isoformat()))
