
Profiling Python multiprocessing Pool

I've got some code that uses a Pool from Python's multiprocessing module. Performance isn't what I expect, so I wanted to profile the code to figure out what's happening. The problem I'm having is that the profiling output gets overwritten for each job, so I can't accumulate a sensible amount of stats.

For example, with:

import multiprocessing as mp
import cProfile
import time
import random

def work(i):
    x = random.random()
    time.sleep(x)
    return (i,x)

# trampoline so runctx can both profile work() and capture its return value
def work_(args):
    out = [None]
    cProfile.runctx('out[0] = work(args)', globals(), locals(),
                    'profile-%s.out' % mp.current_process().name)
    return out[0]

pool = mp.Pool(10)

for i in pool.imap_unordered(work_, range(100)):
    print(i)

I only get stats on the "last" job, which may not be the most computationally demanding one. I presume I need to store the stats somewhere and then only write them out when the pool is being cleaned up.
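One way to confirm the overwriting is to load one of the per-worker files; a minimal check, assuming the default worker process names (like the ForkPoolWorker-5 seen in the output further down):

import pstats

# each per-worker file only contains the stats of the last job that
# worker profiled, because every runctx call rewrote the same file
pstats.Stats("profile-ForkPoolWorker-1.out").print_stats()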

My solution involves holding onto a profile object for longer and only writing it out at the "end". Hooking into the Pool teardown is described better elsewhere, but it involves using a Finalize object to execute dump_stats() explicitly at the appropriate time.

This also allows me to tidy up the awkward work_ trampoline needed with the runctx I was using before.

import multiprocessing as mp
import cProfile
import time
import random

def work(i):
    # enable profiling (refers to the global object below)
    prof.enable()
    x = random.random()
    time.sleep(x)
    # disable so we don't profile the Pool
    prof.disable()
    return (i,x)

# Initialise a per-worker profile object and make sure it gets written during Pool teardown
def _poolinit():
    global prof
    prof = cProfile.Profile()
    def fin():
        prof.dump_stats('profile-%s.out' % mp.current_process().pid)

    mp.util.Finalize(None, fin, exitpriority=1)

# create our pool
pool = mp.Pool(10, _poolinit)

for i in pool.imap_unordered(work, range(100)):
    print(i)

Loading the output shows that multiple invocations were indeed recorded:

>>> p = pstats.Stats("profile-ForkPoolWorker-5.out")
>>> p.sort_stats("time").print_stats(10)
Fri Sep 11 12:11:58 2015    profile-ForkPoolWorker-5.out

         30 function calls in 4.684 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10    4.684    0.468    4.684    0.468 {built-in method sleep}
       10    0.000    0.000    0.000    0.000 {method 'random' of '_random.Random' objects}
       10    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
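Since this produces one dump per worker, it can also be useful to aggregate them into a single view; a minimal sketch, assuming the profile-*.out naming used above (the glob pattern is my addition):

import glob
import pstats

# pstats.Stats accepts several dump files and sums their timings,
# giving one aggregate view across all of the pool's workers
p = pstats.Stats(*glob.glob("profile-*.out"))
p.sort_stats("time").print_stats(10)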

According to the docs, you can use the pid attribute to get a unique name for each output file:

from datetime import datetime

cProfile.runctx('out[0] = work(args)', globals(), locals(),
                'profile-%s-%s.out' % (mp.current_process().pid,
                                       datetime.now().isoformat()))
