Profiling a Python multiprocessing Pool
I've got some code that is using a Pool from Python's multiprocessing module. Performance isn't what I expect, so I wanted to profile the code to figure out what's happening. The problem I'm having is that the profiling output gets overwritten for each job, so I can't accumulate a sensible amount of stats.
For example, with:
import multiprocessing as mp
import cProfile
import time
import random

def work(i):
    x = random.random()
    time.sleep(x)
    return (i, x)

def work_(args):
    out = [None]
    cProfile.runctx('out[0] = work(args)', globals(), locals(),
                    'profile-%s.out' % mp.current_process().name)
    return out[0]

pool = mp.Pool(10)
for i in pool.imap_unordered(work_, range(100)):
    print(i)
I only get stats on the "last" job, which may not be the most computationally demanding one. I presume I need to store the stats somewhere and then only write them out when the pool is being cleaned up.
My solution involves holding onto a profile object for longer and only writing it out at the "end". Hooking into the Pool teardown is described better elsewhere, but it involves using a Finalize object to execute dump_stats() explicitly at the appropriate time. This also lets me tidy up the awkward work_ trampoline that the runctx approach above required.
import multiprocessing as mp
import cProfile
import time
import random

def work(i):
    # enable profiling (refers to the global object below)
    prof.enable()
    x = random.random()
    time.sleep(x)
    # disable so we don't profile the Pool
    prof.disable()
    return (i, x)

# Initialise a profile object in each worker and make sure it
# gets written out during Pool teardown
def _poolinit():
    global prof
    prof = cProfile.Profile()
    def fin():
        prof.dump_stats('profile-%s.out' % mp.current_process().pid)
    mp.util.Finalize(None, fin, exitpriority=1)

# create our pool
pool = mp.Pool(10, _poolinit)
for i in pool.imap_unordered(work, range(100)):
    print(i)
Loading the output shows that multiple invocations were indeed recorded:
> p = pstats.Stats("profile-ForkPoolWorker-5.out")
> p.sort_stats("time").print_stats(10)
Fri Sep 11 12:11:58 2015 profile-ForkPoolWorker-5.out
30 function calls in 4.684 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
10 4.684 0.468 4.684 0.468 {built-in method sleep}
10 0.000 0.000 0.000 0.000 {method 'random' of '_random.Random' objects}
10 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
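Since each worker writes its own profile-<pid>.out file, the per-worker dumps can also be merged into a single combined report with pstats.Stats.add. This is a sketch of that idea: the two cProfile.run calls here just create stand-in dump files (the real files would be the profile-*.out files produced by the workers above).

```python
import cProfile
import glob
import pstats

# Create two dump files with a trivial workload, standing in for
# the per-worker 'profile-<pid>.out' files written by the Pool.
for name in ('profile-a.out', 'profile-b.out'):
    cProfile.run('sum(range(100000))', name)

# Merge every dump into one Stats object and print a combined report.
files = glob.glob('profile-*.out')
stats = pstats.Stats(files[0])
for f in files[1:]:
    stats.add(f)
stats.sort_stats('cumulative').print_stats(5)
```

Stats.add accepts either filenames or other Stats objects, so the merge can be done incrementally as files appear.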