Increasing the depth of cProfiler in Python to report more functions?

Question

I am trying to profile a function that calls other functions. I invoke the profiler as follows:

from mymodule import foo
def start():
   # ...
   foo()

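import pstats  # needed for pstats.Stats below
import cProfile as profile
# output_file is the path the raw profile data is written to
profile.run('start()', output_file)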
p = pstats.Stats(output_file)
print "name: "
print p.sort_stats('name')
print "all stats: "
p.print_stats()
print "cumulative (top 10): "
p.sort_stats('cumulative').print_stats(10)

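I find that the profiler says all of the time was spent in the function "foo()" of mymodule, instead of breaking it down into the sub-functions that foo() calls, which is what I want to see. How can I make the profiler report on the performance of these functions?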

Thanks.

Answer 1

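You need p.print_callees() to get a hierarchical breakdown of the method calls. The output is fairly self-explanatory: in the left column you find the function you are interested in, e.g. foo(), and the right-hand column lists every sub-function it called, together with the total and cumulative time spent in each within that scope. The breakdowns for those sub-calls are included as well, and so on.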
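As a minimal sketch of what that looks like (written for Python 3, whereas the code in this thread is Python 2; the names parent_task, child_sum and child_sort are made up for illustration), the following profiles a single call and prints only the callee rows for parent_task:

```python
import cProfile
import io
import pstats


def child_sum(n):
    # Hypothetical sub-function: some arithmetic work.
    return sum(range(n))


def child_sort(n):
    # Hypothetical sub-function: some sorting work.
    return sorted(range(n, 0, -1))


def parent_task():
    # Does almost nothing itself; the time goes to its callees.
    child_sum(200000)
    child_sort(200000)


profiler = cProfile.Profile()
profiler.runcall(parent_task)

# Send the report to a string buffer instead of stdout.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
# The argument is a regex restriction matched against "file:line(function)".
stats.print_callees(r'\(parent_task\)')
report = stream.getvalue()
print(report)
```

The left column of the report should show parent_task and the right column its two callees. Calling print_callees() with no argument lists every profiled function in the same way.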

Answer 2

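First, I want to say that I was unable to reproduce the Asker's problem. The profiler (in py2.7) definitely descends into the called functions and methods. (The docs for py3.6 look identical, but I have not tested on py3.) My guess is that, by limiting the report to the top 10 entries sorted by cumulative time, the first N entries were very high-level functions that spend almost no time of their own, and the functions called by foo() dropped off the bottom of the list.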
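If truncation is the cause, one way around it is to restrict the report by name rather than by row count, since print_stats() accepts a regular expression as well as an integer. A rough Python 3 sketch, with hypothetical function names (level_1 to level_3, leaf_work) standing in for a deep call chain:

```python
import cProfile
import io
import pstats


def leaf_work(n):
    # Hypothetical low-level function where the real work happens.
    total = 0
    for i in range(n):
        total += i * i
    return total


def level_3():
    return leaf_work(200000)


def level_2():
    return level_3()


def level_1():
    return level_2()


profiler = cProfile.Profile()
profiler.runcall(level_1)


def render(*restrictions):
    # Collect print_stats() output in a string instead of stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative').print_stats(*restrictions)
    return stream.getvalue()


top_two = render(2)            # count limit: only the first two rows
by_name = render('leaf_work')  # regex limit: rows whose name matches
print(top_two)
print(by_name)
```

With the count limit, the rows that survive are typically the thin wrappers at the top of the call chain; the regex restriction keeps the leaf_work row no matter where it sorts.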

I decided to play with some big numbers for testing. Here is my test code:

# file: mymodule.py
import math

def foo(n = 5):
    for i in xrange(1,n):
        baz(i)
        bar(i ** i)

def bar(n):
    for i in xrange(1,n):
        e  = exp200(i)
        print "len e: ", len("{}".format(e))

def exp200(n):
    result = 1
    for i in xrange(200):
        result *= n
    return result

def baz(n):
    print "{}".format(n)

And the including file (very similar to the Asker's):

# file: test.py

from mymodule import foo

def start():
   # ...
   foo(8)

OUTPUT_FILE = 'test.profile_info'

import pstats
import cProfile as profile

profile.run('start()', OUTPUT_FILE)
p = pstats.Stats(OUTPUT_FILE)
print "name: "
print p.sort_stats('name')
print "all stats: "
p.print_stats()
print "cumulative (top 10): "
p.sort_stats('cumulative').print_stats(10)
print "time (top 10): "
p.sort_stats('time').print_stats(10)

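Note the last line. I added a view sorted by time, which is the total time spent in the function "excluding time made in calls to sub-functions". I find this view much more useful, because it favors the functions that are doing the actual work and may be in need of optimization.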
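The difference between the two columns can also be read programmatically. The sketch below is Python 3 (it relies on Stats.get_stats_profile(), available from Python 3.9), and wrapper and worker are invented names: wrapper only delegates, so its tottime should stay near zero while its cumtime covers everything worker does.

```python
import cProfile
import pstats


def worker(n):
    # Hypothetical function that does the actual computation.
    total = 0
    for i in range(n):
        total += i * i
    return total


def wrapper():
    # Only delegates, so its own (internal) time should be small.
    for _ in range(5):
        worker(100000)


profiler = cProfile.Profile()
profiler.runcall(wrapper)

stats = pstats.Stats(profiler)
# Maps a function name to its ncalls / tottime / cumtime record.
profiles = stats.get_stats_profile().func_profiles

for name in ('wrapper', 'worker'):
    fp = profiles[name]
    print(name, 'tottime =', fp.tottime, 'cumtime =', fp.cumtime)
```

Sorting by 'time' ranks functions by the tottime figure, which is why worker-like functions rise to the top of that view.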

Here are partial results from the Asker's kind of run (sorted by cumulative):

cumulative (top 10):
Thu Mar 24 21:26:32 2016    test.profile_info

         2620840 function calls in 76.039 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   76.039   76.039 <string>:1(<module>)
        1    0.000    0.000   76.039   76.039 test.py:5(start)
        1    0.000    0.000   76.039   76.039 /Users/jhazen/mymodule.py:4(foo)
        7   10.784    1.541   76.039   10.863 /Users/jhazen/mymodule.py:10(bar)
   873605   49.503    0.000   49.503    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873612   15.634    0.000   15.634    0.000 {method 'format' of 'str' objects}
   873605    0.118    0.000    0.118    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:21(baz)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Notice that the top 3 functions in this display were each called only once. Let's look at the time-sorted view:

time (top 10):
Thu Mar 24 21:26:32 2016    test.profile_info

         2620840 function calls in 76.039 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873605   49.503    0.000   49.503    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873612   15.634    0.000   15.634    0.000 {method 'format' of 'str' objects}
        7   10.784    1.541   76.039   10.863 /Users/jhazen/mymodule.py:10(bar)
   873605    0.118    0.000    0.118    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:21(baz)
        1    0.000    0.000   76.039   76.039 /Users/jhazen/mymodule.py:4(foo)
        1    0.000    0.000   76.039   76.039 test.py:5(start)
        1    0.000    0.000   76.039   76.039 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Now the number one entry makes sense. Obviously, raising something to the 200th power by repeated multiplication is a "naive" strategy. Let's replace it:

def exp200(n):
    return n ** 200

And the results:

time (top 10):
Thu Mar 24 21:32:18 2016    test.profile_info

         2620840 function calls in 30.646 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873612   15.722    0.000   15.722    0.000 {method 'format' of 'str' objects}
        7    9.760    1.394   30.646    4.378 /Users/jhazen/mymodule.py:10(bar)
   873605    5.056    0.000    5.056    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873605    0.108    0.000    0.108    0.000 {len}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:18(baz)
        1    0.000    0.000   30.646   30.646 /Users/jhazen/mymodule.py:4(foo)
        1    0.000    0.000   30.646   30.646 test.py:5(start)
        1    0.000    0.000   30.646   30.646 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

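That's a nice improvement. Now str.format() is our worst offender. I added the line in bar() that prints the length of the number because my first attempt (just computing the number and doing nothing with it) got optimized away, and my attempt to avoid that (printing the number itself, which got really big really fast) looked like it might be blocking on I/O, so I compromised on printing the length of the number. Hey, that's the base-10 log. Let's try that: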

def bar(n):
    for i in xrange(1,n):
        e  = exp200(i)
        print "log e: ", math.log10(e)

And the results:

time (top 10):
Thu Mar 24 21:40:16 2016    test.profile_info

         1747235 function calls in 11.279 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        7    6.082    0.869   11.279    1.611 /Users/jhazen/mymodule.py:10(bar)
   873605    4.996    0.000    4.996    0.000 /Users/jhazen/mymodule.py:15(exp200)
   873605    0.201    0.000    0.201    0.000 {math.log10}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:18(baz)
        1    0.000    0.000   11.279   11.279 /Users/jhazen/mymodule.py:4(foo)
        7    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000   11.279   11.279 test.py:5(start)
        1    0.000    0.000   11.279   11.279 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Hmm, a good deal of time is still being spent in bar(), even without str.format(). Let's get rid of that print:

def bar(n):
    z = 0
    for i in xrange(1,n):
        e  = exp200(i)
        z += math.log10(e)
    return z

And the results:

time (top 10):
Thu Mar 24 21:45:24 2016    test.profile_info

         1747235 function calls in 5.031 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   873605    4.487    0.000    4.487    0.000 /Users/jhazen/mymodule.py:17(exp200)
        7    0.440    0.063    5.031    0.719 /Users/jhazen/mymodule.py:10(bar)
   873605    0.104    0.000    0.104    0.000 {math.log10}
        7    0.000    0.000    0.000    0.000 /Users/jhazen/mymodule.py:20(baz)
        1    0.000    0.000    5.031    5.031 /Users/jhazen/mymodule.py:4(foo)
        7    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000    5.031    5.031 test.py:5(start)
        1    0.000    0.000    5.031    5.031 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Now it looks like the code doing the actual work is the busiest function, so I think we are done optimizing.

Hope that helps!

Answer 3

Maybe you ran into a similar problem, so I'll describe mine here. My profiling code looked like this:

def foobar():
    import cProfile
    pr = cProfile.Profile()
    pr.enable()
    for event in reader.events():
        baz()
        # and other things

    pr.disable()
    pr.dump_stats('result.prof')

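The final profiling output contained only the events() call. It took me quite a while to realize that I was profiling an empty loop. Of course, the client code called foobar() more than once, but the meaningful profiling results had been overwritten by the last call, whose loop was empty.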


Answer 1
Score 0, posted 2016-03-24 13:19:37

Answer 2
Score 0, posted 2016-03-25 04:51:11

Answer 3
Score -1, posted 2015-07-10 09:33:46
