
Profiling Python C extensions

I have developed a Python C extension that receives data from Python and performs some CPU-intensive calculations. Is it possible to profile the C extension?

The problem here is that writing a sample test in C to be profiled would be challenging, because the code relies on particular inputs and data structures (generated by the Python control code).

Do you have any suggestions?

Following pygabriel's comment, I decided to upload a package to PyPI that implements a profiler for Python extensions using the CPU profiler from google-perftools: http://pypi.python.org/pypi/yep

I've found my way using google-perftools. The trick was to wrap the functions StartProfiler and StopProfiler in Python (through Cython in my case).

To profile the C extension, it is sufficient to wrap the Python code between the StartProfiler and StopProfiler calls.

from google_perftools_wrapped import StartProfiler, StopProfiler  # your own wrapper module
import c_extension  # the extension to profile (c_extension.so)

StartProfiler("output.prof")
# ... call the interesting functions from the C extension module ...
StopProfiler()
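If you would rather not build a Cython wrapper, something along these lines might work with ctypes instead. This is only a sketch under assumptions: it assumes gperftools is installed so that ctypes can find a library named "profiler", and it calls the C functions ProfilerStart and ProfilerStop from gperftools directly. The names load_libprofiler and profile are mine, not part of any library.

```python
import ctypes
import ctypes.util
import os
from contextlib import contextmanager


def load_libprofiler():
    """Load the gperftools CPU profiler library (raises OSError if it is missing)."""
    path = ctypes.util.find_library("profiler")  # assumption: installed as libprofiler
    if path is None:
        raise OSError("libprofiler not found; is gperftools installed?")
    lib = ctypes.CDLL(path)
    # int ProfilerStart(const char *fname); returns nonzero on success
    lib.ProfilerStart.argtypes = [ctypes.c_char_p]
    lib.ProfilerStart.restype = ctypes.c_int
    # void ProfilerStop(void);
    lib.ProfilerStop.argtypes = []
    lib.ProfilerStop.restype = None
    return lib


@contextmanager
def profile(filename, lib=None):
    """Profile the enclosed block. `lib` is a hypothetical hook for injecting a stand-in."""
    lib = lib if lib is not None else load_libprofiler()
    if not lib.ProfilerStart(os.fsencode(filename)):
        raise RuntimeError("ProfilerStart failed for %r" % (filename,))
    try:
        yield
    finally:
        # Always stop, even if the profiled code raises, so the profile is flushed.
        lib.ProfilerStop()
```

Usage would then be a with profile("output.prof"): block around the calls into the C extension, after which output.prof can be fed to pprof as usual.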

Then, to analyze the results, you can for example export them in callgrind format and view them in kcachegrind:

pprof --callgrind c_extension.so output.prof > output.callgrind 
kcachegrind output.callgrind

One of my colleagues told me about ltrace(1). It helped me a lot in the same situation.

Assume the shared object name of your C extension is myext.so and you want to execute benchmark.py, then

ltrace -x @myext.so -c python benchmark.py

Its output looks like this:

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 24.88   30.202126     7550531         4 ldap_result
 12.46   15.117625     7558812         2 l_ldap_result4
 12.41   15.059652     5019884         3 ldap_chase_v3referrals
 12.41   15.057678     3764419         4 ldap_new_connection
 12.40   15.050310     3762577         4 ldap_int_open_connection
 12.39   15.042360     3008472         5 ldap_send_server_request
 12.38   15.029055     3757263         4 ldap_connect_to_host
  0.05    0.057890       28945         2 ldap_get_option
  0.04    0.052182       26091         2 ldap_sasl_bind
  0.03    0.030760       30760         1 l_ldap_get_option
  0.03    0.030635       30635         1 LDAP_get_option
  0.02    0.029960       14980         2 ldap_initialize
  0.02    0.027988       27988         1 ldap_int_initialize
  0.02    0.026722       26722         1 l_ldap_simple_bind
  0.02    0.026386       13193         2 ldap_send_initial_request
  0.02    0.025810       12905         2 ldap_int_select
....
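If you want to sort or filter that summary yourself, a small parser is enough. This is a rough sketch that assumes the five-column layout shown above; parse_ltrace_summary is a name I made up, and the sample input is just two rows copied from the output above.

```python
def parse_ltrace_summary(text):
    """Turn the rows of an `ltrace -c` summary into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # header, "....", or anything else that is not a data row
        try:
            row = {
                "percent": float(parts[0]),
                "seconds": float(parts[1]),
                "usecs_per_call": int(parts[2]),
                "calls": int(parts[3]),
                "function": parts[4],
            }
        except ValueError:
            continue  # e.g. the dashed separator line
        rows.append(row)
    return rows


sample = """\
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 24.88   30.202126     7550531         4 ldap_result
  0.05    0.057890       28945         2 ldap_get_option
"""

# Print the functions ordered by cost per call, most expensive first.
for row in sorted(parse_ltrace_summary(sample), key=lambda r: -r["usecs_per_call"]):
    print(row["function"], row["usecs_per_call"])
```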

Special care is needed if your shared object has - or + in its file name. These characters aren't treated as is (see man 1 ltrace for details).

The wildcard * can be a workaround, such as -x @myext* in place of -x @myext-2.so.

With gprof, you can profile any program that was properly compiled and linked (gcc -pg etc., in gprof's case). If you're using a Python version not built with gcc (e.g., the Windows precompiled version the PSF distributes), you'll need to research what equivalent tools exist for that platform and toolchain (in the Windows PSF case, maybe mingw can help). There may be "irrelevant" data there (internal C functions in the Python runtime), and, if so, the percentages shown by gprof may not be applicable, but the absolute numbers (of calls, and durations thereof) are still valid, and you can post-process gprof's output (e.g., with a little Python script ;-) to exclude the irrelevant data and compute the percentages you want.
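For what it's worth, the post-processing script could look roughly like the sketch below. It assumes you feed it only the flat profile section of gprof's output (for example from gprof -b -p), that the third column is "self seconds", and that function names contain no spaces (true for plain C). The function name recompute_percentages and the myext_ prefix are made up for illustration, and the sample rows are invented, not real measurements.

```python
def recompute_percentages(flat_profile_text, keep):
    """Keep only functions accepted by `keep(name)` and recompute their share of self time."""
    rows = []
    for line in flat_profile_text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue
        try:
            float(parts[0])                # % time (ignored, we recompute it)
            float(parts[1])                # cumulative seconds (ignored)
            self_seconds = float(parts[2])
        except ValueError:
            continue                       # header or explanatory text
        name = parts[-1]
        if keep(name):
            rows.append((name, self_seconds))
    total = sum(seconds for _, seconds in rows)
    return [
        (name, seconds, 100.0 * seconds / total if total else 0.0)
        for name, seconds in rows
    ]


# Invented sample in gprof's flat profile layout.
sample = """\
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 50.00      0.50     0.50      100     5.00     5.00  PyEval_EvalFrameEx
 30.00      0.80     0.30       10    30.00    30.00  myext_compute
 20.00      1.00     0.20       20    10.00    10.00  myext_reduce
"""

for name, seconds, percent in recompute_percentages(sample, lambda n: n.startswith("myext_")):
    print("%-20s %8.2f s %6.1f %%" % (name, seconds, percent))
```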

I found py-spy very easy to use. See this blog post for an explanation of its native extension support.

Highlights:

  • pip installable
  • CPU sampling based
  • no compiler flags required
  • executes your program or attaches to a running process
  • multiple output formats (I recommend --format speedscope)
  • configurable sampling rate
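To make that concrete, here is a small sketch that builds a py-spy record command line from Python. The record subcommand and the --format, --rate, --output and --native options are real py-spy options as far as I know; the helper names, the default output file name and benchmark.py are placeholders of mine. Note that --native is, I believe, only supported on some platforms, so treat it as optional.

```python
import subprocess


def py_spy_command(script, output="profile.speedscope.json", rate=100, native=True):
    """Build the argument list for `py-spy record` on a Python script."""
    cmd = [
        "py-spy", "record",
        "--format", "speedscope",   # open the result at speedscope.app
        "--rate", str(rate),        # samples per second
        "--output", output,
    ]
    if native:
        cmd.append("--native")      # also sample frames inside C extensions
    cmd += ["--", "python", script]
    return cmd


def run_py_spy(script, **options):
    """Run py-spy on the script; requires py-spy to be installed and on PATH."""
    return subprocess.run(py_spy_command(script, **options), check=True)


print(" ".join(py_spy_command("benchmark.py")))
```

Calling run_py_spy("benchmark.py") would then launch the benchmark under py-spy and write the speedscope file when it finishes.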
