Python line_profiler and Cython function
I'm trying to profile a function in one of my own Python scripts with line_profiler, because I want line-by-line timings. The only problem is that the function is a Cython one, and line_profiler isn't working correctly. On the first runs it simply crashed with an error. I then added
#!python
#cython: profile=True
#cython: linetrace=True
#cython: binding=True
at the top of my script and now it runs fine, except that the timings and statistics are blank!
Is there a way to use line_profiler with a Cythonized function?
I could profile the non-Cythonized function, but it is so much slower than the Cythonized one that the profiling information would be of no use: the slowness of the pure Python version would make it impossible to tell how I could improve the Cython one.
Here is the code of the function I want to profile:
class motif_hit(object):
    __slots__ = ['position', 'strand']

    def __init__(self, int position=0, int strand=0):
        self.position = position
        self.strand = strand

#the decorator for line_profiler
@profile
def find_motifs_cython(list bed_list, list matrices=None, int limit=0, int mut=0):
    cdef int q = 3
    cdef list bg = [0.25, 0.25, 0.25, 0.25]
    cdef int matrices_length = len(matrices)
    cdef int results_length = 0
    cdef int results_length_shuffled = 0
    cdef np.ndarray upper_adjust_list = np.zeros(matrices_length, np.int)
    cdef np.ndarray lower_adjust_list = np.zeros(matrices_length, np.int)
    #this one need to be a list for MOODS
    cdef list threshold_list = [None for _ in xrange(matrices_length)]
    cdef list matrix_list = [None for _ in xrange(matrices_length)]
    cdef np.ndarray results_list = np.zeros(matrices_length, np.object)
    cdef int count_seq = len(bed_list)
    cdef int mat
    cdef int i, j, k
    cdef int position, strand
    cdef list result, results, results_shuffled
    cdef dict result_temp
    cdef int length

    if count_seq > 0:
        for mat in xrange(matrices_length):
            matrix_list[mat] = matrices[mat]['matrix'].tolist()
            #change that for a class
            results_list[mat] = {'kmer': matrices[mat]['kmer'],
                                 'motif_count': 0,
                                 'pos_seq_count': 0,
                                 'motif_count_shuffled': 0,
                                 'pos_seq_count_shuffled': 0,
                                 'ratio': 0,
                                 'sequence_positions': np.empty(count_seq, np.object)}
            length = len(matrices[mat]['kmer'])
            #wrong with imbalanced matrices
            upper_adjust_list[mat] = int(ceil(length / 2.0))
            lower_adjust_list[mat] = int(floor(length / 2.0))
            #upper_adjust_list[mat] = 0
            #lower_adjust_list[mat] = 0
            #-0.1 to adjust for a division floating point bug (4.99999 !< 5, but is < 4.9!)
            threshold_list[mat] = MOODS.max_score(matrix_list[mat]) - float(mut) - 0.1

        #for each sequence
        for i in xrange(count_seq):
            item = bed_list[i]
            #TODO: remove the Ns, but it might unbalance
            results = MOODS.search(str(item.sequence[limit:item.total_length - limit]), matrix_list, threshold_list, q=q, bg=bg, absolute_threshold=True, both_strands=True)
            results_shuffled = MOODS.search(str(item.sequence_shuffled[limit:item.total_length - limit]), matrix_list, threshold_list, q=q, bg=bg, absolute_threshold=True, both_strands=True)
            results = results[0:len(matrix_list)]
            results_shuffled = results_shuffled[0:len(matrix_list)]
            results_length = len(results)
            #for each matrix
            for j in xrange(results_length):
                result = results[j]
                result_shuffled = results_shuffled[j]
                upper_adjust = upper_adjust_list[j]
                lower_adjust = lower_adjust_list[j]
                result_length = len(result)
                result_length_shuffled = len(result_shuffled)
                if result_length > 0:
                    results_list[j]['pos_seq_count'] += 1
                    results_list[j]['sequence_positions'][i] = np.empty(result_length, np.object)
                    #for each motif
                    for k in xrange(result_length):
                        position = result[k][0]
                        strand = result[k][1]
                        if position >= 0:
                            strand = 0
                            adjust = upper_adjust
                        else:
                            position = -position
                            strand = 1
                            adjust = lower_adjust
                        results_list[j]['motif_count'] += 1
                        results_list[j]['sequence_positions'][i][k] = motif_hit(position + adjust + limit, strand)
                if result_length_shuffled > 0:
                    results_list[j]['pos_seq_count_shuffled'] += 1
                    #for each motif
                    for k in xrange(result_length_shuffled):
                        results_list[j]['motif_count_shuffled'] += 1
                #j = j + 1
            #i = i + 1
        for i in xrange(results_length):
            result_temp = results_list[i]
            result_temp['ratio'] = float(result_temp['pos_seq_count']) / float(count_seq)
    return results_list

I'm pretty sure the triple nested loop is the main slow part. Its job is just to rearrange the results coming from MOODS, the C module doing the main work.
Till Hoffmann has useful information on using line_profiler with Cython here: How to profile cython functions line-by-line.
I quote his solution:
Robert Bradshaw helped me to get Robert Kern's line_profiler tool working for cdef functions and I thought I'd share the results on stackoverflow.
In short, set up a regular .pyx file and build script, and pass the linetrace compiler directive to cythonize to enable both profiling and line tracing:
from Cython.Build import cythonize
cythonize('hello.pyx', compiler_directives={'linetrace': True})
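Line tracing matters because a line profiler gets its timings from per-line trace events, and compiled Cython code emits none unless it is built to do so, which would explain blank statistics. As a rough pure-Python analogue (line_profiler itself hooks in at the C level; the names trace_lines and demo below are made up for illustration), the mechanism can be sketched with sys.settrace:

```python
import sys
import time
from collections import defaultdict


def trace_lines(func, *args, **kwargs):
    """Run func and return (result, {line_number: seconds}) for its own frame."""
    timings = defaultdict(float)
    code = func.__code__
    last = {"line": None, "t": 0.0}

    def local_tracer(frame, event, arg):
        # Charge the time since the previous event to the previous line.
        if event in ("line", "return"):
            now = time.perf_counter()
            if last["line"] is not None:
                timings[last["line"]] += now - last["t"]
            last["line"] = frame.f_lineno if event == "line" else None
            last["t"] = time.perf_counter()
        return local_tracer

    def global_tracer(frame, event, arg):
        # Only follow frames that belong to the target function; time spent
        # in callees is therefore charged to the calling line.
        if event == "call" and frame.f_code is code:
            return local_tracer
        return None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)
    return result, dict(timings)


def demo(n):
    total = 0
    for i in range(n):
        total += i * i
    return total
```

Calling trace_lines(demo, 1000) returns the function's result together with a dictionary of seconds per source line. A function that never fires "line" events would come back with an empty dictionary.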
You may also want to set the (undocumented) directive binding to True.
Also, you should define the C macro CYTHON_TRACE=1 by modifying your extensions setup such that:
extensions = [
    Extension('test', ['test.pyx'], define_macros=[('CYTHON_TRACE', '1')])
]
A working example using the %%cython magic in the IPython notebook is here: http://nbviewer.ipython.org/gist/tillahoffmann/296501acea231cbdf5e7
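Once the module is compiled with tracing, the profiler can also be driven from plain Python instead of through the @profile decorator and kernprof. A minimal sketch, assuming line_profiler is installed; profile_call is a made-up helper name, while LineProfiler, runcall and print_stats are part of line_profiler's API:

```python
def profile_call(func, *args, **kwargs):
    """Run func under line_profiler, print per-line stats, return func's result."""
    # Third-party package (pip install line_profiler); imported here so the
    # helper can be defined even where the package is missing.
    import line_profiler

    profiler = line_profiler.LineProfiler(func)
    result = profiler.runcall(func, *args, **kwargs)
    profiler.print_stats()
    return result
```

With the question's code this would be called as profile_call(find_motifs_cython, bed_list, matrices), with the @profile decorator removed from the .pyx file.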
The API has changed. Now:
from Cython.Compiler.Options import get_directive_defaults
directive_defaults = get_directive_defaults()
directive_defaults['linetrace'] = True
directive_defaults['binding'] = True