
Cython's prange not improving performance

I'm trying to improve the performance of some metric computations with Cython's prange. Here is my code:

def shausdorff(float64_t[:,::1] XA not None, float64_t[:,:,::1] XB not None):
    cdef:
        Py_ssize_t i
        Py_ssize_t n  = XB.shape[2]
        float64_t[::1] hdist = np.zeros(n)

    #arrangement to fix contiguity
    XB = np.asanyarray([np.ascontiguousarray(XB[:,:,i]) for i in range(n)])

    for i in range(n):
        hdist[i] = _hausdorff(XA, XB[i])
    return hdist

def phausdorff(float64_t[:,::1] XA not None, float64_t[:,:,::1] XB not None):
    cdef:
        Py_ssize_t i
        Py_ssize_t n  = XB.shape[2]
        float64_t[::1] hdist = np.zeros(n)

    #arrangement to fix contiguity (EDITED)
    cdef float64_t[:,:,::1] XC = np.asanyarray([np.ascontiguousarray(XB[:,:,i]) for i in range(n)])

    with nogil, parallel(num_threads=4):
        for i in prange(n, schedule='static', chunksize=1):
            hdist[i] = _hausdorff(XA, XC[i])
    return hdist

Basically, in each iteration the Hausdorff metric is computed between XA and each XB[i]. Here is the signature of the _hausdorff function:

cdef inline float64_t _hausdorff(float64_t[:,::1] XA, float64_t[:,::1] XB) nogil:
    ...
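The body of _hausdorff is elided above, so the following is not that function. It is only a minimal NumPy reference sketch of the standard symmetric Hausdorff distance, assuming each row is a point and that point distances are Euclidean; the name hausdorff_reference is hypothetical.

```python
import numpy as np

def hausdorff_reference(A, B):
    """Symmetric Hausdorff distance between point sets A (m, d) and B (k, d).

    Hypothetical helper for illustration only; the Euclidean point
    distance is an assumption, not taken from the original _hausdorff.
    """
    # Pairwise distances, shape (m, k). Memory grows as m * k * d,
    # so this is only suitable for small inputs.
    diff = A[:, None, :] - B[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Directed distances: the farthest point of one set from its
    # nearest neighbour in the other set.
    a_to_b = dist.min(axis=1).max()
    b_to_a = dist.min(axis=0).max()
    return float(max(a_to_b, b_to_a))

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0], [0.0, 3.0]])
result = hausdorff_reference(A, B)
print(result)
```

A slow reference like this can be useful for checking that the sequential and parallel Cython functions return the same values on small inputs before timing them.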

My problem is that the sequential shausdorff and the parallel phausdorff have the same timings. Furthermore, it seems that phausdorff is not creating any threads at all.

So my question is: what is wrong with my code, and how can I fix it to get threading working?

Here is my setup.py:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
from Cython.Distutils import build_ext

ext_modules=[
    Extension("custom_metric",
              ["custom_metric.pyx"],
              libraries=["m"],
              extra_compile_args = ["-O3", "-ffast-math", "-march=native", "-fopenmp" ],
              extra_link_args=['-fopenmp']
              ) 
]

setup( 
  name = "custom_metric",
  cmdclass = {"build_ext": build_ext},
  ext_modules = ext_modules
) 

EDIT 1: Here is a link to the HTML generated by cython -a: custom_metric.html

EDIT 2: Here is an example of how to call the corresponding functions (you need to compile the Cython file first):

import custom_metric as cm
import numpy as np

XA = np.random.random((9000, 210))
XB = np.random.random((1000, 210, 9))

#timing 'parallel' version
%timeit cm.phausdorff(XA, XB)

#timing sequential version
%timeit cm.shausdorff(XA, XB)

I think the parallelization is working, but its extra overhead is eating up the time it would have saved. If I try with differently sized arrays, I do begin to see a speedup in the parallel version:

XA = np.random.random((900, 2100))
XB = np.random.random((100, 2100, 90))

Here the parallel version takes about 2/3 of the time of the serial version for me, which certainly isn't the 1/4 you'd expect with four threads, but it does at least show some benefit.
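As a rough back-of-the-envelope reading of that 2/3 figure, one can assume Amdahl's law, where the parallel run time relative to the serial one is s + (1 - s)/p for a serial fraction s and p threads. This is only a model: it treats everything that does not scale (the contiguity copy, thread startup, memory bandwidth limits) as one serial share, and the helper names below are hypothetical.

```python
def serial_fraction(time_ratio, num_threads):
    """Solve s + (1 - s) / p == time_ratio for s (Amdahl's law)."""
    inv_p = 1.0 / num_threads
    return (time_ratio - inv_p) / (1.0 - inv_p)

def predicted_ratio(serial, num_threads):
    """Parallel time divided by serial time under Amdahl's law."""
    return serial + (1.0 - serial) / num_threads

# Observed: parallel time is about 2/3 of serial time with 4 threads.
s = serial_fraction(2.0 / 3.0, 4)
print(round(s, 3))                      # about 0.556
print(round(predicted_ratio(s, 8), 3))  # what the same model gives for 8 threads
```

Under this assumption, a bit more than half of the run time is not being parallelized, which is consistent with the idea that work outside the prange loop dominates.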


One improvement I can offer is to replace the code that fixes contiguity:

XB = np.asanyarray([np.ascontiguousarray(XB[:,:,i]) for i in range(n)]) 

with

XB = np.ascontiguousarray(np.transpose(XB,[2,0,1]))

This speeds up both the parallel and non-parallel functions fairly significantly (a factor of 2 with the arrays you originally gave). It does make it slightly more obvious that you're being slowed down by overhead in the prange: the serial version is actually faster for the arrays in your example.
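To check that the two contiguity fixes are interchangeable, here is a small sketch in plain NumPy. It uses a tiny random array as a stand-in for XB (the shape is an assumption chosen to keep the check fast) and compares the list-comprehension version with the transpose version.

```python
import numpy as np

rng = np.random.default_rng(0)
XB = rng.random((5, 4, 3))  # small stand-in for the real (1000, 210, 9) array
n = XB.shape[2]

# Original approach: copy each slice along the last axis, then stack them.
stacked = np.asanyarray([np.ascontiguousarray(XB[:, :, i]) for i in range(n)])

# Suggested approach: move the last axis to the front, then copy once.
transposed = np.ascontiguousarray(np.transpose(XB, [2, 0, 1]))

same_values = np.array_equal(stacked, transposed)
is_c_contiguous = transposed.flags["C_CONTIGUOUS"]
print(same_values, transposed.shape, is_c_contiguous)
```

Both versions yield a C-contiguous array whose first axis indexes the n slices, so either result can be assigned to a float64_t[:,:,::1] memoryview; the transpose version simply avoids the Python-level loop.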
