
Optimizing numpy.dot with Cython

I have the following piece of code, which I'd like to optimize using Cython:

import numpy
from math import sqrt
def distance(v1, v2):
    sim = numpy.dot(v1, v2) / (sqrt(numpy.dot(v1, v1)) * sqrt(numpy.dot(v2, v2)))
    dist = 1 - sim
    return dist
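The snippet computes the cosine distance, i.e. 1 minus the cosine of the angle between v1 and v2. As a quick sanity check in plain NumPy (the vectors below are made-up examples, not from the question), parallel vectors should give about 0, orthogonal vectors 1, and opposite vectors about 2:

```python
import math

import numpy as np


def distance(v1, v2):
    """Cosine distance: 1 - cos(angle between v1 and v2)."""
    sim = np.dot(v1, v2) / (math.sqrt(np.dot(v1, v1)) * math.sqrt(np.dot(v2, v2)))
    return 1 - sim


a = np.array([1.0, 2.0, 3.0])
print(distance(a, 2 * a))                                     # parallel: approximately 0
print(distance(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # orthogonal: 1.0
print(distance(a, -a))                                        # opposite: approximately 2
```

A check like this is also handy for confirming that a compiled Cython version returns the same values as the NumPy original.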

I have written and compiled the .pyx file, but when I run the code I do not see any significant improvement in performance. According to the Cython documentation, I have to add C types. The HTML annotation file generated by Cython indicates that the bottleneck is the dot products (which is expected, of course). Does this mean that I have to define a C function for the dot products? If so, how do I do that?

EDIT:

After some research, I have come up with the following code. The improvement is only marginal, and I am not sure whether there is anything I can do to improve it further:

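# cython: language_level=3
import numpy as np
cimport numpy as np
cimport cython
from libc.math cimport sqrt  # C sqrt from math.h, no Python call overhead

np.import_array()  # initialize the NumPy C API before using typed arrays

# Typedef for easier reading. np.float64_t is the C double that matches
# float64 arrays; np.float is not a C element type for typed buffers.
ctypedef np.float64_t reals

@cython.boundscheck(False)  # no bounds checking on v1[i], v2[i]
@cython.wraparound(False)   # no negative-index handling
cdef inline double dot(np.ndarray[reals, ndim=1] v1, np.ndarray[reals, ndim=1] v2):
    cdef double result = 0
    cdef Py_ssize_t i
    cdef Py_ssize_t length = v1.shape[0]
    for i in range(length):
        result += v1[i] * v2[i]
    return result

@cython.cdivision(True)
def distance(np.ndarray[reals, ndim=1] ex1, np.ndarray[reals, ndim=1] ex2):
    # Bounds checking is off in dot(), so reject mismatched lengths here.
    if ex1.shape[0] != ex2.shape[0]:
        raise ValueError("vectors must have the same length")
    cdef double dot12 = dot(ex1, ex2)
    cdef double dot11 = dot(ex1, ex1)
    cdef double dot22 = dot(ex2, ex2)
    cdef double sim = dot12 / sqrt(dot11 * dot22)
    cdef double dist = 1 - sim
    return dist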

As a general note, if you are calling NumPy functions from within Cython and doing little else, you will generally see only marginal gains, if any. You typically get large speed-ups only when you statically type code that uses an explicit for loop at the Python level, not code that is already calling into the NumPy C API.
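To see why, compare an explicit Python-level loop with numpy.dot in plain Python. The loop pays interpreter overhead on every element, which is the overhead that static typing in Cython removes; numpy.dot already runs in compiled code, so wrapping it in Cython leaves little to gain. The array size and repeat count below are arbitrary choices, and absolute timings depend on the machine:

```python
import timeit

import numpy as np


def loop_dot(v1, v2):
    # Pure-Python loop: every iteration goes through the interpreter and
    # creates Python-level float objects for the elements.
    result = 0.0
    for i in range(len(v1)):
        result += v1[i] * v2[i]
    return result


rng = np.random.default_rng(0)
v1 = rng.random(10_000)
v2 = rng.random(10_000)

# Same value up to floating-point rounding (summation order may differ).
assert np.isclose(loop_dot(v1, v2), np.dot(v1, v2))

t_loop = timeit.timeit(lambda: loop_dot(v1, v2), number=10)
t_numpy = timeit.timeit(lambda: np.dot(v1, v2), number=10)
print(f"python loop: {t_loop:.4f} s, numpy.dot: {t_numpy:.6f} s")
```

The loop version is the kind of code where Cython's static typing pays off; the numpy.dot version is not.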

You could try writing out the code for the dot product yourself, with static types for everything (the loop counter, the input NumPy arrays, etc.) and with wraparound and boundscheck set to False, import the C library version of the sqrt function, and then try to leverage the parallel for loop (prange) to make use of OpenMP.
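In Cython, prange comes from cython.parallel and is typically written as "for i in prange(length, nogil=True):" in a module compiled with OpenMP enabled (for example -fopenmp with GCC). An in-place update such as "result += v1[i] * v2[i]" inside that loop makes result a reduction variable: each thread accumulates its own partial sum over a share of the indices, and the partial sums are combined at the end. The plain Python sketch below only mimics that split sequentially to show the arithmetic. The helper chunked_dot and the chunk count of 4 (a stand-in for the number of threads) are hypothetical, and nothing here actually runs in parallel:

```python
import numpy as np


def chunked_dot(v1, v2, n_chunks=4):
    # Hypothetical helper: split the index range into n_chunks pieces,
    # the way a parallel loop hands index ranges to threads.
    bounds = np.linspace(0, len(v1), n_chunks + 1).astype(int)
    partials = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        # Each "thread" computes a private partial sum over its own range.
        partial = 0.0
        for i in range(start, stop):
            partial += v1[i] * v2[i]
        partials.append(partial)
    # Reduction step: combine the per-chunk partial sums.
    return sum(partials)


v1 = np.arange(1.0, 11.0)
v2 = np.ones(10)
print(chunked_dot(v1, v2))  # 55.0
```

Because the additions happen in a different order than in a serial loop, a parallel reduction over floats can differ from the serial result in the last few bits.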

You can change the expression

sim = numpy.dot(v1, v2) / (sqrt(numpy.dot(v1, v1)) * sqrt(numpy.dot(v2, v2))) 

to

sim = numpy.dot(v1, v2) / sqrt(numpy.dot(v1, v1) * numpy.dot(v2, v2))
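The two forms are equal because the dot product of a real vector with itself is non-negative, and for non-negative numbers the square root is multiplicative. The rewritten form therefore calls sqrt once instead of twice:

```latex
\mathrm{sim} = \frac{v_1 \cdot v_2}{\sqrt{v_1 \cdot v_1}\,\sqrt{v_2 \cdot v_2}}
             = \frac{v_1 \cdot v_2}{\sqrt{(v_1 \cdot v_1)(v_2 \cdot v_2)}},
\qquad \text{since } \sqrt{a}\,\sqrt{b} = \sqrt{ab} \text{ for } a, b \ge 0.
```

The saving is one sqrt call per distance, which is small compared with the cost of the three dot products.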
