Type cast error '__Pyx_memviewslice' to 'double *' Cython, what's the equivalent? MKL function prange code
I wrote a Cython program calling Intel MKL for matrix multiplication, with the purpose of making it parallel. It was based on an old SO post linking to BLAS and used a bunch of Cython methods I had never seen, but I got it working, and it was much slower than NumPy (also linked to MKL). In order to speed it up, I switched to the typical Memoryview format (it was using the ndarray np.float64_t datatype for a couple of operations). But now it no longer works using double[::1] Memoryviews. Here's the error generated:

'type cast': cannot convert from '__Pyx_memviewslice' to 'double *'

And as a result of the type cast not working, the MKL function only sees 3 of 5 arguments:

error C2660: 'cblas_ddot': function does not take 3 arguments

Here is the .PYX code:

import numpy as np
cimport numpy as np
cimport cython
from cython cimport view
from cython.parallel cimport prange     #this is your OpenMP portion
from openmp cimport omp_get_max_threads #only used for getting the max # of threads on the machine

cdef extern from "mkl_cblas.h" nogil: #import a function from Intel's MKL library
    double ddot "cblas_ddot"(int N,
                             double *X,
                             int incX,
                             double *Y,
                             int incY)

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cpdef matmult(double[:,::1] A, double[:,::1] B):
    cdef int Ashape0=A.shape[0], Ashape1=A.shape[1], Bshape0=B.shape[0], Bshape1=B.shape[1], Arowshape0=A[0,:].shape[0] #these are defined here as they aren't allowed in a prange loop

    if Ashape1 != Bshape1:
        raise TypeError('Inner dimensions are not consistent!')

    cdef int i, j
    cdef double[:,::1] out = np.zeros((Ashape0, Bshape1))
    cdef double[::1] A_row = np.zeros(Ashape0)
    cdef double[:] B_col = np.zeros(Bshape1) #no idea why this is not allowed to be [::1]
    cdef int Arowstrides = A_row.strides[0] // sizeof(double)
    cdef int Bcolstrides = B_col.strides[0] // sizeof(double)
    cdef int maxthreads = omp_get_max_threads()

    for i in prange(Ashape0, nogil=True, num_threads=maxthreads, schedule='static'): # to use all cores
        A_row = A[i,:]
        for j in range(Bshape1):
            B_col = B[:,j]
            out[i,j] = ddot(Arowshape0, #call the imported Intel MKL library
                            <double*>A_row,
                            Arowstrides,
                            <double*>B_col,
                            Bcolstrides)

    return np.asarray(out)

I'm sure this is easy for someone on SO to point out. And please advise if you see where improvement can be made - this was hacked and chopped together, and I don't think the i / j loops are even needed.

The cleanest example around, https://gist.github.com/JonathanRaiman/f2ce5331750da7b2d4e9, which I finally compiled, is actually much faster (2x) but gives no results, so I'll put that in another post (here: Calling BLAS / LAPACK directly using the SciPy interface and Cython - also how to add MKL).

Much appreciated.

To get a pointer from a memoryview, you need to take the address of the first element rather than casting the memoryview itself:

    ddot(Arowshape0, #call the imported Intel MKL library
         &A_row[0],
         Arowstrides,
         &B_col[0],
         Bcolstrides)
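
As a rough illustration of why the address of the first element plus an element stride is all the routine needs, here is a pure-Python stand-in for the dot product call. It is only a sketch: ddot_strided and matmult_sketch are hypothetical names, not part of MKL or Cython, and the flat row-major buffers with explicit start offsets play the role of &A_row[0] and &B_col[0].

```python
from array import array


def ddot_strided(n, x, x_start, incx, y, y_start, incy):
    """Hypothetical stand-in for cblas_ddot(n, x, incx, y, incy).

    x_start / y_start mimic the pointer to the first element;
    incx / incy are strides counted in elements, not bytes.
    """
    total = 0.0
    for k in range(n):
        total += x[x_start + k * incx] * y[y_start + k * incy]
    return total


def matmult_sketch(a_flat, a_shape, b_flat, b_shape):
    """Multiply row-major A (m x k) by row-major B (k x n) using strided dots."""
    m, k = a_shape
    k2, n = b_shape
    if k != k2:
        raise ValueError("Inner dimensions are not consistent!")
    out = array("d", [0.0] * (m * n))
    for i in range(m):
        for j in range(n):
            # Row i of A starts at element i*k and is contiguous (stride 1).
            # Column j of B starts at element j and steps by n elements.
            out[i * n + j] = ddot_strided(k, a_flat, i * k, 1, b_flat, j, n)
    return out


A = array("d", [1, 2, 3, 4, 5, 6])     # 2 x 3, row-major
B = array("d", [7, 8, 9, 10, 11, 12])  # 3 x 2, row-major
C = matmult_sketch(A, (2, 3), B, (3, 2))
print(list(C))
```

The column of B is never copied; it is described entirely by where it starts and how far apart its elements are. That also fits the comment in the question that B_col could not be declared as double[::1]: a column of a C-contiguous array is not contiguous, so its stride is larger than one element.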