Cython: convert numpy ndarray (N,4,2) to vector[vector[pair[double,double]]]

I have spent literally weeks converting my Cython code to pure C (still written in Cython, though) to gain speed and to be able to release the GIL for multithreading, gaining even more speed.

With the help of fellow Stack Overflow users I finally succeeded: pure C gained a factor of 10 over Cython with some Python left in, and then another factor of 3 by using 4 threads (with prange) in the double for loop part of my code.

BUT in order to enter this loop I first have to convert two 3-dimensional NumPy ndarrays of shapes (N,4,2) and (K,4,2) to vector[vector[pair[double,double]]], with K and N reasonably large.

For this I am doing:

import time
import numpy as np
cimport numpy as np
from libcpp.vector cimport vector
from libcpp.pair cimport pair

ctypedef np.float64_t DTYPE_t  # np.random.uniform returns float64

cdef int N=200000  # of this order of magnitude
cdef np.ndarray[DTYPE_t,ndim=3] numpy_array=np.random.uniform(size=(N,4,2))
t1=time.time()
cdef vector[vector[pair[double,double]]] c_structure
c_structure.reserve(N)
cdef int i
for i in range(N):
  # each (4,2) slice goes through a slow element-by-element Python conversion
  c_structure.push_back(numpy_array[i])
t2=time.time()

Yet this part of the code, which I deemed trivial, has become the new bottleneck! On my computer the double for loop now takes 0.1s single-threaded (instead of 1.11s in the original implementation), while this conversion takes 3 whole seconds (1.5s per array). That makes my super-optimized code roughly 3 times slower than the original (1.5*2 + 0.1 = 3.1s)!

What am I doing wrong, and how can I speed this up?

See another related question that I asked

You have an Nx4x2 array and you are converting it to vector[vector[pair[double,double]]]. In C++, vectors of vectors are not efficient. Instead, create a struct holding the 4x2 block and make a single vector of those. Or, better yet, use the NumPy array directly in C++ as a pointer to an Nx4x2 array. In other words, stop copying your data unnecessarily; and if a copy really is needed, copy into a fixed Nx4x2 structure rather than the Nx(variable)x2 layout, which is slow.
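A minimal sketch of the no-copy idea, assuming the data is a C-contiguous float64 array; the function name process_quads and the per-corner arithmetic are made up for illustration. A typed memoryview wraps the NumPy buffer directly, so nothing is converted or copied and the loop can run without the GIL:

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def process_quads(double[:, :, ::1] quads):
    # quads has shape (N, 4, 2); the memoryview wraps the NumPy buffer directly,
    # so the loop body compiles to plain C with no intermediate containers
    cdef Py_ssize_t i, j
    cdef double x, y, acc = 0.0
    with nogil:
        for i in range(quads.shape[0]):
            for j in range(4):
                x = quads[i, j, 0]
                y = quads[i, j, 1]
                acc += x * y   # placeholder for the real per-corner work
    return acc

The caller would then pass the array directly, e.g. process_quads(numpy_array), with no vector[vector[pair[double,double]]] built at all.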

I gained a factor of 100 in speed by explicitly initializing each element of the vector. Indeed, with cython -a this part now shows 0 yellow lines.

import time
import numpy as np
cimport numpy as np
from libcpp.vector cimport vector
from libcpp.pair cimport pair

ctypedef np.float64_t DTYPE_t

cdef int N=200000  # of this order of magnitude
cdef np.ndarray[DTYPE_t,ndim=3] numpy_array=np.random.uniform(size=(N,4,2))
t1=time.time()
cdef vector[vector[pair[double,double]]] c_structure
# "row" renames the original variable "vector", which shadowed the C++ type name
cdef vector[pair[double,double]] empty_vector, row
cdef pair[double,double] a1, a2, a3, a4
c_structure.reserve(N)
cdef int i
for i in range(N):
  # copy the four (x, y) corners of element i into C++ pairs one scalar at a
  # time, so the loop body stays pure C with no Python object conversions
  a1.first=numpy_array[i,0,0]
  a1.second=numpy_array[i,0,1]
  a2.first=numpy_array[i,1,0]
  a2.second=numpy_array[i,1,1]
  a3.first=numpy_array[i,2,0]
  a3.second=numpy_array[i,2,1]
  a4.first=numpy_array[i,3,0]
  a4.second=numpy_array[i,3,1]
  row.push_back(a1)
  row.push_back(a2)
  row.push_back(a3)
  row.push_back(a4)
  c_structure.push_back(row)   # push_back copies row into c_structure
  row=empty_vector             # reset row for the next iteration
t2=time.time()

0.036s instead of 3s
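For completeness, here is a sketch of the same element-by-element copy done through a typed memoryview instead of the np.ndarray buffer (assuming the same C-contiguous float64 (N,4,2) input; fill_structure is a made-up name). Because memoryview indexing and the C++ calls need no Python objects, the whole fill can sit inside a with nogil block, in line with the GIL-removal goal mentioned at the top:

from libcpp.vector cimport vector
from libcpp.pair cimport pair
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cdef vector[vector[pair[double,double]]] fill_structure(double[:, :, ::1] arr):
    cdef vector[vector[pair[double,double]]] out
    cdef vector[pair[double,double]] row
    cdef pair[double,double] p
    cdef Py_ssize_t i, j
    out.reserve(arr.shape[0])
    with nogil:
        for i in range(arr.shape[0]):
            row.clear()            # reuse the same row buffer on every iteration
            for j in range(4):
                p.first = arr[i, j, 0]
                p.second = arr[i, j, 1]
                row.push_back(p)
            out.push_back(row)     # push_back copies row into out
    return out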
