
Cythonize list of ndarrays to indirect_contiguous

I want to cythonize a list of ndarrays (of different sizes) to speed up a function. A data structure of type [::view.indirect_contiguous, ::1] seems the way to go, creating a contiguous array of pointers, each pointing to a contiguous memoryview of a different size, but it is not clear to me how to set it up properly. How do I set it up and how do I access its elements?
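
To make the layout I have in mind concrete, here is a tiny pure-Python sketch of the idea (the names rows and lengths are only illustrative and not part of the Cython code I am after): the indirect first axis is a sequence of pointers, here simply a Python list, and each entry is a flat contiguous buffer with its own length.

import numpy as np

array_list = [np.random.random((3, 4)), np.random.random(7)]
# one flat, contiguous buffer per input array, plus its length
rows = [np.ascontiguousarray(a, dtype=np.float64).ravel() for a in array_list]
lengths = [r.size for r in rows]
print(lengths)                  # [12, 7]
print(rows[0][0], rows[1][0])   # element access pattern: rows[k][n]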

In the following MWE I use a simple sum of elements just to test element access (I am not interested in speeding the sum itself up with other formulations).

from typing import List
import numpy as np

def python_foo(array_list: List[np.ndarray]):
  list_len = len(array_list)
  results = np.zeros((list_len, 1), dtype=np.float64)
  # print a few elements and sum each array
  print(array_list[0][0], array_list[0][1], array_list[1][0], array_list[1][1])
  for k in range(list_len):
    results[k] = np.sum(array_list[k])
  return results
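
Calling the pure-Python version with two arrays of different shapes (the shapes below are just an example) gives one sum per input array:

import numpy as np

arrays = [np.random.random((3, 4, 5)), np.random.random((4, 6))]
print(python_foo(arrays))  # a (2, 1) array holding the sum of each input array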


import cython
from typing import List
import numpy as np
cimport numpy as np
from cython cimport view
from libc.stdio cimport printf

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

def cython_foo(array_list: List[np.ndarray]):
  cdef int list_len = len(array_list)
  cdef DTYPE_t[::view.indirect_contiguous, ::1] my_mem_view
  # (1) - how do I assign the ndarrays to my_mem_view? 

  # print a few elements and sum each array
  # (2) - how do I access the elements of my_mem_view? is this correct?
  printf("%f %f %f %f\n", my_mem_view[0,0], my_mem_view[0,1], my_mem_view[1,0], my_mem_view[1,1])
  cdef DTYPE_t results[list_len] = {0}
  cdef int k
  cdef int n
  for k in range(list_len):
    for n in range(array_list[k].size):  # should I also create an array of lengths?
      results[k] += my_mem_view[k,n] 
  # BONUS question: I probably need to convert results to Python objects (list, ndarrays), right?
  return results

Here's a possible solution that uses temporary memoryviews to get the pointers to the data. If anyone finds a better, cleaner or quicker answer, please let me know.

I wonder if I got the memory management right or if something is missing.

# indirect_contiguous.pyx
from typing import List
import numpy as np
cimport numpy as np
from cpython.mem cimport PyMem_Malloc, PyMem_Free
from libc.stdio cimport printf

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

def cython_foo(array_list: List[np.ndarray]):
  cdef int list_len = len(array_list)

  # (1) - build a C array of pointers, one entry per input array
  cdef DTYPE_t ** my_mem_view = <DTYPE_t **> PyMem_Malloc(list_len * sizeof(DTYPE_t *))
  cdef int idx
  cdef DTYPE_t[:] item_1 # cdef not allowed inside conditionals
  cdef DTYPE_t[:,:] item_2
  cdef DTYPE_t[:,:,:] item_3
  # ... other cdefs for DTYPE_t[...] up to dimension 8 - the maximum allowed for a memoryview
  cdef int *array_len = <int *> PyMem_Malloc(list_len * sizeof(int))
  # keep a Python reference to every (possibly copied) contiguous array, so the
  # buffers that my_mem_view points into are not freed while we still use them
  keep_alive = []
  for idx in range(list_len):
    arr = np.ascontiguousarray(array_list[idx], dtype=DTYPE)
    keep_alive.append(arr)
    if arr.ndim == 1:
      item_1 = arr
      my_mem_view[idx] = <DTYPE_t *> &item_1[0]
    elif arr.ndim == 2:
      item_2 = arr
      my_mem_view[idx] = <DTYPE_t *> &item_2[0, 0]
    elif arr.ndim == 3:
      item_3 = arr
      my_mem_view[idx] = <DTYPE_t *> &item_3[0, 0, 0]
    # ... other elif branches for DTYPE_t[...] up to dimension 8
    array_len[idx] = arr.size

  # print a few elements and sum each array
  # (2)
  printf("%f %f %f %f\n", my_mem_view[0][0], my_mem_view[0][1], my_mem_view[1][0], my_mem_view[1][1])
  cdef DTYPE_t *results = <DTYPE_t *> PyMem_Malloc(list_len * sizeof(DTYPE_t))
  cdef int k
  cdef int n
  for k in range(list_len):
    results[k] = 0
    for n in range(array_len[k]):
      results[k] += my_mem_view[k][n]

  py_results = []
  for k in range(list_len):
    py_results.append(results[k])

  # free memory
  PyMem_Free(my_mem_view)
  PyMem_Free(array_len)
  PyMem_Free(results)

  return py_results
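
To build the extension I use a minimal setup.py along these lines (the file name matches the # indirect_contiguous.pyx comment above; the rest is standard Cython/NumPy boilerplate, so adapt it as needed):

# setup.py - minimal build sketch for indirect_contiguous.pyx
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("indirect_contiguous.pyx",
                          compiler_directives={"language_level": 3}),
    include_dirs=[np.get_include()],
)

and compile it in place with python setup.py build_ext --inplace.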

Testing the speed with

import timeit
print(timeit.timeit(stmt="indirect_contiguous.python_foo([np.random.random((100,100,100)), np.random.random((100,100))])",
                    setup="import numpy as np; import indirect_contiguous; ", number=100))
print(timeit.timeit(stmt="indirect_contiguous.cython_foo([np.random.random((100,100,100)), np.random.random((100,100))])",
                    setup="import numpy as np; import indirect_contiguous; ", number=100))

I get a small improvement of 2-3% (1.45 s vs 1.42 s), possibly because I am only summing elements (an operation numpy is already optimized for).
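
As a quick correctness check, here is a small sketch that assumes both functions are exported from the indirect_contiguous module, as the timeit setup above already does:

import numpy as np
import indirect_contiguous

data = [np.random.random((10, 10, 10)), np.random.random((20, 30))]
py_res = np.asarray(indirect_contiguous.python_foo(data)).ravel()
cy_res = np.asarray(indirect_contiguous.cython_foo(data))
print(np.allclose(py_res, cy_res))  # expected: True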
