How to pass numpy logic functions to Cython correctly?
What declarations should I be incorporating with a logic function / index operation so that Cython does the heavy lifting?

I have two large rasters in the form of numpy arrays of equal size. The first array contains vegetation index values and the second array contains field IDs. The goal is to average vegetation index values by field. Both arrays have pesky nodata values (-9999) that I would like to ignore.

Currently the function takes over 60 seconds to execute, which normally I wouldn't mind so much, but I'll be processing potentially hundreds of images. Even a 30 second improvement would be significant. So I've been exploring Cython as a way to help speed things up. I've been using the Cython numpy tutorial as a guide.
test_cy.pyx code: test_cy.pyx代码:
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False) # turn off negative index wrapping for entire function
cpdef test():
cdef np.ndarray[np.int16_t, ndim=2] ndvi_array = np.load("Z:cython_test/data/ndvi.npy")
cdef np.ndarray[np.int16_t, ndim=2] field_array = np.load("Z:cython_test/data/field_array.npy")
cdef np.ndarray[np.int16_t, ndim=1] unique_field = np.unique(field_array)
unique_field = unique_field[unique_field != -9999]
cdef int field_id
cdef np.ndarray[np.int16_t, ndim=1] f_ndvi_values
cdef double f_avg
for field_id in unique_field :
f_ndvi_values = ndvi_array[np.logical_and(field_array == field_id, ndvi_array != -9999)]
f_avg = np.mean(f_ndvi_values)
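(As an aside, the per-field masking above can also be done in a single pass over both arrays in plain NumPy, without Cython, by accumulating per-field sums and counts with `np.add.at`. This is only a sketch with toy arrays standing in for the real rasters; the variable names mirror the ones above but the data is made up:)

```python
import numpy as np

# Small illustrative arrays standing in for the real rasters.
ndvi_array = np.array([[10, 20, -9999],
                       [30, 40, 50]], dtype=np.int16)
field_array = np.array([[1, 1, 2],
                        [2, -9999, 2]], dtype=np.int16)

# Keep only pixels where both rasters hold valid data.
mask = (ndvi_array != -9999) & (field_array != -9999)
nd = ndvi_array[mask].astype(np.float64)
fi = field_array[mask].astype(np.intp)

# Accumulate per-field sums and counts in one pass instead of
# building a fresh boolean mask for every field id.
n = fi.max() + 1
sums = np.zeros(n)
counts = np.zeros(n)
np.add.at(sums, fi, nd)
np.add.at(counts, fi, 1)

with np.errstate(invalid="ignore"):
    avgs = sums / counts  # NaN for field ids that never occur
```

This replaces the O(number of fields) passes over the full rasters with two passes total, which is usually where the 60 seconds go.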
setup.py code:

```
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension

from Cython.Build import cythonize
import numpy

setup(ext_modules=cythonize('test_cy.pyx'),
      include_dirs=[numpy.get_include()])
```
After some research and running:

```
cython -a test_cy.pyx
```

it seems the index operation `ndvi_array[np.logical_and(field_array == field_id, ndvi_array != -9999)]` is the bottleneck and is still relying on Python. I suspect I'm missing some vital declarations here. Including `ndim` didn't have any effect.

I'm fairly new to numpy as well, so I'm probably missing something obvious.
Your problem looks fairly vectorizable to me, so Cython might not be the best approach. (Cython shines when there are unavoidable fine-grained loops.) As your dtype is `int16`, there is only a limited range of possible labels, so using `np.bincount` should be fairly efficient. Try something like the following (this assumes all your valid values are >= 0; if that is not the case you'd have to shift - or (cheaper) view-cast to `uint16` (since we are not doing any arithmetic on the labels, that should be safe) - before using `bincount`):
```
mask = (ndvi_array != -9999) & (field_array != -9999)
nd = ndvi_array[mask]
fi = field_array[mask]

counts = np.bincount(fi, minlength=2**15)
sums = np.bincount(fi, nd, minlength=2**15)

valid = counts != 0
avgs = sums[valid] / counts[valid]
```
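Here is a tiny worked example of the `bincount` recipe, with toy arrays and `minlength` shrunk so the output stays readable (field ids are non-negative here, as assumed above):

```python
import numpy as np

# Toy stand-ins for the NDVI and field-ID rasters.
ndvi_array = np.array([[10, 20, -9999],
                       [30, 40, 50]], dtype=np.int16)
field_array = np.array([[1, 1, 2],
                        [2, -9999, 2]], dtype=np.int16)

# Drop pixels where either raster is nodata.
mask = (ndvi_array != -9999) & (field_array != -9999)
nd = ndvi_array[mask]
fi = field_array[mask]

# Per-field pixel counts, and per-field sums via the weights argument.
counts = np.bincount(fi, minlength=3)
sums = np.bincount(fi, weights=nd, minlength=3)

valid = counts != 0
avgs = sums[valid] / counts[valid]  # one mean per field id that occurs
```

The second positional argument of `np.bincount` is `weights`, so `np.bincount(fi, nd, ...)` in the snippet above is the same call written positionally.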