Cython prange with an array of strings
I'm trying to use prange in order to process multiple strings. As it is not possible to do this with a Python list, I'm using a numpy array.
With an array of floats, this function works:
from cython.parallel import prange
cimport numpy as np
from numpy cimport ndarray as ar
cpdef func_float(ar[np.float64_t,cast=True] x, double alpha):
    cdef int i
    for i in prange(x.shape[0], nogil=True):
        x[i] = alpha * x[i]
    return x
When I try this simple one:
cpdef func_string(ar[np.str,cast=True] x):
    cdef int i
    for i in prange(x.shape[0], nogil=True):
        x[i] = x[i] + str(i)
    return x
I get this error:
>>> func_string(x = np.array(["apple","pear"],dtype=np.str))
File "processing.pyx", line 8, in processing.func_string
cpdef func_string(ar[np.str,cast=True] x):
ValueError: Item size of buffer (20 bytes) does not match size of 'str object' (8 bytes)
I'm probably missing something and I can't find an alternative to str. Is there a way to properly use prange with an array of strings?
Besides the fact that your code should fail when cythonized, because you try to create a Python object (i.e. str(i)) without the GIL, your code isn't doing what you think it should do.
In order to analyse what is going on, let's take a look at a much simpler Cython version:
%%cython -2
cimport numpy as np
from numpy cimport ndarray as ar
cpdef func_string(ar[np.str, cast=True] x):
    print(len(x))
From your error message, one can deduce that you use Python 3 and that the Cython extension is built with the (still default) language_level=2, thus I'm using -2 in the %%cython magic cell.
And now:
>>> x = np.array(["apple", "pear"], dtype=np.str)
>>> func_string(x)
ValueError: Item size of buffer (20 bytes) does not match size of 'str object' (8 bytes)
What is going on?
x is not what you think it is
First, let's take a look at x:
>>> x.dtype
<U5
So x isn't a collection of unicode objects. An element of x consists of 5 unicode characters, and those elements are stored contiguously in memory, one after another. What is important: it holds the same information as unicode objects would, but stored in a different memory layout.
This is one of numpy's quirks and how np.array works: every element in the list is converted to a unicode object, then the maximal size of the elements is calculated, and the dtype (in this case <U5) is derived from it and used.
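The layout described above can be inspected from plain Python. The following is only a minimal sketch: it uses dtype=str (the built-in that np.str used to alias; the np.str name has been removed from recent NumPy releases), and the variable names are made up for illustration.

```python
import numpy as np

words = ["apple", "pear"]
# NumPy picks the fixed width itself: the length of the longest element.
x = np.array(words, dtype=str)

longest = max(len(w) for w in words)
print(x.dtype == np.dtype(f"U{longest}"))  # True: the dtype is U5
print(x.itemsize)                          # 20, i.e. 5 characters * 4 bytes

# Force little-endian so the raw buffer can be decoded as UTF-32-LE.
raw = x.astype("<U5").tobytes()
first = raw[:20].decode("utf-32-le")
second = raw[20:].decode("utf-32-le")
print(repr(first))   # 'apple'
print(repr(second))  # 'pear\x00' (shorter strings are padded with NUL)
```

The two strings sit back to back in one 40-byte buffer, with no Python string objects involved.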
np.str is interpreted differently in Cython code (ar[np.str] x) (twice!)
First difference: in your Python 3 code np.str is for unicode, but in your Cython code, which is cythonized with language_level=2, np.str is for bytes (see doc).
Second difference: seeing np.str, Cython will interpret it as an array of Python objects (maybe this should be seen as a Cython bug). It is almost the same as if the dtype were np.object; actually, the only difference from np.object is slightly different error messages.
With this information we can understand the error message. At runtime, the input array is checked (before the first line of the function is executed!): the declared element type is a Python object, i.e. an 8-byte pointer, but the array passed in has an item size of 20 bytes (5 characters of 4 bytes each), thus the cast cannot be done and the observed exception is thrown.
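The two sizes from the error message can be reproduced without Cython. This is a rough sketch that assumes an object-dtype array stands in for what Cython expects from the ar[np.str] declaration; the 8 bytes are the pointer size on a 64-bit build.

```python
import ctypes
import numpy as np

x = np.array(["apple", "pear"], dtype=str)

# An object array stores one pointer per element, not the characters.
as_objects = x.astype(object)
pointer_size = ctypes.sizeof(ctypes.c_void_p)

print(x.itemsize)            # 20: what the buffer actually provides
print(as_objects.itemsize)   # equals pointer_size (8 on 64-bit builds)
print(type(as_objects[0]))   # <class 'str'>
```

Passing an object array would satisfy the item-size check, but its elements are Python objects and therefore cannot be touched inside a nogil block.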
You cannot change the size of an element in an <U.. numpy array:
Now let's take a look at the following:
>>> x = np.array(["apple", b"pear"], dtype=np.str)
>>> x[0] = x[0]+str(0)
>>> x[0]
'apple'
The element didn't change, because the string x[0]+str(0) was truncated while being written back to the x array: there is only room for 5 characters! It would work (to some degree, as long as the resulting string has no more than 5 characters) with "pear" though:
>>> x[1] = x[1]+str(1)
>>> x[1]
'pear1'
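If longer results are needed, the array has to be given more room before writing. The sketch below shows the silent truncation and two possible ways around it (a wider fixed-width dtype or an object array); these workarounds are suggestions for illustration, not something the original code does.

```python
import numpy as np

x = np.array(["apple", "pear"], dtype=str)   # dtype U5

# Writing a 6-character string into a U5 slot is silently truncated.
x[0] = x[0] + "0"
print(x[0])  # apple

# Option 1: convert to a wider fixed-width dtype first.
wide = x.astype("U8")
wide[0] = wide[0] + "0"
print(wide[0])  # apple0

# Option 2: an object array holds ordinary Python strings of any length.
as_objects = x.astype(object)
as_objects[0] = as_objects[0] + "0"
print(as_objects[0])  # apple0
```

Neither option makes the elements usable without the GIL by itself: the object array holds Python objects, and the wider array is still a fixed-width character buffer.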
Where does this all leave you?
Use bytes and not unicodes (i.e. dtype=np.bytes_).
Declare x as ar x in the signature and roll out the runtime checks yourself, similar to what is done in Cython's "deprecated" numpy tutorial.

All of the above has nothing to do with prange. To use prange you cannot use str(i), because it operates on Python objects.
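One possible way to get string data into a form that a nogil loop could work on is to view a fixed-width bytes array as a 2-D array of uint8 values. The sketch below shows only the NumPy side; the idea of passing such a view to Cython as a typed memoryview is an assumption and not part of the original answer.

```python
import numpy as np

x = np.array([b"apple", b"pear"], dtype=np.bytes_)   # dtype S5

# One row per string, one column per byte; shares memory with x.
chars = x.view(np.uint8).reshape(x.shape[0], x.itemsize)
print(chars.shape)  # (2, 5)

# Plain integer arithmetic on the bytes: upper-case the ASCII letters.
# The NUL padding after "pear" is left untouched by the mask.
lower = (chars >= ord("a")) & (chars <= ord("z"))
chars[lower] -= 32

print(x.tolist())  # [b'APPLE', b'PEAR']
```

Because chars contains only C integers, the per-row work involves no Python objects, which is the precondition for running it inside prange.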