Using C++'s `str.erase()` from within Cython
I am implementing a function in Cython that needs, at some point, to remove some chars from a C++ std::string. For this, I would use std::string::erase(). However, when I try to use it, Cython forces the object to be bytes() instead of std::string(), at which point it cannot find .erase().
To illustrate the issue, here is a minimal example (using IPython + Cython magic):
%load_ext Cython

%%cython --cplus -c-O3 -c-march=native -a
from libcpp.string cimport string

cdef string my_func(string s):
    cdef char c = b'\0'
    cdef size_t s_size = s.length()
    cdef size_t i = 0
    while i + 1 <= s_size:
        if s[i] == c:
            s.erase(i, 1)
        i += 1
    return s

def cy_func(string b):
    return my_func(b)
This compiles, but the annotation indicates Python interaction on the .erase() line, and when I try to use it, e.g.:
b = b'ciao\0pippo\0'
print(b)
cy_func(b)
I get:
AttributeError Traceback (most recent call last)
AttributeError: 'bytes' object has no attribute 'erase'
Exception ignored in: '_cython_magic_5beaeb4004c3afc6d85b9b158c654cb6.my_func'
AttributeError: 'bytes' object has no attribute 'erase'
How could I solve this?
If I replace s.erase(i, 1) with, say, s[i] == 10, I get my_func() with no Python interaction (I can even use the nogil directive). I know I could do this in Python with .replace(b'\0', b''), but it is part of a longer algorithm I hope to optimize with Cython.

You are accessing the string past its bounds. Fix that, and your code will work.
The length of the string decreases after erase, so s_size must be updated. Also, the condition i < s_size reads better than i + 1 <= s_size. Finally, i must not be incremented after erase, because the next char moves into that index.
while i < s_size:
    if s[i] == c:
        s.erase(i, 1)
        s_size -= 1
    else:
        i += 1
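The same loop can be sketched in plain C++, which makes the index handling easier to check in isolation. The name `erase_char_loop` is a hypothetical helper, not part of the original code. Instead of tracking `s_size` by hand, this sketch re-reads `s.size()` on every iteration, and it advances `i` only when nothing was erased.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): removes every
// occurrence of `c` from a copy of `s` with an index-based erase loop.
std::string erase_char_loop(std::string s, char c) {
    std::size_t i = 0;
    while (i < s.size()) {      // size() shrinks after each erase
        if (s[i] == c) {
            s.erase(i, 1);      // the next char moves into index i: do not advance
        } else {
            ++i;                // advance only when nothing was erased
        }
    }
    return s;
}
```

Called on the 11-byte input from the question, `erase_char_loop(std::string("ciao\0pippo\0", 11), '\0')` should leave only the non-null characters. Each `erase` shifts the tail of the string, so this version is still quadratic in the worst case.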
b below is a byte array. Try calling .decode to convert it to a string.
b = b'ciao\0pippo\0'
print(b)
cy_func(b.decode('ASCII'))

I don't know why Cython produces the code it does. There is not even an erase declaration in string.pxd, so Cython should be producing an error.
The easiest workaround would be to introduce a function erase which wraps std::string::erase:
cdef extern from *:
    """
    #include <string>
    std::string &erase(std::string& s, size_t pos, size_t len){
        return s.erase(pos, len);
    }
    """
    string& erase(string& s, size_t pos, size_t len)

# replace s.erase(i,1) -> erase(s,i,1)
However, this is not how erasing zeros should be done in C++: it is buggy (see @MS's answer for a fix) and it has O(n^2) running time (just try it on b"\x00"*10**6). The right way is to use the remove/erase idiom:
%%cython --cplus
from libcpp.string cimport string

cdef extern from *:
    """
    #include <string>
    #include <algorithm>
    void remove_nulls(std::string& s){
        s.erase(std::remove(s.begin(), s.end(), 0), s.end());
    }
    """
    void remove_nulls(string& s)

cdef string my_func(string s):
    remove_nulls(s)
    return s
which is hard to misuse and is O(n).
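To see why the idiom is linear, it may help to split it into its two steps. The following is a minimal C++ sketch, assuming a hypothetical helper named `remove_char` that generalizes `remove_nulls` to an arbitrary character and also reports how many characters were dropped. Neither the name nor the return value comes from the original answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (an illustration, not the answer's code): removes
// every occurrence of `c` from `s` in place and returns how many were removed.
std::size_t remove_char(std::string& s, char c) {
    // Step 1: std::remove makes a single pass, shifting the kept characters
    // to the front. It returns an iterator to the new logical end and does
    // not change s.size().
    auto new_end = std::remove(s.begin(), s.end(), c);
    const std::size_t removed = static_cast<std::size_t>(s.end() - new_end);
    // Step 2: one erase call trims the leftover tail.
    s.erase(new_end, s.end());
    return removed;
}
```

Because every character is moved at most once and `erase` is called only once, a string of a million null bytes is handled in a single pass rather than a million tail shifts.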
One more remark, concerning passing std::string by value. The signature:
cdef string my_func(string s)
    ...
    return s
means there are two (unnecessary) copies (with RVO being impossible). It might be better to avoid them and pass s by reference (at least in cdef functions):
def cy_func(string b):
    remove_nulls(b)  # no copying
    return b