Using C++'s `str.erase()` from within Cython

I am implementing a function in Cython that needs, at some point, to remove some chars from a C++ std::string. For this, I would use std::string::erase(). However, when I try to use it, Cython forces the object to be bytes() instead of std::string(), at which point it cannot find .erase().

To illustrate the issue, here is a minimal example (using IPython + Cython magic):

%load_ext Cython
%%cython --cplus -c-O3 -c-march=native -a


from libcpp.string cimport string


cdef string my_func(string s):
    cdef char c = b'\0'
    cdef size_t s_size = s.length()
    cdef size_t i = 0
    while i + 1 <= s_size:
        if s[i] == c:
            s.erase(i, 1)
        i += 1
    return s


def cy_func(string b):
    return my_func(b)

This compiles, but the annotation indicates Python interaction on the .erase() line, and when I try to use it, e.g.

b = b'ciao\0pippo\0'
print(b)
cy_func(b)

I get:

AttributeError Traceback (most recent call last)
AttributeError: 'bytes' object has no attribute 'erase'

Exception ignored in: '_cython_magic_5beaeb4004c3afc6d85b9b158c654cb6.my_func'
AttributeError: 'bytes' object has no attribute 'erase'

How could I solve this?

Notes

  1. If I replace the s.erase(i, 1) with, say, s[i] == 10, I get my_func() with no Python interaction (I can even use the nogil directive).
  2. I know I could do this in Python with .replace(b'\0', b''), but it is part of a longer algorithm I hope to optimize with Cython.

You are accessing past the end of the string. Fix that, and your code will work.

The length of the string decreases after erase, so s_size has to be updated. Also, the condition i < s_size reads better than i + 1 <= s_size. Finally, i must not be incremented after erase, because the next char moves into that index.

while i < s_size:
    if s[i] == c:
        s.erase(i, 1)
        s_size -= 1
    else:
        i += 1
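
To see why the index must stay put after an erase, here is a minimal standalone C++ sketch of the same loop. The function name `strip_char_loop` is hypothetical and used only for illustration; it mirrors the corrected loop above and is not what Cython generates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes every occurrence of `c` from `s`
// using the index-based loop discussed above.
std::string strip_char_loop(std::string s, char c) {
    std::size_t i = 0;
    while (i < s.size()) {      // re-read the size: it shrinks after erase
        if (s[i] == c) {
            s.erase(i, 1);      // the next char slides into index i
        } else {
            ++i;                // advance only when nothing was erased
        }
    }
    return s;
}
```

With consecutive matches, such as two adjacent null bytes, a version that increments after every erase would skip the second one; this version does not.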

b below is a bytes object. Try calling .decode to convert it to a string.

b = b'ciao\0pippo\0'
print(b)
cy_func(b.decode('ASCII'))

I don't know why Cython generates the code it does: there is not even an erase in string.pxd, so Cython should be raising an error.

The easiest workaround would be to introduce a function erase which wraps std::string::erase:

cdef extern from *:
    """
    #include <string>
    std::string &erase(std::string& s, size_t pos, size_t len){
        return s.erase(pos, len);
    }
    """
    string& erase(string& s, size_t pos, size_t len)

# replace  s.erase(i,1) -> erase(s,i,1)

However, this is not how erasing zeros should be done in C++: it is buggy (see @MS's answer for a fix) and it has O(n^2) running time (just try it on b"\x00"*10**6). The right way is to use the remove/erase idiom:

%%cython --cplus
from libcpp.string cimport string

cdef extern from *:
    """
    #include <string>
    #include <algorithm>
    void remove_nulls(std::string& s){
       s.erase(std::remove(s.begin(), s.end(), 0), s.end());
    }
    """
    void remove_nulls(string& s)


cdef string my_func(string s):
    remove_nulls(s)
    return s

which is hard to misuse and is O(n).
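
For readers who want to try the remove/erase idiom outside Cython, the following is a small plain C++ sketch. The name `remove_char` and its `c` parameter are assumptions introduced here as a generalisation of `remove_nulls`; they do not appear in the original answer.

```cpp
#include <algorithm>
#include <string>

// Hypothetical generalisation of remove_nulls: strips every `c` from `s` in place.
void remove_char(std::string& s, char c) {
    // std::remove shifts the kept chars to the front in a single pass and
    // returns an iterator to the new logical end; erase then trims the tail.
    s.erase(std::remove(s.begin(), s.end(), c), s.end());
}
```

Because each char is moved at most once, the work grows linearly with the length of the string, whereas erasing one char at a time shifts the whole tail on every call.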


One more remark, concerning passing `std::string` by value. The signature:

cdef string my_func(string s)
     ...
     return s

means that there are two (unnecessary) copies (with RVO being impossible). It might be better to avoid them and pass s by reference (at least in cdef functions):

def cy_func(string b):
    remove_nulls(b)  # no copying
    return b
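
The difference between the two calling conventions can be observed in plain C++. The sketch below uses two hypothetical function names, `strip_nulls_by_value` and `strip_nulls_in_place`, and only shows that the by-value version works on a copy while the by-reference version modifies the caller's string. It does not measure how many copies Cython's generated code actually makes.

```cpp
#include <algorithm>
#include <string>

// Takes a copy: the caller's string is left untouched.
std::string strip_nulls_by_value(std::string s) {
    s.erase(std::remove(s.begin(), s.end(), '\0'), s.end());
    return s;
}

// Works on the caller's string directly: no copy of the argument is made.
void strip_nulls_in_place(std::string& s) {
    s.erase(std::remove(s.begin(), s.end(), '\0'), s.end());
}
```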
