
Boost-Python: Expose a class to Python which is a subclass of a Python class (str)

I am trying to have a Boost.Python function return an object of a Python class that is a subclass of a Python builtin class (here str):

My first approach is to create the class in a Python module, mystr.py:

class MyStr(str):
    def __truediv__(self, other):
        return self + other
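Run as plain Python, without Boost, the class above behaves as in the sketch below. One detail: str.__add__ always returns a plain str, so a / b produces a str rather than a MyStr, and a chained a / b / c raises TypeError. The second class, MyStrKeep, is a hypothetical name used only for this illustration; it re-wraps the result so the subclass type survives, which is what MyStr(*this + other) in the C++ attempt below is aiming for.

```python
class MyStr(str):
    """The class from mystr.py: '/' concatenates two strings."""

    def __truediv__(self, other):
        # str.__add__ returns a plain str, so the subclass type is lost here
        return self + other


class MyStrKeep(str):
    """Hypothetical variant that keeps its own type after '/'."""

    def __truediv__(self, other):
        # Re-wrap the concatenation so '/' can be chained
        return MyStrKeep(str.__add__(self, other))


a = MyStr("Hello")
print(a / " World")                  # Hello World
print(type(a / " World").__name__)   # str

k = MyStrKeep("Hello")
print(k / " World" / "!")            # Hello World!
print(type(k / " World").__name__)   # MyStrKeep

try:
    a / " World" / "!"
except TypeError:
    print("chaining MyStr raises TypeError")
```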

To return a Python object of that type, I then import that module from C++ with Boost.Python and call py::exec, along these lines:

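#include <boost/python.hpp>
#include <string>

namespace py = boost::python;

// Wrap a C++ string in mystr.MyStr by exec'ing a one-line snippet with the module's names in scope
py::object AsMyStr(std::string const &s)
{
    py::object my_str = py::import("mystr");
    py::dict my_namespace(my_str.attr("__dict__"));

    my_namespace["_MYSTR_test"] = s;
    py::exec(
        "_MYSTR_test = MyStr(_MYSTR_test)\n",
        my_namespace, my_namespace);
    return my_namespace["_MYSTR_test"];
}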
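The exec'd snippet does nothing more than call MyStr on the input string. The Python sketch below replays the same steps, with a plain dict standing in for my_namespace, and compares the result with a direct call of the class. By the same reasoning, the C++ function could probably skip py::exec and call the class through an attribute proxy, as in py::import("mystr").attr("MyStr")(s); that line is an untested suggestion, not part of the original code.

```python
class MyStr(str):
    """Stand-in for mystr.MyStr."""

    def __truediv__(self, other):
        return self + other


# A plain dict plays the role of my_namespace (built from mystr.__dict__)
my_namespace = {"MyStr": MyStr}

# The same three steps as AsMyStr: store the input, exec, read the result back
my_namespace["_MYSTR_test"] = "Hello"
exec("_MYSTR_test = MyStr(_MYSTR_test)\n", my_namespace, my_namespace)
via_exec = my_namespace["_MYSTR_test"]

# The direct equivalent: call the class on the string
direct = MyStr("Hello")

print(type(via_exec).__name__, type(direct).__name__)  # MyStr MyStr
print(via_exec == direct)                              # True
print(via_exec / " World")                             # Hello World
```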

Exposed in a Boost.Python module, this function correctly gives me a MyStr instance on the Python side, which can be used as intended:

 a = AsMyStr("Hello")
 b = " World"
 print(a / b)
 # "Hello World"

I wonder whether the subclassing of str can also be done on the Boost.Python side, in C++. I cannot get __truediv__ to work in that case:

class MyStr : public py::str
{
public:
    MyStr(py::object const &o) : py::str(o) {}

    // Concatenate and wrap the result in MyStr again
    MyStr __truediv__(py::object const &other)
    {
        return MyStr(*this + other);
    }
};

Exposing it in a module:

 BOOST_PYTHON_MODULE(MyStr)
 {
     py::class_<MyStr, py::bases<py::str>>("MyStr", py::no_init)
         .def(py::init<py::object const &>())
         .def("__truediv__", &MyStr::__truediv__)
         ;
 }
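What the question asks for is an object whose type genuinely derives from Python's str. In Python, such a class does not have to be written with a class statement: the three-argument form of the builtin type() (name, tuple of bases, namespace dict) creates it at runtime, as the sketch below does. The helper _truediv is a hypothetical name. Whether issuing the same type(...) call from C++ through Boost.Python is a workable answer to the question is an assumption this sketch does not test.

```python
def _truediv(self, other):
    # Concatenate, then re-wrap in the object's own class so the type is kept
    return type(self)(str.__add__(self, other))


# Three-argument type(): class name, tuple of base classes, class namespace
MyStr = type("MyStr", (str,), {"__truediv__": _truediv})

a = MyStr("Hello")
print(a / " World")                          # Hello World
print(isinstance(a, str))                    # True
print([c.__name__ for c in MyStr.__mro__])   # ['MyStr', 'str', 'object']
```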

But using this class on the Python side leads to:

 a = MyStr("Hello")
 b = " World"
 print(a / b)
 # ValueError: character U+5555eaa0 is not in range [U+0000; U+10ffff]

How must I define and expose the class MyStr in C++ so that the Python side gets a "true" MyStr, i.e. a subclass of str?


I uploaded the code to https://gitlab.com/kohlrabi/learn-boost-python: the master branch contains the first solution, and the cpp_class branch the second, non-working one.

The range U+0000 to U+10FFFF covers all possible Unicode code points.
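Python applies the same limit: sys.maxunicode is 0x10FFFF, and chr() refuses anything larger. The short check below confirms that the value 0x5555eaa0 from the error message is beyond that bound, so it cannot be a real character.

```python
import sys

# The largest valid Unicode code point
print(hex(sys.maxunicode))    # 0x10ffff

bad = 0x5555EAA0              # the value from the ValueError in the question
print(bad > sys.maxunicode)   # True

try:
    chr(bad)
except ValueError:
    print("chr() rejects code points above sys.maxunicode")
```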

Your string is probably being encoded one way and decoded another somewhere between C++ and Python, so you can try decoding it with an explicit encoding, as in .decode('cp1252'). Or you can use .decode('utf-8', 'backslashreplace'), and the bad bytes will show up as escape sequences such as \xff in the resulting string.

With 'replace' instead, they become the replacement character U+FFFD (�); with 'ignore', they disappear.
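The decoding options discussed here can be compared directly in Python. The sketch below decodes a hypothetical byte string containing one byte that is not valid UTF-8 and prints each result through ascii() so the output is console-safe. It also checks that, for an ordinary invalid byte such as 0xFF, 'surrogatepass' raises UnicodeDecodeError just as the default 'strict' does, since it only admits UTF-8-encoded surrogate code points.

```python
# Hypothetical input: ASCII text around one byte (0xFF) that is not valid UTF-8
raw = b"Hello \xff World"

results = {
    "cp1252": raw.decode("cp1252"),                               # 'Hello ÿ World'
    "backslashreplace": raw.decode("utf-8", "backslashreplace"),  # 'Hello \\xff World'
    "replace": raw.decode("utf-8", "replace"),                    # 'Hello \ufffd World'
    "ignore": raw.decode("utf-8", "ignore"),                      # 'Hello  World'
}
for name, text in results.items():
    # ascii() escapes non-ASCII characters so this prints on any console
    print(f"{name:17} {ascii(text)}")


def fails(handler):
    """Return True if decoding raw as UTF-8 with this handler raises."""
    try:
        raw.decode("utf-8", handler)
    except UnicodeDecodeError:
        return True
    return False


print(fails("strict"), fails("surrogatepass"))  # True True
```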
