
Python str to C++ to Python str

I am struggling to convert a Python str to C++ and back. For Python 2/3 compatibility, I thought that using str on Python 2 and bytes on Python 3 would suffice (hence the defines below).

Note this is extracted from a larger codebase; apologies for any missing imports.

// C++ stuff compiled to convertor.so
#include "Python.h"
#if PY_MAJOR_VERSION >= 3
    #define PyString_Size PyBytes_Size
    #define PyString_AsString PyBytes_AsString
    #define PyString_FromStringAndSize PyBytes_FromStringAndSize
#endif

template<typename T>
struct vec {
  T *ptr;
  i64 size;
};

extern "C"
vec<uint8_t> str_to_char_arr(PyObject* in) {
  int64_t dimension = (int64_t) PyString_Size(in);
  vec<uint8_t> t;
  t.size = dimension;
  t.ptr = (uint8_t*) PyString_AsString(in);
  return t;
}

extern "C"
PyObject* char_arr_to_str(vec<uint8_t> inp) {
  Py_Initialize();
  PyObject* buffer = PyString_FromStringAndSize((const char*) inp.ptr, inp.size);
  return buffer;
}


# Python stuff
class Vec(Structure):
    _fields_ = [
        ("ptr", POINTER(c_wchar_p)),
        ("size", c_long),
    ]

lib = to_shared_lib('convertor')
lib_file = pkg_resources.resource_filename(__name__, lib)
utils = ctypes.PyDLL(lib_file)

str_to_char_arr = utils.str_to_char_arr
str_to_char_arr.restype = Vec()
str_to_char_arr.argtypes = [py_object]

encoded = str_to_char_arr('abc'.encode('utf-8'))

char_arr_to_str = utils.char_arr_to_str
char_arr_to_str.restype = py_object
char_arr_to_str.argtypes = [py_object.ctype_class]
result = ctypes.cast(encoded, ctypes.POINTER(Vec())).contents

decoded = char_arr_to_str(result).decode('utf-8')

Trying this with 'abc' on Python 3.5 yields '\\x03\\x00\\x00', which clearly means something went wrong.

Can anyone spot the issue?

It might be that you expect UCS2 while your Python build is configured for UCS4. See also "Building an UCS4 string buffer in python 2.7 ctypes".
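To see which character width a given interpreter uses, a minimal sketch using only the standard library may help. The variable names are illustrative, and the typical values in the comments are what one would usually expect rather than a guarantee for every platform.

```python
from __future__ import print_function

import ctypes
import sys

# Width in bytes of the platform's wchar_t as seen by ctypes
# (typically 4 on Linux/macOS and 2 on Windows).
wchar_bytes = ctypes.sizeof(ctypes.c_wchar)

# On Python 2, sys.maxunicode is 0xFFFF for narrow (UCS2) builds and
# 0x10FFFF for wide (UCS4) builds. Python 3.3+ always reports 0x10FFFF.
is_wide_build = sys.maxunicode > 0xFFFF

print("wchar_t bytes:", wchar_bytes)
print("wide build:", is_wide_build)
```

If the two disagree with what the C++ side assumes, a c_wchar_p field will not describe the same memory layout as the bytes produced by PyBytes_AsString.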

I haven't managed to make this work on Python 2; perhaps someone who understands the unicode/str/bytes differences between the Python versions better can fix that. This also means the issue I have is probably in another package, which unfortunately I have no control over at the moment.

Nevertheless, here is some working code (for me) with Python 3.5 and clang 6.0.

#include "Python.h"
#include <cstdint>  // int64_t, uint8_t

#if PY_MAJOR_VERSION >= 3
    #define PyString_Size PyBytes_Size
    #define PyString_AsString PyBytes_AsString
    #define PyString_FromStringAndSize PyBytes_FromStringAndSize
#endif

template<typename T>
struct vec {
  T *ptr;
  int64_t size;
};

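extern "C"
vec<uint8_t> str_to_char_arr(PyObject* in) {
  // Borrowed view: ptr points into the bytes object's own buffer, so it is
  // only valid while that object is alive.
  int64_t dimension = (int64_t) PyString_Size(in);
  vec<uint8_t> t;
  t.size = dimension;
  t.ptr = (uint8_t*) PyString_AsString(in);
  return t;
}

extern "C"
PyObject* char_arr_to_str(vec<uint8_t> inp) {
  // No-op when the interpreter is already running (as it is under ctypes.PyDLL).
  Py_Initialize();
  // Copies inp.size bytes into a new bytes object and returns a new reference.
  PyObject* buffer = PyString_FromStringAndSize((const char*) inp.ptr, inp.size);
  return buffer;
}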


# Python
from ctypes import *

import pkg_resources


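class Vec(Structure):
    # Mirrors vec<uint8_t>; size is int64_t on the C++ side, so use c_int64
    # (c_long is only 32 bits on Windows).
    _fields_ = [
        ("ptr", POINTER(c_char_p)),
        ("size", c_int64),
    ]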


lib = 'test.so'
lib_file = pkg_resources.resource_filename(__name__, lib)
utils = PyDLL(lib_file)

str_to_char_arr = utils.str_to_char_arr
str_to_char_arr.restype = Vec
str_to_char_arr.argtypes = [py_object]

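# Keep a reference to the bytes object: the returned ptr points into its
# buffer and would dangle if the temporary were freed right after the call.
payload = 'Bürgermeister'.encode('utf-8')
encoded = str_to_char_arr(payload)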

char_arr_to_str = utils.char_arr_to_str
char_arr_to_str.restype = py_object
char_arr_to_str.argtypes = [Vec]

decoded = char_arr_to_str(encoded).decode('utf-8')
print(decoded)  # Bürgermeister

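Changing c_char_p to c_wchar_p seems to have no effect(?); it still works. That is to be expected: the pointer is never dereferenced on the Python side, only passed back to C++, and all ctypes pointer types have the same size, so the struct layout is unchanged.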
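The c_char_p vs. c_wchar_p observation can be checked without the shared library. The sketch below is only an illustration: make_vec, VecChar, VecWchar, VecBytes and payload_buf are hypothetical names, not part of the code above. It compares the struct layouts for three pointer field types and then does the bytes round trip in pure Python with ctypes.string_at.

```python
import ctypes


def make_vec(ptr_type):
    """Build a Structure mirroring the C++ vec<uint8_t> with the given pointer type."""
    class _Vec(ctypes.Structure):
        _fields_ = [("ptr", ptr_type), ("size", ctypes.c_int64)]
    return _Vec


VecChar = make_vec(ctypes.POINTER(ctypes.c_char_p))
VecWchar = make_vec(ctypes.POINTER(ctypes.c_wchar_p))
VecBytes = make_vec(ctypes.POINTER(ctypes.c_uint8))

# All data-pointer types have the same size, so the three layouts coincide.
same_size = (ctypes.sizeof(VecChar) == ctypes.sizeof(VecWchar)
             == ctypes.sizeof(VecBytes))
same_offset = (VecChar.size.offset == VecWchar.size.offset
               == VecBytes.size.offset)

# Round trip in pure Python. payload_buf must stay alive while vec.ptr is used,
# just as the bytes object must stay alive in the C++ version.
payload = u"Bürgermeister".encode("utf-8")
payload_buf = ctypes.create_string_buffer(payload)
vec = VecBytes(ctypes.cast(payload_buf, ctypes.POINTER(ctypes.c_uint8)),
               len(payload))

# string_at copies exactly vec.size bytes starting at vec.ptr.
recovered = ctypes.string_at(vec.ptr, vec.size).decode("utf-8")
```

POINTER(c_uint8) is arguably the most faithful description of a uint8_t* field, but since Python never reads through the pointer here, any pointer type gives the same behaviour.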
