
Clean Way to Convert Python 3 Unicode to std::string

I wrap a lot of C++ code using the Python 2 C API (for various technical reasons I can't use tools like SWIG or Boost.Python). When I have to pass a string (usually a path, always ASCII) into C/C++, I use something like this:

const char* c_name = PyString_AsString(py_file_name);
if (c_name == NULL) return NULL;  // exception already set; never build a std::string from NULL
std::string file_name = c_name;

Now I'm considering moving to Python 3, where the PyString_* functions no longer exist. One solution I found suggests something like this:

PyObject* bytes = PyUnicode_AsUTF8String(py_file_name);
if (bytes == NULL) return NULL;  // exception already set
std::string file_name = PyBytes_AsString(bytes);
Py_DECREF(bytes);

However, this takes more lines and looks a bit ugly (not to mention that it leaks memory if I forget the last line).
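In C++, the usual way to remove the "forget the last line" risk is RAII: tie the cleanup to an object whose destructor runs on every return path. The sketch below is a generic, self-contained scope guard assuming C++17 (for class template argument deduction). ScopeExit, convert and check_convert are hypothetical names, not Python API functions, and the counter only stands in for the Py_DECREF(bytes) call a real extension would make.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not part of the Python C API): calls the stored
// callable when the guard is destroyed, i.e. on every exit path.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F f) : f_(std::move(f)) {}
    ~ScopeExit() { f_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    F f_;
};

// Stand-in for an extension function: incrementing `released` plays the
// role of Py_DECREF(bytes); `fail` simulates an error after acquisition.
int convert(bool fail, int& released) {
    ScopeExit guard([&released] { ++released; });
    if (fail) return -1;  // early error return: the guard still fires
    return 0;             // normal return: the guard fires here too
}

// Usage check: the cleanup runs exactly once per call, on either path.
void check_convert() {
    int released = 0;
    assert(convert(true, released) == -1 && released == 1);
    assert(convert(false, released) == 0 && released == 2);
}
```

In an extension you would create the guard right after the NULL check on bytes, e.g. ScopeExit guard([bytes] { Py_DECREF(bytes); });, so every later return releases the reference without a trailing Py_DECREF line.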

The other option is to redefine the Python functions to operate on bytes objects, and to call them like this:

def some_function(path_name):
    return _some_function(path_name.encode('utf8'))

This isn't terrible, but it does require a Python-side wrapper for each function.

Is there a cleaner way to deal with this?

It looks like the solution exists as of Python 3.3: const char* PyUnicode_AsUTF8(PyObject* unicode) (before Python 3.7 the return type was char*). It should behave essentially like PyString_AsString() from Python 2: it returns a pointer to a UTF-8 buffer owned by the string object (so you must not free it, and it stays valid only as long as the object does), or NULL with an exception set on failure.

If you know that the string is all ASCII (which you could, of course, check with an assert or similar), you can simply build the std::string character by character:

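#include <Python.h>  // must come before any standard header
#include <cassert>
#include <string>
// Assumes py_file_name is a str whose characters are all ASCII (Python 3.3+ API).
std::string py_string_to_std_string(PyObject* py_file_name)
{
    std::string str;
    Py_ssize_t len = PyUnicode_GetLength(py_file_name);
    if (len < 0) return str;  // error: a Python exception is already set
    for (Py_ssize_t i = 0; i < len; i++) {
        Py_UCS4 ch = PyUnicode_ReadChar(py_file_name, i);
        assert(ch < 128);  // check the ASCII assumption in debug builds
        str += static_cast<char>(ch);  // append; don't pre-size the string
    }
    return str;
}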
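The ASCII assumption can also be checked on the UTF-8 buffer returned by PyUnicode_AsUTF8AndSize instead of reading the string character by character: UTF-8 encodes every non-ASCII code point using only bytes of 0x80 or above, so a string is ASCII exactly when all of its UTF-8 bytes are below 0x80. A minimal sketch of that check in plain C++ (no Python headers needed); is_ascii and ascii_to_std_string are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if every byte is 7-bit ASCII. Because UTF-8
// never uses bytes below 0x80 inside a multi-byte sequence, this check on
// a UTF-8 buffer is exact.
bool is_ascii(const char* data, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        if (static_cast<unsigned char>(data[i]) >= 0x80) return false;
    }
    return true;
}

// Copies the buffer into a std::string, asserting the ASCII assumption in
// debug builds (the assert is compiled out when NDEBUG is defined).
std::string ascii_to_std_string(const char* data, std::size_t size) {
    assert(is_ascii(data, size));
    return std::string(data, size);
}
```

Checking the bytes keeps the conversion a single copy, while the per-character loop above stays useful when you want to inspect code points rather than bytes.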

Here is an improved version of the accepted answer: instead of PyUnicode_AsUTF8(...), it is better to use PyUnicode_AsUTF8AndSize(...).

The string may contain a null character (code point 0) somewhere in the middle; if you use PyUnicode_AsUTF8(...) and build the std::string from the pointer alone, the result will be truncated at that character.

Py_ssize_t size = 0;
char const * pc = PyUnicode_AsUTF8AndSize(obj, &size);
if (pc == NULL)
    return NULL;  // error: a Python exception (e.g. TypeError) is already set
std::string s(pc, size);  // copies all `size` bytes, including embedded '\0'
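To see the difference, the sketch below uses a hand-written buffer standing in for what PyUnicode_AsUTF8AndSize would return for the Python string 'ab\x00cd' (an assumption for illustration; no Python headers are involved). copy_with_size, copy_without_size and demo_embedded_nul are hypothetical names, and std::ptrdiff_t stands in for the signed Py_ssize_t.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pointer-only construction: std::string(const char*) stops at the first '\0'.
std::string copy_without_size(const char* pc) {
    return std::string(pc);
}

// Pointer + length construction: copies exactly `size` bytes, NULs included.
std::string copy_with_size(const char* pc, std::ptrdiff_t size) {
    return std::string(pc, static_cast<std::size_t>(size));
}

// Buffer laid out as PyUnicode_AsUTF8AndSize would return it for 'ab\x00cd':
// 5 data bytes plus a terminating NUL that is not counted in size.
void demo_embedded_nul() {
    const char pc[] = "ab\0cd";
    assert(copy_without_size(pc) == "ab");      // truncated at the embedded NUL
    assert(copy_with_size(pc, 5).size() == 5);  // complete copy
}
```

The pointer-plus-length form std::string(pc, size) is what keeps the full string; the pointer-only form silently drops everything after the first embedded NUL.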

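Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.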

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM