Python 3 compatibility issue

Question

Description of problem

I have to migrate some code to Python 3. It compiles successfully, but I have a problem at runtime:

static PyObject* Parser_read(PyObject * const self, PyObject * unused0, PyObject * unused1) {
    //Retrieve bytes from the underlying data stream.
    //In this case, an iterator
    PyObject * const i = PyIter_Next(self->readIterator);

    //If the iterator returns NULL, then no more data is available.
    if(i == NULL)
    {
        Py_RETURN_NONE;
    }

    //Treat the returned object as just bytes
    PyObject * const bytes = PyObject_Bytes(i);

    Py_DECREF(i);

    if( not bytes )
    {
        //fprintf(stderr, "try to read %s\n", PyObject_Str(bytes));
        PyErr_SetString(PyExc_ValueError, "iterable must return bytes like objects");
        return NULL;

    }

    ....
}

In my Python code, I have something like this:

for data in Parser(open("file.txt")):
   ...

The code works well on Python 2, but on Python 3 I get:

ValueError: iterable must return bytes like objects

Update

The solution from @casevh works well in all test cases except one, which is when I wrap the stream:

def wrapper(stream):
    for data in stream:
        for i in data:
            yield i

for data in Parser(wrapper(open("file.txt", "rb"))):
    ...

and I got: ValueError: iterable must return bytes like objects

Answer 1

One option is to open the file in binary mode:

open("file.txt", "rb")

That should create an iterator that returns each line as a sequence of bytes.

Python 3 strings are assumed to be Unicode, and without proper encoding/decoding they shouldn't be interpreted as a sequence of bytes. If you are reading plain ASCII text, and not a binary data stream, you could also convert from Unicode to ASCII. See PyUnicode_AsASCIIString() and related functions.
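To make the difference concrete, here is a minimal Python 3 sketch of what the iterator hands back in each mode. The temporary file and its contents are made up for the demonstration, and `str.encode("ascii")` is used only as the Python-level counterpart of PyUnicode_AsASCIIString(); it is not taken from the original extension.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first line\nsecond line\n")
    path = tmp.name

try:
    with open(path) as text_stream:           # text mode: lines are str
        text_line = next(iter(text_stream))
    with open(path, "rb") as binary_stream:   # binary mode: lines are bytes
        binary_line = next(iter(binary_stream))
finally:
    os.remove(path)

# Explicit conversion for plain ASCII text; raises UnicodeEncodeError
# if the line contains non-ASCII characters.
encoded_line = text_line.encode("ascii")

# In Python 3 a str cannot be turned into bytes without naming an encoding.
try:
    bytes(text_line)
    str_converts_to_bytes = True
except TypeError:
    str_converts_to_bytes = False

print(type(text_line).__name__, type(binary_line).__name__, str_converts_to_bytes)
```

Under these assumptions, the text-mode line is a str that needs an explicit encoding step, while the binary-mode line is already a bytes object.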

Answer 2

As noted by @casevh, in Python you need to decide whether your data is binary or text. The fact that you are iterating over lines makes me think that the latter is the case.

def wrapper(stream):
    for data in stream:
        for i in data:
            yield i

works in Python 2, because iterating over a str yields 1-character strings; in Python 3, iterating over a bytes object yields individual bytes as integers in the range 0 to 255. You can get the code to work identically in Python 2 and 3 (and identically to the Python 2 behaviour of the code above) by using range and slicing 1 byte/character at a time:

def wrapper(stream):
    for data in stream:
        for i in range(len(data)):
            yield data[i:i + 1]
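As a quick check of the difference under Python 3, the following sketch runs both wrappers over the same in-memory stream. The names `wrapper_by_item` and `wrapper_by_slice` and the sample data are illustrative only; `io.BytesIO` stands in for a file opened with "rb".

```python
import io


def wrapper_by_item(stream):
    # Original wrapper: iterating a bytes object yields ints on Python 3.
    for data in stream:
        for i in data:
            yield i


def wrapper_by_slice(stream):
    # Revised wrapper: slicing keeps every item a length-1 bytes object.
    for data in stream:
        for i in range(len(data)):
            yield data[i:i + 1]


sample = b"ab\ncd\n"  # made-up data standing in for the file contents

ints = list(wrapper_by_item(io.BytesIO(sample)))
chunks = list(wrapper_by_slice(io.BytesIO(sample)))

print(ints)    # integers, one per byte
print(chunks)  # length-1 bytes objects
```

The sliced version yields objects that are already bytes, which is what the C code expects to receive from the iterator.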

PS: You also have a mistake in your C extension code: Parser_read takes 3 arguments, 2 of which are named unused_x. Only a method annotated with METH_KEYWORDS takes 3 arguments (PyCFunctionWithKeywords); all others, including METH_NOARGS, must be functions taking 2 arguments (PyCFunction).

2 answers

Answer 1: 3, accepted, 2015-03-10 13:14:35

Answer 2: 0, 2015-03-10 15:37:39
