Error decoding byte strings in Python 3 [TypeError: must be str, not bytes]

Question

I'm trying to run, in Python 3.6, a piece of code written for Python 2.7, and I'm having trouble with the differences in how byte strings are handled. The code is meant to read a .dat file that existed before I wrote my code. Running the untouched Python 2.7 script returns the following error:

import numpy as np

buff = ''
dt = np.dtype([('var1', np.uint32, 1), ('var2', np.uint8, 1)])

with open(filename, 'rb') as f:
    for line in f:
        dat = line
--->    buff += dat

    data = np.frombuffer(buffer=buff, dtype=dt)

TypeError: must be str, not bytes

If I understand correctly, Python 2 concatenates the bytes it reads onto the string buff without complaint, whereas Python 3 distinguishes between bytes and strings. Converting line with str(line) returns the following error:

    for line in f:
        dat = str(line)
        buff += dat
->  data = np.frombuffer(buffer=buff, dtype=dt)

AttributeError: 'str' object has no attribute '__buffer__'

How should I go about it? What type should buff be? Is there a solution that works in both Python 2.7 and Python 3.6?

EDIT

It turns out the data in filename.dat is not made of unicode strings at all. I've edited the question to remove the mention of my mistaken assumption, and I've added lines of code that I had omitted while trying to show a minimal example and that I now realize are relevant. Sorry for the confusion.

Answer 1

Use io.BytesIO for your buffer. This is compatible with Python 2 and 3, and preferable to str / bytes concatenation for large datasets.

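import io

import numpy as np


# Accumulate the raw bytes in an in-memory binary buffer (Python 2 and 3)
buff = io.BytesIO()
dt = np.dtype([('var1', np.uint32, 1), ('var2', np.uint8, 1)])

# Open in binary mode so that every chunk read from the file is bytes
with open(filename, 'rb') as f:
    for line in f:
        buff.write(line)

    # Rewind to the start before reading the accumulated bytes back
    buff.seek(0)
    data = np.frombuffer(buffer=buff.read(), dtype=dt)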
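
For a file that fits in memory, a simpler variant is to read it in a single call, which gives a bytes object on Python 3 and a str on Python 2, both of which np.frombuffer accepts. The following is only a sketch under assumptions: the sample file, its three records, and the little-endian '<u4' byte order are made up so the example runs on its own, and the trailing 1 in each field spec is dropped because newer NumPy releases warn about that form.

```python
import os
import tempfile

import numpy as np

# Packed record layout: one uint32 followed by one uint8 (5 bytes per record).
# '<u4' pins little-endian byte order; that is an assumption about the file.
dt = np.dtype([('var1', '<u4'), ('var2', 'u1')])

# Write a small sample file so the sketch is runnable (hypothetical data).
sample = np.array([(1, 10), (2, 20), (70000, 255)], dtype=dt)
path = os.path.join(tempfile.mkdtemp(), 'sample.dat')
with open(path, 'wb') as f:
    f.write(sample.tobytes())

# Read the raw bytes in one call instead of concatenating line by line.
with open(path, 'rb') as f:
    buff = f.read()

# Interpret the bytes as records; the resulting array is read-only.
data = np.frombuffer(buff, dtype=dt)
print(data['var1'])
print(data['var2'])
```

Reading the whole file at once also avoids iterating a binary file line by line, which only splits the data wherever a 0x0A byte happens to occur. If no intermediate buffer is needed, np.fromfile(path, dtype=dt) should give the same records directly.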

Answer 1
0 2020-04-01 06:37:49
