Error decoding byte string in Python3 [TypeError: must be str, not bytes]

I'm trying to run a piece of code written for Python 2.7 under Python 3.6, and I'm having trouble with the differences in how byte strings are handled. The code is meant to read a .dat file that already existed before I wrote my code. Running the unmodified Python 2.7 script under Python 3.6 raises the following error:

import numpy as np

buff = ''
dt = np.dtype([('var1', np.uint32, 1), ('var2', np.uint8, 1)])

with open(filename, 'rb') as f:
    for line in f:
        dat = line
--->    buff += dat

    data = np.frombuffer(buffer=buff, dtype=dt)

TypeError: must be str, not bytes

If I understand correctly, Python 2 concatenates the bytes it reads onto the string buff without complaining, whereas Python 3 distinguishes between bytes and strings. Casting line with str(line) instead raises the following error:

    for line in f:
        dat = str(line)
        buff += dat
->  data = np.frombuffer(buffer=buff, dtype=dt)

AttributeError: 'str' object has no attribute '__buffer__'
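
The second error follows from what str() does to a bytes object on Python 3. The snippet below is only a minimal sketch: the five-byte value is made up for illustration and does not come from the real .dat file.

```python
# Made-up sample: five raw bytes, as a file opened with 'rb' would yield them.
line = b'\x01\x00\x00\x00\n'

as_text = str(line)

# On Python 3 the result is text describing the bytes (it starts with b'),
# so it is longer than the original data and is no longer a raw byte buffer.
print(type(line).__name__, len(line))
print(type(as_text).__name__, len(as_text))
```

In other words, the cast does not convert the data; it replaces it with its printable representation, which np.frombuffer cannot read.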

How should I go about it? What type should buff be? Is there a solution that works in both Python 2.7 and Python 3.6?

EDIT

It turns out the data in filename.dat is not made of unicode strings at all. I've edited the question to remove my mistaken assumption, and I've added lines of code that I had left out to keep the example minimal but now realize are relevant. Sorry for the confusion.

Use io.BytesIO for your buffer. It works in both Python 2 and Python 3, and it is preferable to str / bytes concatenation for large datasets, since repeated concatenation can copy the accumulated data on every iteration.

import io

import numpy as np


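buff = io.BytesIO()
dt = np.dtype([('var1', np.uint32, 1), ('var2', np.uint8, 1)])

# Open in binary mode so every chunk read from the file is bytes.
with open(filename, 'rb') as f:
    for line in f:
        buff.write(line)  # append to the in-memory buffer; no str conversion

# getvalue() returns everything written so far as a single bytes object.
data = np.frombuffer(buffer=buff.getvalue(), dtype=dt)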
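
If the file fits comfortably in memory, a simpler variant is to start from an empty bytes literal instead of ''. The following is a minimal sketch under assumptions that are not in the question: it writes its own small sample file (the name sample.dat and the record values are made up) so that the round trip can be checked, and it drops the trailing 1 from the dtype fields, which recent NumPy releases flag with a FutureWarning.

```python
import os
import tempfile

import numpy as np

# Same record layout as in the question: a uint32 followed by a uint8.
dt = np.dtype([('var1', np.uint32), ('var2', np.uint8)])

# Hypothetical sample data, written only so the example is self-contained.
expected = np.array([(1, 10), (70000, 20), (3, 30)], dtype=dt)
path = os.path.join(tempfile.mkdtemp(), 'sample.dat')
with open(path, 'wb') as f:
    f.write(expected.tobytes())

# b'' is a plain str on Python 2 and a bytes object on Python 3,
# so the same concatenation works on both.
buff = b''
with open(path, 'rb') as f:
    for line in f:
        buff += line

data = np.frombuffer(buff, dtype=dt)

# Alternative: let NumPy read the records straight from the file.
data_from_file = np.fromfile(path, dtype=dt)
```

Iterating over a binary file by line splits wherever a byte happens to equal a newline, but concatenating the pieces restores the original byte sequence, so the decoded records are unaffected.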
