
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4

I'm wondering if someone could help me out. I've tried searching beforehand, but I couldn't find an answer.

I have a file called info.dat which contains:

#
# *** Please be aware that the revision numbers on the control lines may not always
# *** be 1 more than the last file you received. There may have been additional
# *** increments in between.
#
$001,427,2018,04,26
#
# Save this file as info.dat
#

I'm trying to loop over the file, get the version number, and write it to its own file:

with open('info.dat', 'r') as file:
    for line in file:
        if line.startswith('$001,'):
            with open('version.txt', 'w') as w:
                version = line[5:8] # Should be 427
                w.write(version + '\n')
                w.close()

While this does write the correct info, I keep getting the following error:

Traceback (most recent call last):
  File "~/Desktop/backup/test.py", line 4, in <module>
    for line in file:
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 6281: ordinal not in range(128)

When I try adding the following:

with open('info.dat', 'r') as file:
    for line in file:
        if line.startswith('$001,'):
            with open('version.txt', 'w') as w:
                version = line[5:8]
                # w.write(version.encode('utf-8') + '\n')
                w.write(version.decode() + '\n')
                w.close()

I get the following error:

Traceback (most recent call last):
  File "~/Desktop/backup/test.py", line 9, in <module>
    w.write(version.encode('utf-8') + '\n')
TypeError: can't concat str to bytes

You're opening a text file, which implicitly decodes each line with your default encoding (ASCII here, going by the traceback), then manually re-encoding each line as UTF-8, then writing the result to a text file, which expects str and implicitly encodes it with your default encoding again. That isn't going to work. But the good news is, the right thing to do is a lot simpler.
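To check which encoding open() falls back on when no encoding argument is passed, you can ask the locale module. This is only a small sketch: the printed value depends on the environment, and the name default_encoding is chosen here for illustration.

```python
import locale

# In the Python 3.6 shown in the traceback, open() in text mode falls back to
# this encoding when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

If this prints something like US-ASCII or ANSI_X3.4-1968, any byte above 0x7f in the file will trigger the UnicodeDecodeError from the question.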


If you know the input file is in UTF-8 (which it probably isn't; see below), just open the files as UTF-8 instead of as your default encoding:

with open('info.dat', 'r', encoding='utf-8') as file:
    for line in file:
        if line.startswith('$001,'):
            # The with block closes the file, so no explicit close() is needed.
            with open('version.txt', 'w', encoding='utf-8') as w:
                version = line[5:8] # Should be 427
                w.write(version + '\n')

In fact, I'm pretty sure your files are not in UTF-8, but Latin-1 (in Latin-1, \xe4 is ä; in UTF-8, it's the start of a 3-byte sequence that probably encodes a CJK character). If so, you can do the same thing with encoding='latin-1' instead, and now it will work.
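A quick way to see the difference is to decode the byte 0xe4 from the traceback under each codec. This is a sketch using a hand-built bytes value rather than the real info.dat; the variable names are illustrative.

```python
raw = b'\xe4'  # the byte reported in the traceback

# Latin-1 maps every byte to a code point, so this always succeeds.
as_latin1 = raw.decode('latin-1')
print(as_latin1)  # ä

# ASCII only covers 0x00-0x7f, so the same byte is rejected.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)

# In UTF-8, 0xe4 is the lead byte of a 3-byte sequence; on its own it is invalid.
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc)

# Followed by two valid continuation bytes, it decodes to a CJK character.
as_utf8 = b'\xe4\xbd\xa0'.decode('utf-8')
print(as_utf8)
```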


But if you have no idea what the encoding is, don't try to guess; just stick with binary mode. This means passing rb and wb modes instead of r and w, and using bytes literals:

with open('info.dat', 'rb') as file:
    for line in file:
        if line.startswith(b'$001,'):
            # The with block closes the file, so no explicit close() is needed.
            with open('version.txt', 'wb') as w:
                version = line[5:8] # Should be b'427'
                w.write(version + b'\n')

Either way, there's no need to call encode or decode anywhere; just let the file objects take care of it for you, and deal with only a single type (whether str or bytes) everywhere.
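The binary-mode approach can also be wrapped in a small function that never touches str at all. This is only a sketch: extract_version is a hypothetical helper name, not something from the question, and it assumes the same fixed positions (5 to 7) for the version field.

```python
def extract_version(path):
    """Return the version field of the first '$001,' line as bytes, or None."""
    with open(path, 'rb') as f:
        for line in f:
            # Everything stays bytes, so non-ASCII bytes elsewhere in the
            # file (such as 0xe4 in a comment line) are never decoded.
            if line.startswith(b'$001,'):
                return line[5:8]
    return None
```

With the info.dat from the question, extract_version('info.dat') would return b'427', which can be written to a file opened with 'wb'.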

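encode() returns bytes, but '\n' is a str. You need bytes + bytes rather than bytes + str, so try this (note that version.txt must then be opened in binary mode, 'wb', because a text-mode file only accepts str):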

w.write(version.encode('utf-8') + b'\n')
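The type mismatch can be reproduced without any files. In this sketch, version is hard-coded to the value from the question and payload is an illustrative name.

```python
version = '427'

# bytes + str raises TypeError: Python 3 never mixes the two implicitly.
try:
    version.encode('utf-8') + '\n'
except TypeError as exc:
    print(exc)

# bytes + bytes works; the result belongs in a file opened in binary mode.
payload = version.encode('utf-8') + b'\n'
print(payload)  # b'427\n'
```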
