UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4

Question

I'm wondering if someone could help me out. I searched beforehand but was unable to find an answer.

I have a file called info.dat which contains:

#
# *** Please be aware that the revision numbers on the control lines may not always
# *** be 1 more than the last file you received. There may have been additional
# *** increments in between.
#
$001,427,2018,04,26
#
# Save this file as info.dat
#

I'm trying to loop over the file, get the version number, and write it to its own file:

with open('info.dat', 'r') as file:
    for line in file:
        if line.startswith('$001,'):
            with open('version.txt', 'w') as w:
                version = line[5:8] # Should be 427
                w.write(version + '\n')
                w.close()

While this does write the correct info, I keep getting the following error:

Traceback (most recent call last):
  File "~/Desktop/backup/test.py", line 4, in <module>
    for line in file:
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 6281: ordinal not in range(128)

When I try adding the following:

with open('info.dat', 'r') as file:
    for line in file:
        if line.startswith('$001,'):
            with open('version.txt', 'w') as w:
                version = line[5:8]
                # w.write(version.encode('utf-8') + '\n')
                w.write(version.decode() + '\n')
                w.close()

I get the following error:

Traceback (most recent call last):
  File "~/Desktop/backup/test.py", line 9, in <module>
    w.write(version.encode('utf-8') + '\n')
TypeError: can't concat str to bytes

Answer 1

You're trying to open a text file, which will implicitly decode each line with your default encoding, then manually re-encode each line with UTF-8, then write the result to a text file, which expects str and does its own implicit encoding with your default encoding. That isn't going to work. But the good news is, the right thing to do is a lot simpler.
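The "default encoding" mentioned above is whatever the locale reports, which would explain why the traceback goes through encodings/ascii.py. A minimal sketch for checking it on a given machine (the variable name is only illustrative):

```python
import locale

# In text mode, open() without an encoding= argument normally falls back to
# the locale's preferred encoding. A bare "C"/"POSIX" locale can report ASCII
# (shown as 'US-ASCII' or 'ANSI_X3.4-1968' on some systems).
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)
```

If this prints an ASCII variant, any byte above 0x7f in the input file will trigger the error from the question unless an encoding is passed explicitly.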

If you know the input file is in UTF-8 (which it probably isn't; see below), just open the files as UTF-8 instead of as your default encoding:

with open('info.dat', 'r', encoding='utf-8') as file:
    for line in file:
        if line.startswith('$001,'):
            with open('version.txt', 'w', encoding='utf-8') as w:
                version = line[5:8] # Should be 427
                w.write(version + '\n')  # the with block closes w; no w.close() needed

In fact, I'm pretty sure your files are not in UTF-8 but in Latin-1 (in Latin-1, \xe4 is ä; in UTF-8, it's the start of a 3-byte sequence that probably encodes a CJK character). If so, you can do the same thing with the right encoding ('latin-1') instead of the wrong one, and then it will work.
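To make the difference concrete, here is a small sketch of how the single byte 0xe4 from the traceback behaves under each codec. The variable names are illustrative, and the CJK character is just one example of a sequence that starts with that byte:

```python
raw = b"\xe4"  # the byte named in the traceback

# Latin-1 maps every byte to the code point with the same number,
# so 0xe4 becomes U+00E4 (a with diaeresis).
as_latin1 = raw.decode("latin-1")
print(as_latin1 == "\u00e4")  # True

# ASCII only covers 0x00-0x7f, hence the error in the question.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# In UTF-8, 0xe4 is a lead byte that must be followed by two more bytes,
# so on its own it cannot be decoded either.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Example of a 3-byte UTF-8 sequence starting with 0xe4: U+4E2D.
cjk_bytes = "\u4e2d".encode("utf-8")
print(cjk_bytes)  # b'\xe4\xb8\xad'
```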

But if you have no idea what the encoding is, don't try to guess; just stick with binary mode. This means passing rb and wb modes instead of r and w, and using bytes literals:

with open('info.dat', 'rb') as file:
    for line in file:
        if line.startswith(b'$001,'):
            with open('version.txt', 'wb') as w:
                version = line[5:8] # Should be 427
                w.write(version + b'\n')  # the with block closes w; no w.close() needed

Either way, there is no need to call encode or decode anywhere; just let the file objects take care of it for you, and deal with only a single type (whether str or bytes) everywhere.
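Putting the text-mode advice together, one possible sketch follows. The function name extract_version and its parameters are hypothetical, and 'latin-1' is only the guess made above, not a confirmed fact about the file. It also splits on commas instead of slicing fixed columns, which is a small design change so the code keeps working if the revision number grows past three digits:

```python
def extract_version(src, dst, encoding="latin-1"):
    """Write the revision field of the first '$001,' line in src to dst.

    Returns the version string, or None if no control line was found.
    The default encoding is an assumption; pass the real one if known.
    """
    with open(src, "r", encoding=encoding) as infile:
        for line in infile:
            if line.startswith("$001,"):
                # "$001,427,2018,04,26" -> second comma-separated field
                version = line.split(",")[1]
                with open(dst, "w", encoding=encoding) as outfile:
                    outfile.write(version + "\n")
                return version
    return None
```

Called as extract_version('info.dat', 'version.txt'), this only ever handles str, and the file objects do all decoding and encoding.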

Answer 2

encode() returns bytes, but '\n' is a str. You can't concatenate bytes and str; both operands need to be bytes, so try this:

w.write(version.encode('utf-8') + b'\n')
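A short sketch of what this changes and what it does not. It uses in-memory io.BytesIO and io.StringIO objects as stand-ins for files opened in 'wb' and 'w' mode (an assumption made so the example needs no files on disk):

```python
import io

version = "427"

# bytes + str is what produced the TypeError in the question.
try:
    version.encode("utf-8") + "\n"
except TypeError as exc:
    print("bytes + str:", exc)

# bytes + bytes concatenates fine.
payload = version.encode("utf-8") + b"\n"

# A binary sink (like a file opened with 'wb') accepts the bytes.
binary_sink = io.BytesIO()
binary_sink.write(payload)

# A text sink (like a file opened with 'w') still rejects them.
text_sink = io.StringIO()
try:
    text_sink.write(payload)
except TypeError as exc:
    print("text-mode write:", exc)
```

So the concatenation fix only helps if the output file is also opened in binary mode; with mode 'w', writing version + '\n' as plain str is the simpler route.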

Answer 1 (3, accepted): 2018-05-17 20:21:25

Answer 2 (1): 2018-05-17 20:10:50