UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4

I'm wondering if someone could help me out. I've tried searching beforehand but was unable to find an answer:

I have a file called info.dat which contains:

#
# *** Please be aware that the revision numbers on the control lines may not always
# *** be 1 more than the last file you received. There may have been additional
# *** increments in between.
#
$001,427,2018,04,26
#
# Save this file as info.dat
#

I'm trying to loop over the file, get the version number, and write it to its own file:

with open('info.dat', 'r') as file:
    for line in file:
        if line.startswith('$001,'):
            with open('version.txt', 'w') as w:
                version = line[5:8] # Should be 427
                w.write(version + '\n')
                w.close()

While this does write the correct info, I keep getting the following error:

Traceback (most recent call last):
  File "~/Desktop/backup/test.py", line 4, in <module>
    for line in file:
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 6281: ordinal not in range(128)

When I try adding the following:

with open('info.dat', 'r') as file:
    for line in file:
        if line.startswith('$001,'):
            with open('version.txt', 'w') as w:
                version = line[5:8]
                # w.write(version.encode('utf-8') + '\n')
                w.write(version.decode() + '\n')
                w.close()

I get the following error:

Traceback (most recent call last):
  File "~/Desktop/backup/test.py", line 9, in <module>
    w.write(version.encode('utf-8') + '\n')
TypeError: can't concat str to bytes

You're opening the input as a text file, which implicitly decodes each line using your default encoding; then manually re-encoding the result as UTF-8; then writing it to another text file, which expects str and would implicitly encode it again using your default encoding. That isn't going to work. The good news is that the right thing to do is a lot simpler.
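
If you want to see which encoding open() falls back on when you don't pass one, a quick check along these lines may help. This is only a sketch: the sample bytes are made up for illustration and are not taken from your info.dat.

```python
import locale

# The encoding open() uses in text mode when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Hypothetical sample: "Käse" stored as Latin-1, so the second byte is 0xe4.
raw = b"K\xe4se"

# Decoding with ASCII fails on any byte >= 0x80, just like in the traceback.
try:
    raw.decode("ascii")
    message = ""
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

If the first line prints something like US-ASCII or ANSI_X3.4-1968, that would explain why the 'ascii' codec shows up in your traceback.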


If you know the input file is in UTF-8 (which it probably isn't; see below), just open the files as UTF-8 instead of as your default encoding:

with open('info.dat', 'r', encoding='utf-8') as file:
    for line in file:
        if line.startswith('$001,'):
            with open('version.txt', 'w', encoding='utf-8') as w:
                version = line[5:8] # Should be 427
                w.write(version + '\n')  # no w.close() needed; the with block closes the file

In fact, I'm pretty sure your files are not in UTF-8 but in Latin-1 (in Latin-1, \xe4 is ä; in UTF-8, it's the start of a 3-byte sequence that probably encodes a CJK character). If so, you can do the same thing with the right encoding (encoding='latin-1') instead of the wrong one, and it will work.
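
To make the difference concrete, here is a small sketch of how the single byte 0xe4 behaves under each codec. The three-byte sequence used at the end is just an example I picked; it is not from your file.

```python
raw = b"\xe4"

# In Latin-1 every byte maps to exactly one character; 0xe4 is 'ä'.
latin1_text = raw.decode("latin-1")

# In UTF-8, 0xe4 on its own is an incomplete multi-byte sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Followed by two continuation bytes, 0xe4 starts a 3-byte character.
# Example sequence (an assumption for illustration): E4 B8 AD is U+4E2D.
cjk = b"\xe4\xb8\xad".decode("utf-8")

print(repr(latin1_text), utf8_ok, repr(cjk))
```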


But if you have no idea what the encoding is, don't try to guess; just stick with binary mode. This means passing rb and wb modes instead of r and w, and using bytes literals:

with open('info.dat', 'rb') as file:
    for line in file:
        if line.startswith(b'$001,'):
            with open('version.txt', 'wb') as w:
                version = line[5:8] # Should be 427
                w.write(version + b'\n')  # no w.close() needed; the with block closes the file

Either way, there is no need to call encode or decode anywhere; just let the file objects take care of it for you, and deal with only a single type (whether str or bytes) everywhere.
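
As a rough end-to-end check of both approaches, the sketch below writes a small stand-in for info.dat into a temporary directory and extracts the version once in text mode and once in binary mode. The sample contents, the Latin-1 assumption, and the helper names version_text and version_bytes are all mine, for illustration only.

```python
import os
import tempfile

# Stand-in for info.dat: a comment containing Latin-1 bytes, then the control line.
sample = b"# Gr\xfc\xdfe \xe4\n$001,427,2018,04,26\n# Save this file as info.dat\n"


def version_text(path, encoding):
    """Text mode: lines are str, so compare against a str literal."""
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            if line.startswith("$001,"):
                return line[5:8]
    return None


def version_bytes(path):
    """Binary mode: lines are bytes, so compare against a bytes literal."""
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"$001,"):
                return line[5:8]
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "info.dat")
    with open(path, "wb") as f:
        f.write(sample)
    as_str = version_text(path, "latin-1")
    as_bytes = version_bytes(path)

print(as_str, as_bytes)
```

Each function sticks to one type from start to finish, which is the point: str in, str out, or bytes in, bytes out.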

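encode() returns bytes, but '\n' is a str. You need bytes + bytes, so try this (note that writing bytes only works if version.txt is opened in binary mode, 'wb'):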

w.write(version.encode('utf-8') + b'\n')
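
A short sketch of why the types have to match, using in-memory files from the io module in place of version.txt (that substitution is mine, so the example runs without touching the disk):

```python
import io

version = "427"

# bytes + str is rejected; this is the TypeError from the question.
try:
    version.encode("utf-8") + "\n"
    concat_error = ""
except TypeError as exc:
    concat_error = str(exc)

# bytes + bytes is fine.
payload = version.encode("utf-8") + b"\n"

# A text-mode file object only accepts str, so writing bytes fails too.
text_file = io.StringIO()
try:
    text_file.write(payload)
    write_error = ""
except TypeError as exc:
    write_error = str(exc)

# A binary-mode file object accepts the bytes payload as-is.
binary_file = io.BytesIO()
binary_file.write(payload)

print(concat_error)
print(write_error)
print(binary_file.getvalue())
```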
