
Python encoding issue while reading a file

I am trying to read a file that contains the character "ë". The problem is that I cannot figure out how to read it, no matter what I try with the encoding. When I look at the file manually in TextEdit, it is listed as an unknown 8-bit file. If I try changing it to UTF-8, UTF-16, or anything else, it either does not work or messes up the entire file. I tried reading the file with plain Python file calls as well as with codecs, and cannot come up with anything that reads it correctly. A code sample of the read is included below. Does anyone have any clue what I am doing wrong? This is Python 2.7.10, by the way.

readFile = codecs.open("FileName",encoding='utf-8')

The line I am trying to read is this, with nothing else in it:

Aeëtes

Here are some of the errors I get:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x91 in position 0: invalid start byte

UnicodeError: UTF-16 stream does not start with BOM (I know this one just means that it is not a UTF-16 file.)

UnicodeDecodeError: 'ascii' codec can't decode byte 0x91 in position 0: ordinal not in range(128)

If I don't use a codec, the word comes in as Ae?tes, which then crashes later in the program. Just to be clear, none of the suggested questions, nor anything else I found on the net, has pointed to an answer. One other detail that might help is that I am using OS X, not Windows.

Credit for this answer goes to RadLexus for figuring out the proper encoding, and also to Mad Physicist, who pointed me in the right direction even though I had not considered all possible encodings.

The issue is apparently that a Mac will convert the .txt file to mac_roman (Mac OS Roman). If you use that encoding, it works perfectly.
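As a quick sanity check on why mac_roman fits the errors in the question, here is a minimal sketch that decodes the offending byte directly. The sample bytes are an assumption: only the 0x91 value comes from the tracebacks, and the rest is the word from the question.

```python
# Assumed raw bytes of the word; only the 0x91 byte is taken from the tracebacks.
raw = b"Ae\x91tes"

# UTF-8 rejects 0x91 as a start byte, which matches the first error message.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In Mac OS Roman, 0x91 maps to U+00EB (e with diaeresis).
decoded = raw.decode("mac_roman")

print(utf8_ok)
print(decoded == u"Ae\xebtes")
```

The comparison is printed instead of the decoded text so the sketch does not depend on the terminal's own encoding.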

This is the line of code that I used to read it with that encoding:

readFile = codecs.open("FileName",encoding='mac_roman')
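For anyone who wants to try this end to end, here is a small self-contained sketch of the same read. It writes a hypothetical sample file to a temporary path (the real "FileName" is not available here, and the bytes are assumed from the 0x91 in the tracebacks), then reads it back with codecs.open and the mac_roman encoding.

```python
import codecs
import os
import tempfile

# Hypothetical sample file standing in for "FileName" from the question.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"Ae\x91tes\n")  # 0x91 is the Mac OS Roman byte for U+00EB

# Same call as above, wrapped in a with-block so the file is closed afterwards.
with codecs.open(path, encoding="mac_roman") as read_file:
    lines = [line.rstrip(u"\n") for line in read_file]

os.remove(path)

print(lines == [u"Ae\xebtes"])
```

On Python 3 the built-in open() accepts the same encoding name, so open(path, encoding="mac_roman") should behave the same way there.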
