
Which Python encoding must I use to read a non-UTF-8 character?

I have to make my Python script read a file of DNA query strings and run a search with it.

The file contains this type of character:

[Screenshot]

Python's default encoding cannot read this line with the file's readline() function. The following error is raised:

[...]
File "/usr/lib/python3.4/codecs.py", line 319, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 860: invalid start byte

I have also tried utf_16 and ascii, but without success. How can I read this?

You first need to figure out the actual encoding of the text file you have to read, then call open on that file with the matching encoding argument. The diamond (�) is simply a placeholder character that your console shows for bytes it cannot decode, which means your default system encoding is incompatible with the file you displayed (and vice versa).
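As a rough sketch of that advice, the snippet below tries a few candidate encodings and reads the file with the first one that decodes cleanly. The helper names (read_lines, first_working_encoding), the candidate list, and the demo bytes are all assumptions made for illustration; they are not taken from the question. Note that a successful decode does not prove the encoding is correct: latin-1, for instance, accepts every byte, so the result still has to be checked by eye.

```python
import os
import tempfile


def read_lines(path, encoding):
    """Read all lines from path, decoding with the given encoding."""
    with open(path, encoding=encoding) as handle:
        return handle.readlines()


def first_working_encoding(path, candidates):
    """Return the first candidate that decodes the whole file, or None."""
    for encoding in candidates:
        try:
            read_lines(path, encoding)
        except UnicodeError:  # UnicodeDecodeError is a subclass
            continue
        return encoding
    return None


# Hypothetical demo file containing the byte 0x81, which is invalid in UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ACGT\x81TTGA\n")
    demo_path = tmp.name

# The candidate list is a guess; replace it with encodings plausible for your data.
chosen = first_working_encoding(demo_path, ["utf-8", "cp437", "latin-1"])
lines = read_lines(demo_path, chosen)
os.remove(demo_path)
```

If the file's origin is known (for example, which program or operating system produced it), passing that encoding directly to open is more reliable than guessing.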

Alternatively, if you do not care about the "junk" characters, you can simply pass 'ignore' or 'replace' as the errors argument. Please consult the documentation for open first to see the available options.
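A minimal sketch of the errors argument, assuming a made-up demo file that contains the same offending byte (0x81) as in the traceback. With 'ignore' the undecodable byte is dropped; with 'replace' it becomes the Unicode replacement character U+FFFD.

```python
import os
import tempfile

# Hypothetical demo file; the byte 0x81 cannot be decoded as UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ACGT\x81TTGA\n")
    demo_path = tmp.name

# 'ignore' silently drops bytes that cannot be decoded.
with open(demo_path, encoding="utf-8", errors="ignore") as handle:
    ignored = handle.readline()

# 'replace' substitutes U+FFFD for each undecodable byte.
with open(demo_path, encoding="utf-8", errors="replace") as handle:
    replaced = handle.readline()

os.remove(demo_path)

print(repr(ignored))
print(repr(replaced))
```

Both options lose information, so they only make sense when the affected bytes really are irrelevant to the search.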
