Python reading unicode from local files
I am trying to read some Unicode files that I have locally. How do I read Unicode files while using a list? I've read the Python docs and a ton of Stack Overflow Q&As, which have answered a lot of other questions I had, but I can't find the answer to this one.
Any help is appreciated.
Edit: Sorry, my files are in UTF-8.
You can open UTF-8-encoded files by using:
import codecs

with codecs.open("myutf8file.txt", encoding="utf-8-sig") as infile:
    for line in infile:
        # do something with line
        pass
Be aware that codecs.open() does not translate \r\n to \n, so if you're working with Windows files, you need to take that into account.
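One way to take that into account is sketched below. It assumes Python 3, where the built-in open() accepts an encoding argument and, in text mode, translates \r\n to \n by default. The temporary file and its contents are made up for illustration; the lines are collected into lists so the difference is easy to inspect.

```python
import codecs
import os
import tempfile

# Hypothetical sample: a UTF-8 file with a BOM and Windows line endings.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + "first\r\nsecond\r\n".encode("utf-8"))

# codecs.open() strips the BOM but leaves the \r\n line endings in place.
with codecs.open(path, encoding="utf-8-sig") as infile:
    raw_lines = list(infile)

# The built-in open() in text mode also strips the BOM (utf-8-sig)
# and additionally translates \r\n to \n by default.
with open(path, encoding="utf-8-sig") as infile:
    translated_lines = list(infile)

# Alternative if you stay with codecs.open(): strip the line endings yourself.
stripped_lines = [line.rstrip("\r\n") for line in raw_lines]
```

If you have to keep codecs.open(), the last line shows the manual route: strip the trailing \r\n from each line before using it.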
The utf-8-sig codec will read UTF-8 files with or without a BOM (Byte Order Mark), stripping it if it's there. On writing, you should use utf-8 as the codec because the Unicode standard recommends against writing a BOM in UTF-8 files.
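The read/write asymmetry can be checked with a small round trip. This is only a sketch, assuming Python 3; the file names, the sample text and the read_text helper are hypothetical and not part of the original answer.

```python
import codecs
import os
import tempfile

tmpdir = tempfile.mkdtemp()
text = "caf\u00e9\n"

# Write with the plain "utf-8" codec, which does not emit a BOM.
no_bom_path = os.path.join(tmpdir, "no_bom.txt")
with open(no_bom_path, "w", encoding="utf-8", newline="") as f:
    f.write(text)

# Simulate a file produced by a tool that did write a BOM.
bom_path = os.path.join(tmpdir, "with_bom.txt")
with open(bom_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + text.encode("utf-8"))

def read_text(path):
    """Hypothetical helper: read a UTF-8 file, with or without a BOM."""
    with open(path, encoding="utf-8-sig") as f:
        return f.read()

# Check the raw bytes of the file written with "utf-8".
with open(no_bom_path, "rb") as f:
    starts_with_bom = f.read().startswith(codecs.BOM_UTF8)

# Reading the BOM file with plain "utf-8" keeps the BOM as U+FEFF.
with open(bom_path, encoding="utf-8") as f:
    plain_read = f.read()
```

Here read_text() returns the same string for both files, while reading the BOM file with the plain utf-8 codec leaves a leading U+FEFF character in the result.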