How to decode a file in Python 3.x?

For my project I need to parse an XML file, and I use lxml for this. The file I need to parse is encoded in cp1251, but, of course, to parse it with lxml I need to decode it into UTF-8, and I don't know how to do that. I tried to search for something about this, but all the solutions were for Python 2.7 or didn't work. If I try to write something like

inp = open("business.xml", "r", encoding='cp1251').decode('utf-8')

or

inp.decode('utf-8')

I get

builtins.AttributeError: '_io.TextIOWrapper' object has no attribute 'decode'

I have Python 3.2. Any help is welcome, thank you.

In text mode, open() decodes the file for you as you read from it. You are already receiving Unicode data, which is why the object it returns has no decode() method.
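To make the difference concrete, here is a minimal sketch using only the standard library. The sample string and the temporary file are made up for illustration; they stand in for any cp1251-encoded file.

```python
import os
import tempfile

# Hypothetical sample text; any Cyrillic string would do.
sample = "Привет, мир"

# Write the text to a temporary file as raw cp1251 bytes.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample.encode("cp1251"))
    path = tmp.name

try:
    # Text mode: open() decodes the cp1251 bytes into str while reading.
    with open(path, "r", encoding="cp1251") as f:
        text = f.read()
    # Binary mode: the raw, still-encoded bytes come back untouched.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(type(text).__name__, type(raw).__name__)           # str bytes
print(text == sample)                                    # True
print(hasattr(text, "decode"), hasattr(raw, "decode"))   # False True
```

Only the bytes object has a decode() method; the str read in text mode is already decoded.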

For lxml you need to open the file in binary mode and let the XML parser deal with the encoding. Do not do this yourself.

from lxml import etree

# Binary mode: the parser reads the encoding from the XML declaration.
with open("business.xml", "rb") as inp:
    tree = etree.parse(inp)

XML files include a header to indicate what encoding they use, and the parser adjusts to that. If the header is missing, the parser can safely assume UTF-8.
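As a rough illustration of the parser picking up the encoding from the header, the sketch below feeds cp1251 bytes to a parser without decoding them first. It uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without lxml installed (lxml.etree.parse accepts a binary file object in the same way), and the XML content is hypothetical.

```python
import io
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch

# Hypothetical document that declares windows-1251 (cp1251) in its header.
xml_text = (
    '<?xml version="1.0" encoding="windows-1251"?>'
    "<business><name>Привет</name></business>"
)
xml_bytes = xml_text.encode("cp1251")

# BytesIO plays the role of a file opened with mode "rb".
tree = ET.parse(io.BytesIO(xml_bytes))
name = tree.getroot().findtext("name")

# The parser has decoded the element text to str on its own.
print(type(name).__name__)   # str
print(name == "Привет")      # True
```

No explicit decode step appears anywhere: the bytes go in as they are, and the element text comes out as ordinary Unicode strings.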
