
XML write to file UnicodeDecodeError Python 2.7.3

I've searched the site and haven't found an answer that works for me. My problem is that I'm trying to write XML to a file, and when I run the script from the terminal I get:

Traceback (most recent call last):
  File "fetchWiki.py", line 145, in <module>
    pageDictionary = qSQL(users_database)
  File "fetchWiki.py", line 107, in qSQL
    writeXML(listNS)
  File "fetchWiki.py", line 139, in writeXML
    f1.write(doc.toprettyxml(indent="\t", encoding="utf-8"))
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 57, in toprettyxml
    self.writexml(writer, "", indent, newl, encoding)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1751, in writexml
    node.writexml(writer, indent, addindent, newl)
----//---- more lines in here ----//----
    self.childNodes[0].writexml(writer, '', '', '')
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1040, in writexml
    _write_data(writer, "%s%s%s" % (indent, self.data, newl))
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 297, in _write_data
    writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1176: ordinal not in range(128)

This is from the following code:

doc = Document()

base = doc.createElement('Wiki')
doc.appendChild(base)

for ns_dict in listNamespaces: 
    namespace = doc.createElement('Namespace')
    base.appendChild(namespace)
    namespace.setAttribute('NS', ns_dict)

    for title in listNamespaces[ns_dict]:

        page = doc.createElement('Page')
        try:
            title.encode('utf8')
            page.setAttribute('Title', title)
        except:
            newTitle = title.decode('latin1', 'ignore')
            newTitle.encode('utf8', 'ignore')
            page.setAttribute('Title', newTitle)

        namespace.appendChild(page)
        text = doc.createElement('Content')
        text_content = doc.createTextNode(listNamespaces[ns_dict][title])
        text.appendChild(text_content)
        page.appendChild(text)

    f1  = open('pageText.xml', 'w')
    f1.write(doc.toprettyxml(indent="\t", encoding="utf-8"))       
    f1.close()

The error occurs with or without the 'ignore' parameter to encode / decode. Adding

# -*- coding: utf-8 -*- 

does not help.

I created the Python document using Eclipse with Pydoc, and it runs there with no problems, but when I run it from the terminal it errors.

Any help is much appreciated, including links to answers I did not find.

Thanks.

You should not encode the strings you use for attributes. The minidom library handles the encoding for you when writing.

Your error is caused by mixing bytestrings with unicode data. When the two are combined, Python 2 implicitly decodes the bytestrings as ASCII, and your encoded bytestrings are not decodable as ASCII.
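To see where the message in the traceback comes from, here is a minimal sketch that performs that ASCII decode explicitly. The sample bytes are an assumption chosen for illustration (a UTF-8 encoded non-breaking space, which starts with the same 0xc2 byte as in the traceback); the asker's real data is not shown.

```python
# Hypothetical sample data: 'a', a UTF-8 encoded non-breaking space, 'b'.
raw = b'a\xc2\xa0b'

try:
    # Python 2 does this step implicitly whenever a bytestring is
    # combined with unicode; here it is spelled out.
    raw.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode('utf-8')

print(message)
print(repr(text))
```

The failure has nothing to do with the output encoding passed to minidom; it happens earlier, when the bytestring has to become unicode.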

If some of your data is encoded and some of it is unicode, try to avoid that situation in the first place. If you cannot avoid handling mixed data, do this instead:

page = doc.createElement('Page')
if not isinstance(title, unicode):
    title = title.decode('latin1', 'ignore')
page.setAttribute('Title', title)
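The same check can be wrapped in a small helper so every value passes through it before it reaches minidom. This is only a sketch: `to_text` and the sample titles are hypothetical names and data, and `latin1` is simply the fallback used above, so substitute the real source encoding if you know it. It tests against `bytes` (an alias of `str` on Python 2) so that it also runs on Python 3.

```python
from xml.dom.minidom import Document


def to_text(value, encoding='latin1'):
    """Return value as unicode text, decoding bytestrings first (hypothetical helper)."""
    if isinstance(value, bytes):  # `bytes` is `str` on Python 2
        return value.decode(encoding, 'ignore')
    return value


# Hypothetical mixed input: one unicode title, one latin1-encoded bytestring.
titles = [u'Caf\xe9', b'Caf\xe9']

doc = Document()
base = doc.createElement('Wiki')
doc.appendChild(base)

for title in titles:
    page = doc.createElement('Page')
    # Only unicode is handed to minidom; it encodes on output.
    page.setAttribute('Title', to_text(title))
    base.appendChild(page)

# With an encoding argument, toprettyxml returns an encoded bytestring.
xml_bytes = doc.toprettyxml(indent='\t', encoding='utf-8')
```

Both titles end up as the same UTF-8 encoded attribute value in `xml_bytes`, regardless of which form they arrived in.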

Note that you don't need to use doc.toprettyxml(); you can instruct doc.writexml() to indent your XML for you as well:

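import codecs
# codecs.open() encodes the unicode text minidom writes as UTF-8
with codecs.open('pageText.xml', 'w', encoding='utf8') as f1:
    # addindent is the per-level indentation; indent only prefixes the root node
    doc.writexml(f1, addindent='\t', newl='\n', encoding='utf-8')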
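For reference, a self-contained sketch of writing through `doc.writexml()` follows. In `writexml()`, `indent` is the prefix for the current node and `addindent` is the amount added per nesting level, so `addindent` is what reproduces `toprettyxml(indent="\t")`. The document content and the temporary directory are assumptions made so the example can run anywhere.

```python
import codecs
import os
import shutil
import tempfile
from xml.dom.minidom import Document

# Build a tiny document from unicode values only (hypothetical content).
doc = Document()
base = doc.createElement('Wiki')
doc.appendChild(base)
page = doc.createElement('Page')
page.setAttribute('Title', u'Caf\xe9')
base.appendChild(page)
content = doc.createElement('Content')
content.appendChild(doc.createTextNode(u'na\xefve text'))
page.appendChild(content)

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'pageText.xml')
try:
    # The codecs writer encodes to UTF-8; the encoding argument of writexml
    # only sets the encoding attribute in the XML declaration.
    with codecs.open(path, 'w', encoding='utf8') as f1:
        doc.writexml(f1, addindent='\t', newl='\n', encoding='utf-8')

    # Read the file back to inspect what was written.
    with codecs.open(path, 'r', encoding='utf8') as f1:
        written = f1.read()
finally:
    shutil.rmtree(tmp_dir)

print(written)
```

Each nesting level in the written file is indented by one more tab, and the non-ASCII characters are stored as UTF-8 without any manual `encode()` calls.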
