Python將Unicode保存為XML

Question

我目前正在編寫一個簡短的Python腳本，以遍歷服務器上的某些目錄，查找所需內容，並將數據保存到XML文件中。

問題在於某些數據是以其他語言編寫的，例如“ハローワールド”或類似格式的東西。 嘗試將其保存在XML的條目中時，得到以下回溯：

UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 18: ordinal not in range(128)

這就是我保存數據的函數的樣子：

def addHistoryEntry(self, title, url):
    self.log.info('adding history entry {"title":"%s", "url":"%s"}' % (title, url))
    root = self.getRoot()
    history = root.find('.//history')
    entry = etree.SubElement(history, 'entry')
    entry.set('title', title)
    entry.set('time', str(unixtime()))
    entry.text = url
    history.set('results', str(int(history.attrib['results']) + 1))
    self.write(root)

self.getRoot()如下：

def getRoot(self):
    return etree.ElementTree(file = self.config).getroot()

這是寫入數據的函數（ self.write(root) ）

def write(self, xmlRoot):
    bump = open(self.config, 'w+')
    bump.write(dom.parseString(etree.tostring(xmlRoot, 'utf-8')).toxml())
    bump.close()

進口是：

import xml.etree.ElementTree as etree
import xml.dom.minidom as dom

如果有人可以幫助我解決此問題，請執行。 謝謝你的幫助。

Answer 1

您可以使用python的編解碼器庫和.decode（'utf-8'）解決該問題。

import sys,codecs
reload(sys)
sys.setdefaultencoding("utf-8")

Answer 2

默認情況下，python會將unicode編碼為ASCII。 由於ASCII僅支持128個字符，因此它將引發超過128個異常值。因此，解決方案是將unicode編碼為擴展格式，例如latin1或UTF8或其他ISO字符集。 我建議在utf-8中編碼。 語法為：

u"Your unucode Text".encode("utf-8", "ignore")

為了你

title.encode("utf-8", "ignore")

Python將Unicode保存為XML

問題描述

2 個解決方案

解決方案1
1 已采納 2014-06-08 03:26:18

解決方案2
0 2014-06-08 00:43:32

Python將Unicode保存為XML

問題描述

2 個解決方案

解決方案1 1 已采納 2014-06-08 03:26:18

解決方案2 0 2014-06-08 00:43:32

解決方案1
1 已采納 2014-06-08 03:26:18

解決方案2
0 2014-06-08 00:43:32