Python encoding problem in XML

Question

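I have a media player, and I want to send what is currently playing to trakt.tv. Everything works fine except for foreign letters in the title/path. The system is running Python 2.7.3.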

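def getStatus(self, ip, timeout=10.0):
    oPchStatus = PchStatus()
    try:
        oResponse = urlopen("http://" + ip + ":8008/playback?arg0=get_current_vod_info", None, timeout)
        oPchStatus = self.parseResponse(oResponse.readlines()[0])
    except Exception as e:
        # The except clause was missing from the post; storing the error here is a placeholder.
        oPchStatus.error = e
    return oPchStatus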

This returns something like the following.

<?xml version="1.0"?>
<theDavidBox>
  <request>
    <arg0>get_current_vod_info</arg0>
    <module>playback</module>
  </request>
  <response>
    <currentStatus>pause</currentStatus>
    <currentTime>3190</currentTime>
    <downloadSpeed>0</downloadSpeed>
    <fullPath>/opt/sybhttpd/localhost.drives/HARD_DISK/Storage/NAS/Videos/FILMS/A.Haunted.House.(2013)/A Haunted House.avi</fullPath>
    <lastPacketTime>0</lastPacketTime>
    <mediatype>OTHERS</mediatype>
    <seekEnable>true</seekEnable>
    <title/>
    <totalTime>4860</totalTime>
  </response>
  <returnValue>0</returnValue>
</theDavidBox>

The next step takes the above and assigns each item to a variable.

class PchStatus:
    def __init__(self):
        self.status=EnumStatus.NOPLAY
        self.fullPath = u""
        self.fileName = u""
        self.currentTime = 0
        self.totalTime = 0
        self.percent = 0
        self.mediaType = ""
        self.currentChapter = 0 # For Blu-ray Disc only
        self.totalChapter = 0 # For Blu-ray Disc only
        self.error = None

class PchRequestor:

    def parseResponse(self, response):
        oPchStatus = PchStatus()
        try:
            response = unescape(response)
            oXml = ElementTree.XML(response)
            if oXml.tag == "theDavidBox": # theDavidBox should be the root
                if oXml.find("returnValue").text == '0' and int(oXml.find("response/totalTime").text) > 90:#Added total time check to avoid scrobble while playing adverts/trailers
                    oPchStatus.totalTime = int(oXml.find("response/totalTime").text)
                    oPchStatus.status = oXml.find("response/currentStatus").text
                    oPchStatus.fullPath = oXml.find("response/fullPath").text
                    oPchStatus.currentTime = int(oXml.find("response/currentTime").text)

And so on. Using the XML returned above:

oPchStatus.totalTime would be "4860"
oPchStatus.status would be "pause"
oPchStatus.fullPath would be "/opt/sybhttpd/localhost.drives/HARD_DISK/Storage/NAS/Videos/FILMS/A.Haunted.House.(2013)/A Haunted House.avi"
oPchStatus.currentTime would be "3190"

As I said, this works great until there are foreign letters in the title. A title such as Le.Fabuleux.Destin.d'Amélie.Poulain.(2001).avi causes oPchStatus.fullPath to contain the string "/opt/sybhttpd/localhost.drives/HARD_DISK/Storage/NAS/Videos/Le.Fabuleux.Destin.d'Am\xe9lie.Poulain.(2001).avi"

and not

"/opt/sybhttpd/localhost.drives/HARD_DISK/Storage/NAS/Videos/Le.Fabuleux.Destin.d'Amélie.Poulain.(2001).avi"

as I would like.

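Further on in the script there is a routine that scans the XML for the filename and creates FILENAME.watched, so I need the filename to match the actual filename rather than have any letters replaced.

What is the best way to make sure these kinds of filenames are encoded correctly? I have tried to give as much information as possible, but if you need more, please ask.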

Answer 1

Python is just making your string value ASCII-printable by showing you the escape code for the é character: \xe9.
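A minimal sketch of the difference between the stored value and its ASCII-safe display. It assumes Python 3, where `ascii()` produces the escaped form that `repr()` shows for a unicode string on Python 2.7; the variable name `title` and its value are illustrative only.

```python
# Illustrative value; in the question it would come from <fullPath>.
title = u"Am\xe9lie"              # \xe9 is the escape for U+00E9 (é)

# The string holds one code point for é, not the four characters "\xe9".
print(len(title))                 # 6
print(title == u"Am\u00e9lie")    # True

# ascii() shows the ASCII-safe escaped form of the value;
# the value itself is unchanged.
escaped = ascii(title)
print(escaped)                    # 'Am\xe9lie'
```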

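Some notes on the linked source code:

You should not decode the response you want to parse to Unicode. Parse the raw bytes; the parser expects to decode the contents itself. In fact, the ElementTree parser will encode the data again to be able to parse it.
When you have XML in a byte string, I would use the ElementTree.fromstring() function instead. Yes, it does the same thing as the ElementTree.XML() call you are using, but fromstring() is the documented API.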
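The two notes above can be sketched as follows. This is only an illustration: the shortened path and the `raw_response` bytes are made up, and it assumes the player sends UTF-8, which is the XML default when the declaration names no encoding.

```python
import xml.etree.ElementTree as ElementTree

# Hypothetical response body as raw bytes, not decoded or unescaped.
# \xc3\xa9 is the UTF-8 encoding of é.
raw_response = (
    b'<?xml version="1.0"?>'
    b"<theDavidBox><response>"
    b"<fullPath>/Videos/Le.Fabuleux.Destin.d'Am\xc3\xa9lie.Poulain.(2001).avi</fullPath>"
    b"</response><returnValue>0</returnValue></theDavidBox>"
)

# Hand the bytes to the parser untouched; it decodes them itself.
tree = ElementTree.fromstring(raw_response)
full_path = tree.find("response/fullPath").text

# The parsed text contains the single code point U+00E9.
print(u"\xe9" in full_path)       # True
```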

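Otherwise, your sample input works exactly as it should. If I create an XML document based on your example, with non-ASCII characters in the file path, I get the following: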

>>> tree = ElementTree.fromstring(response)
>>> print tree.find("response/fullPath").text
/opt/sybhttpd/localhost.drives/HARD_DISK/Storage/NAS/Videos/Le.Fabuleux.Destin.d'Amélie.Poulain.(2001).avi
>>> tree.find("response/fullPath").text
u"/opt/sybhttpd/localhost.drives/HARD_DISK/Storage/NAS/Videos/Le.Fabuleux.Destin.d'Am\xe9lie.Poulain.(2001).avi"

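As you can see, the unicode() value of .text contains an é character (Unicode code point U+00E9, LATIN SMALL LETTER E WITH ACUTE). When it is shown as a Python literal, Python makes sure it is printable in an ASCII context by giving me the Python escape code for that code point, \xe9. This is normal; nothing is broken.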
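Because the value is intact, it can be used as-is to build the marker filename mentioned in the question. A rough sketch, where `watched_marker_path` is a hypothetical helper and appending ".watched" to the complete filename is an assumption about how the script names its markers:

```python
import os.path

def watched_marker_path(full_path):
    """Return the path of the (assumed) FILENAME.watched marker."""
    return full_path + u".watched"

# Shortened, illustrative path as the parser would return it.
full_path = u"/Videos/Le.Fabuleux.Destin.d'Am\xe9lie.Poulain.(2001).avi"
marker = watched_marker_path(full_path)

# The basename still contains U+00E9; no letter has been replaced.
name = os.path.basename(marker)
print(u"\xe9" in name)                # True
print(name.endswith(u".watched"))     # True
```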
