XML characters in python xml.dom

I am working on producing an XML document from Python. We are using the xml.dom package to create the XML document. We are having a problem where we want to produce the character reference &#x3c6;, which is a φ. However, when we put that string in a text node and call toxml() on it we get &amp;#x3c6;. Our current solution is to use saxutils.unescape() on the result of toxml(), but this is not ideal because we will have to parse the XML twice.

Is there some way to get the dom package to recognize "&#x3c6;" as an XML character?
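
For reference, the behaviour described above can be reproduced with a short sketch. It assumes Python 3 and xml.dom.minidom as the DOM implementation; the variable names are made up for illustration.

```python
from xml.dom import minidom

doc = minidom.Document()

# The reference written out as plain text: the DOM treats "&" as ordinary data.
as_text = doc.createTextNode('&#x3c6;')
# The character itself, stored as a single code point (U+03C6).
as_char = doc.createTextNode('\u03c6')

escaped = as_text.toxml()
literal = as_char.toxml()

print(escaped)         # &amp;#x3c6;
print(ascii(literal))  # '\u03c6'
```

The first node is serialized with its ampersand escaped, which is the unwanted output; the second is written out as the character itself.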

I think you need to use a Unicode string with u'\u03c6' in it, because the .data field of a text node is supposed (as far as I understand) to be "parsed" data, not including XML entities (whence the &amp; when made back into XML). If you want to ensure that, on output, non-ASCII characters are expressed as numeric character references, you could do:

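import codecs

def ent_replace(exc):
    # Error handler: replace each unencodable character with a hex character reference.
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        s = []
        for c in exc.object[exc.start:exc.end]:
            s.append(u'&#x%4.4x;' % ord(c))
        # Return the replacement text and the position at which encoding should resume.
        return (u''.join(s), exc.end)
    else:
        # Any other kind of error is not something this handler knows how to fix.
        raise TypeError("can't handle %s" % type(exc).__name__)

# Make the handler available under the name 'ent_replace'.
codecs.register_error('ent_replace', ent_replace)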

and use x.toxml().encode('ascii', 'ent_replace').
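
To see the whole thing end to end, here is a minimal sketch, assuming Python 3 and xml.dom.minidom. The element name 'root' is made up, and the handler is repeated only so that the snippet runs on its own. It also tries the built-in 'xmlcharrefreplace' error handler, which needs no registration but emits decimal rather than hex references.

```python
import codecs
from xml.dom import minidom

def ent_replace(exc):
    # Replace each unencodable character with a hex character reference.
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        refs = ['&#x%4.4x;' % ord(c) for c in exc.object[exc.start:exc.end]]
        return (''.join(refs), exc.end)
    raise TypeError("can't handle %s" % type(exc).__name__)

codecs.register_error('ent_replace', ent_replace)

doc = minidom.Document()
root = doc.createElement('root')                # hypothetical element name
root.appendChild(doc.createTextNode('\u03c6'))  # the real character, not the text "&#x3c6;"
doc.appendChild(root)

# Custom handler: hex references, zero-padded to at least four digits.
custom = root.toxml().encode('ascii', 'ent_replace')
# Built-in handler: decimal references.
builtin = root.toxml().encode('ascii', 'xmlcharrefreplace')

print(custom)   # b'<root>&#x03c6;</root>'
print(builtin)  # b'<root>&#966;</root>'
```

Both results are ASCII-only bytes, and an XML parser reads either reference back as φ, so no second parsing pass with saxutils.unescape() should be needed.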
