XML characters in python xml.dom

Question

I am working on producing an XML document from Python. We are using the xml.dom package to create the document. We are having a problem where we want to produce the character &#x03c6;, which is a φ. However, when we put that string in a text node and call toxml() on it, we get &amp;#x03c6;. Our current solution is to use saxutils.unescape() on the result of toxml(), but this is not ideal because we will have to parse the XML twice.

Is there some way to get the dom package to recognize "&#x03c6;" as an XML character?

Answer 1

I think you need to use a Unicode string with \u03c6 in it, because the .data field of a text node is supposed (as far as I understand) to be "parsed" data, not including XML entities (whence the &amp; when made back into XML). If you want to ensure that, on output, non-ASCII characters are expressed as entities, you could do:

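import codecs

def ent_replace(exc):
  # Error handler: replace each unencodable character with &#xHHHH;
  if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
    s = []
    for c in exc.object[exc.start:exc.end]:
      s.append(u'&#x%4.4x;' % ord(c))
    # Return the replacement text and the position at which to resume
    return (u''.join(s), exc.end)
  else:
    # exc is an instance, so take the name from its class
    raise TypeError("can't handle %s" % type(exc).__name__)

codecs.register_error('ent_replace', ent_replace)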

and use x.toxml().encode('ascii', 'ent_replace').
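To make the behaviour concrete, here is a minimal sketch (Python 3, using xml.dom.minidom). The helper name render and the element name root are made up for illustration. It first reproduces the escaping described in the question, then stores the real character in the text node. For the ASCII output it uses Python's built-in 'xmlcharrefreplace' error handler rather than the custom handler above; the built-in one emits decimal references (&#966;) instead of hexadecimal ones, which may or may not suit your needs.

```python
from xml.dom import minidom


def render(text):
    """Build a one-element document (hypothetical helper) and serialize it."""
    doc = minidom.Document()
    root = doc.createElement("root")  # element name chosen for illustration
    doc.appendChild(root)
    root.appendChild(doc.createTextNode(text))
    return doc.toxml()


# Entity-like text is treated as literal characters, so the ampersand is
# escaped when the tree is serialized.
escaped = render("&#x03c6;")

# Storing the actual character keeps it intact in the serialized string.
raw = render("\u03c6")

# The built-in error handler turns non-ASCII characters into decimal
# numeric character references while encoding.
ascii_bytes = raw.encode("ascii", "xmlcharrefreplace")

print(escaped)
print(ascii_bytes)
```

If hexadecimal references are required, the custom 'ent_replace' handler can be passed to encode() in place of 'xmlcharrefreplace'.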

Answer 1: accepted, 2009-05-31 23:17:08