Unescaping XML attributes in Python

How can I escape a string so that it can be used in an XML attribute?

I am not setting the attribute programmatically from Python code; I just need to create a valid string that can be used as an XML attribute value.

I have tried:

from xml.sax.saxutils import escape, quoteattr

print(escape('<&% "eggs and spam" &>'))
# >>> &lt;&amp;% "eggs and spam" &amp;&gt;

print(quoteattr('<&% "eggs and spam" &>'))
# >>> '&lt;&amp;% "eggs and spam" &amp;&gt;'

The problem is that neither escape() nor quoteattr() escapes the double quote character, i.e. " .

Of course, I can do a .replace('"', '&quot;') on the escaped string, but I assume there should be a way to do it with an existing API (from the standard library or from a third-party module such as lxml).
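For what it's worth, both functions in xml.sax.saxutils take an optional second argument, a dict of extra replacements, so the manual .replace() may not be needed. The following is a minimal sketch rather than a definitive answer; the variable names (raw, escaped, single_quoted, double_quoted) are only illustrative:

```python
from xml.sax.saxutils import escape, quoteattr

raw = '<&% "eggs and spam" &>'

# escape() always handles &, < and >; the optional dict adds further
# replacements, here the two quote characters.
escaped = escape(raw, {'"': '&quot;', "'": '&apos;'})
print(escaped)

# quoteattr() returns the value together with its surrounding quotes.
# With only double quotes inside, it wraps the value in single quotes,
# so the result is already a valid attribute value.
single_quoted = quoteattr(raw)
print(single_quoted)

# When both quote characters occur, it wraps the value in double quotes
# and escapes the inner double quotes as &quot;.
double_quoted = quoteattr('say "hi" to O\'Brien')
print(double_quoted)
```

Note that the result of quoteattr() is meant to be pasted after `name=` as is, whereas the result of escape() still has to be wrapped in quotes by the caller.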

Update: I've found that Python 3's html.escape produces the expected result, but I am reluctant to use it in an XML context because I assume that HTML escaping might follow a different spec than the one required by the XML standard ( https://www.w3.org/TR/xml/#AVNormalize ).
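One way to check this concern empirically is to feed the html.escape output to an XML parser and see whether the original string comes back. This is only a sketch under stated assumptions: the element name item and attribute name name are made up for the test, and a single round trip does not prove conformance for every input.

```python
import html
import xml.etree.ElementTree as ET

raw = '<&% "eggs and spam" &>'

# quote=True (the default) also escapes " as &quot; and ' as &#x27;.
attr_value = html.escape(raw, quote=True)
print(attr_value)

# Embed the escaped text in a double-quoted attribute of a hypothetical
# element and let an XML parser decode it again.
doc = '<item name="{}"/>'.format(attr_value)
parsed = ET.fromstring(doc).get('name')

# True if the value survived the round trip unchanged.
round_trip_ok = parsed == raw
print(round_trip_ok)
```

The replacements html.escape emits are predefined XML entities or numeric character references, so they are also legal in XML. One difference that may matter: literal tabs and newlines in an attribute value are normalized to spaces by an XML parser, and html.escape leaves them untouched, while quoteattr() in Python 3 writes them as character references.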

Shamelessly stolen from tornado (with a few modifications):

import re
_XHTML_ESCAPE_RE = re.compile('[&<>"\']')
_XHTML_ESCAPE_DICT = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;',
                      '\'': '&#39;'}

def xhtml_escape(value):
    """Escapes a string so it is valid within HTML or XML.

    Escapes the characters ``<``, ``>``, ``"``, ``'``, and ``&``.
    When used in attribute values the escaped strings must be enclosed
    in quotes.

    .. versionchanged:: 3.2

       Added the single quote to the list of escaped characters.
    """
    # Replace each special character with its entity in a single pass.
    return _XHTML_ESCAPE_RE.sub(lambda match: _XHTML_ESCAPE_DICT[match.group(0)],
                                value)
