Escaping '<' and '>' in xml when using xml.dom.minidom

I am stuck trying to escape "<" and ">" in an XML file using xml.dom.minidom. I tried looking up the Unicode hex values and using those instead (http://slayeroffice.com/tools/unicode_lookup/).

I also tried the standard entities "&lt;" and "&gt;", but still with no success.

from xml.dom.minidom import Document
doc = Document()
e = doc.createElement("abc")
s1 = '<hello>bhaskar</hello>'
text = doc.createTextNode(s1)
e.appendChild(text)

e.toxml()
'<abc>&lt;hello&gt;bhaskar&lt;/hello&gt;</abc>'

I get the same result with writexml(). I also tried specifying the encoding ('UTF-8', 'utf-8', 'utf') in the toxml() and writexml() calls, but with the same results.

from xml.dom.minidom import Document
doc = Document()
e = doc.createElement("abc")
s1 = u'&lt;hello&gt;bhaskar&lt;/hello&gt;'
text = doc.createTextNode(s1)
e.appendChild(text)

e.toxml()
u'<abc>&amp;lt;hello&amp;gt;bhaskar&amp;lt;/hello&amp;gt;</abc>'

I tried other ways, but with the same results. The only workaround I could find is to override the writer:

import xml.dom.minidom as md
# XXX Hack to handle '<' and '>'
def wd(writer, data):
    data = data.replace("&lt;", "<").replace("&gt;", ">")
    writer.write(data)

md._write_data = wd

Edit - This is the code:

    import xml.dom.minidom as md
    doc = md.Document()

    entity_descr = doc.createElement("EntityDescriptor")
    doc.appendChild(entity_descr)
    entity_descr.setAttribute('xmlns', 'urn:oasis:names:tc:SAML:2.0:metadata')
    entity_descr.setAttribute('xmlns:saml', 'urn:oasis:names:tc:SAML:2.0:assertion')
    entity_descr.setAttribute('xmlns:ds', 'http://www.w3.org/2000/09/xmldsig#')
    # Get the entity_id from saml20_idp_settings
    entity_descr.setAttribute('entityID', self.group['entity_id'])

    idpssodescr = doc.createElement('IDPSSODescriptor')
    idpssodescr.setAttribute('WantAuthnRequestsSigned', 'true')
    idpssodescr.setAttribute('protocolSupportEnumeration', 
    'urn:oasis:names:tc:SAML:2.0:protocol')
    entity_descr.appendChild(idpssodescr)

    keydescr = doc.createElement('KeyDescriptor')
    keydescr.setAttribute('use', 'signing')
    idpssodescr.appendChild(keydescr)
    keyinfo = doc.createElement('ds:KeyInfo')
    keyinfo.setAttribute('xmlns:ds', 'http://www.w3.org/2000/09/xmldsig#')
    keydescr.appendChild(keyinfo)
    x509data = doc.createElement('ds:X509Data')
    keyinfo.appendChild(x509data)


    # check this part 

    s = "this is a cert  blah blah"
    x509cert = doc.createElement('ds:X509Certificate')
    cert = doc.createTextNode(s)
    x509cert.appendChild(cert)
    x509data.appendChild(x509cert)

    sso = doc.createElement('SingleSignOnService')
    sso.setAttribute('Binding', 'urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect')

    sso.setAttribute('Location', 'http://googleapps/singleSignOn')
    idpssodescr.appendChild(sso)

    # Write the metadata file.
    fobj = open('metadata.xml', 'w')
    doc.writexml(fobj, "   ", "", "\n", "UTF-8")
    fobj.close()

This produces:

   <?xml version="1.0" encoding="UTF-8"?>
   <EntityDescriptor entityID="skar" xmlns="urn:oasis:names:tc:SAML:2.0:metadata"    
   xmlns:ds="http://www.w3.org/2000/09/xmldsig#" 
   xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
   <IDPSSODescriptor WantAuthnRequestsSigned="true"   
   protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
   <KeyDescriptor use="signing">
   <ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
   <ds:X509Data>
   <ds:X509Certificate>
    this is a cert  blah blah
   </ds:X509Certificate>
   </ds:X509Data>
   </ds:KeyInfo>
   </KeyDescriptor>
   <SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" 
   Location="http:///singleSignOn"/>
   </IDPSSODescriptor>
   </EntityDescriptor>

Note that the "this is a cert" text ends up on a line of its own, separated from its tags. I have been racking my brain over this, but always with the same results.

This is not a bug, it is a feature. To insert actual XML, insert DOM objects instead. Text inside an XML element has to be entity-escaped to be valid XML, though.

from xml.dom.minidom import Document
doc = Document()
e = doc.createElement("abc")
eh = doc.createElement("hello")
s1 = 'bhaskar'
text = doc.createTextNode(s1)

eh.appendChild(text)
e.appendChild(eh)

e.toxml()

EDIT: I don't know what Python's API is like, but it looks very similar to C#'s, so you might be able to do something like e.innerXml = s1 to do what you're trying to do... but that could be bad. The better thing to do is to parse the string and appendChild the resulting nodes.

EDIT 2: I just ran this via Python locally, and there's definitely something wrong on your end, not in the libraries. Make sure that your string doesn't have any newlines or whitespace at the start of it. For reference, the test code I used was:

Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.dom.minidom import Document
>>> cert = "---- START CERTIFICATE ----\n   Hello world\n---- END CERTIFICATE ---"
>>> doc = Document()
>>> e = doc.createElement("cert")
>>> certEl = doc.createTextNode(cert)
>>> e.appendChild(certEl)
<DOM Text node "'---- START'...">
>>> print e.toxml()
<cert>---- START CERTIFICATE ----
   Hello world
---- END CERTIFICATE ---</cert>
>>> 

EDIT 3: The final edit. The problem is in your writexml call. Simply using either of the following fixes it:

doc.writexml(fobj)
# or
doc.writexml(fobj, "", "  ", "")

Unfortunately, it seems that you won't be able to use the newline parameter to get pretty printing, though. It seems that the Python library (or at least minidom) is written rather poorly and will modify text nodes while printing them. Not so much a poor implementation as a naive one. A shame, really.
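
To make the point about the writexml arguments concrete, here is a minimal sketch that writes a single certificate element with the default arguments (no indent, no newline) into an in-memory buffer. The io.StringIO buffer and the variable names are only illustrative assumptions; the original code writes to metadata.xml.

```python
import io
from xml.dom.minidom import Document

doc = Document()
cert_el = doc.createElement("ds:X509Certificate")
cert_el.appendChild(doc.createTextNode("this is a cert  blah blah"))
doc.appendChild(cert_el)

# Default arguments: no indent, no addindent, no newl, so nothing
# is added around the text node while it is written out.
buf = io.StringIO()
doc.writexml(buf)
plain = buf.getvalue()
print(plain)
```

With these arguments the certificate text stays directly between its opening and closing tags.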

If you use "<" as text in XML, you need to escape it, else it is considered markup. So xml.dom is right in escaping it, since you've asked for a text node.
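
A small round trip may help show that the escaping is harmless: the entities exist only in the serialized form, and any XML parser turns them back into the original characters. This is a sketch using the strings from the question; the variable names are illustrative.

```python
from xml.dom.minidom import Document, parseString

s1 = "<hello>bhaskar</hello>"

doc = Document()
e = doc.createElement("abc")
e.appendChild(doc.createTextNode(s1))

# Serialized form: the text node is entity-escaped.
escaped = e.toxml()
print(escaped)

# Parsing the serialized form gives back the original string.
recovered = parseString(escaped).documentElement.firstChild.data
print(recovered)
```

So a consumer that parses the file sees the literal text '<hello>bhaskar</hello>' as the content of abc, not a nested hello element.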

Assuming you really want to insert a piece of XML, I recommend using createElement("hello"). If you have a fragment of XML that you don't know the structure of, you should first parse it, and then move the nodes of that parse result into the other tree.
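
The parse-then-move approach could look roughly like the sketch below. The "wrapper" root element is an assumption added here so that fragments with several top-level nodes still parse; importNode is used to copy each parsed node into the target document.

```python
from xml.dom.minidom import Document, parseString

doc = Document()
e = doc.createElement("abc")
doc.appendChild(e)

fragment = "<hello>bhaskar</hello>"

# Wrap the fragment in a throwaway root element (hypothetical name)
# so that it is a well-formed document for the parser.
parsed = parseString("<wrapper>%s</wrapper>" % fragment)

# Copy every child of the wrapper into the target tree.
for child in list(parsed.documentElement.childNodes):
    e.appendChild(doc.importNode(child, True))

result = e.toxml()
print(result)
```

Because the fragment is parsed first, malformed input fails at parse time instead of producing a broken output file.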

If you want to hack, you can inherit from xml.dom.minidom.Text and override the writexml method. See the source of minidom for details.
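
A minimal sketch of that hack, assuming a recent Python 3 version of minidom: RawText is a hypothetical class name, and the node is built by hand on the assumption that this mirrors what createTextNode does. The caller is responsible for the data being well-formed XML, since nothing is escaped or checked.

```python
from xml.dom.minidom import Document, Text


class RawText(Text):
    """Hypothetical Text subclass that writes its data without escaping."""

    def writexml(self, writer, indent="", addindent="", newl=""):
        # Bypass minidom's escaping; self.data must already be valid XML.
        writer.write("%s%s%s" % (indent, self.data, newl))


doc = Document()
e = doc.createElement("abc")

raw = RawText()
raw.data = "<hello>bhaskar</hello>"
raw.ownerDocument = doc  # assumed to match what createTextNode sets
e.appendChild(raw)

raw_result = e.toxml()
print(raw_result)
```

Unlike replacing the module-level _write_data function, this only affects the nodes you explicitly create as RawText; ordinary text nodes are still escaped.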
