
Escaping '<' and '>' in xml when using xml.dom.minidom

I am stuck trying to escape "<" and ">" in an XML file using xml.dom.minidom. I tried looking up the Unicode hex values and using those instead:
http://slayeroffice.com/tools/unicode_lookup/

I also tried the standard "&lt;" and "&gt;" entities, but still with no success.

from xml.dom.minidom import Document
doc = Document()
e = doc.createElement("abc")
s1 = '<hello>bhaskar</hello>'
text = doc.createTextNode(s1)
e.appendChild(text)

e.toxml()
'<abc>&lt;hello&gt;bhaskar&lt;/hello&gt;</abc>'

I get the same result with writexml(). I also tried specifying the encoding as 'UTF-8', 'utf-8', and 'utf' in the toxml() and writexml() calls, with the same results.

from xml.dom.minidom import Document
doc = Document()
e = doc.createElement("abc")
s1 = u'&lt;hello&gt;bhaskar&lt;/hello&gt;'
text = doc.createTextNode(s1)
e.appendChild(text)

e.toxml()
u'<abc>&amp;lt;hello&amp;gt;bhaskar&amp;lt;/hello&amp;gt;</abc>'

I tried other approaches with the same results. The only workaround I found is overriding the writer:

import xml.dom.minidom as md
# XXX Hack to handle '<' and '>'
def wd(writer, data):
    data = data.replace("&lt;", "<").replace("&gt;", ">")
    writer.write(data)

md._write_data = wd

Edit: this is the code.

    import xml.dom.minidom as md
    doc = md.Document()

    entity_descr = doc.createElement("EntityDescriptor")
    doc.appendChild(entity_descr)
    entity_descr.setAttribute('xmlns', 'urn:oasis:names:tc:SAML:2.0:metadata')
    entity_descr.setAttribute('xmlns:saml', 'urn:oasis:names:tc:SAML:2.0:assertion')
    entity_descr.setAttribute('xmlns:ds', 'http://www.w3.org/2000/09/xmldsig#')
    # Get the entity_id from saml20_idp_settings
    entity_descr.setAttribute('entityID', self.group['entity_id'])

    idpssodescr = doc.createElement('IDPSSODescriptor')
    idpssodescr.setAttribute('WantAuthnRequestsSigned', 'true')
    idpssodescr.setAttribute('protocolSupportEnumeration', 
    'urn:oasis:names:tc:SAML:2.0:protocol')
    entity_descr.appendChild(idpssodescr)

    keydescr = doc.createElement('KeyDescriptor')
    keydescr.setAttribute('use', 'signing')
    idpssodescr.appendChild(keydescr)
    keyinfo = doc.createElement('ds:KeyInfo')
    keyinfo.setAttribute('xmlns:ds', 'http://www.w3.org/2000/09/xmldsig#')
    keydescr.appendChild(keyinfo)
    x509data = doc.createElement('ds:X509Data')
    keyinfo.appendChild(x509data)


    # check this part 

    s = "this is a cert  blah blah"
    x509cert = doc.createElement('ds:X509Certificate')
    cert = doc.createTextNode(s)
    x509cert.appendChild(cert)
    x509data.appendChild(x509cert)

    sso = doc.createElement('SingleSignOnService')
    sso.setAttribute('Binding', 'urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect')

    sso.setAttribute('Location', 'http://googleapps/singleSignOn')
    idpssodescr.appendChild(sso)

    # Write the metadata file.
    fobj = open('metadata.xml', 'w')
    doc.writexml(fobj, "   ", "", "\n", "UTF-8")
    fobj.close()

This produces

   <?xml version="1.0" encoding="UTF-8"?>
   <EntityDescriptor entityID="skar" xmlns="urn:oasis:names:tc:SAML:2.0:metadata"    
   xmlns:ds="http://www.w3.org/2000/09/xmldsig#" 
   xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
   <IDPSSODescriptor WantAuthnRequestsSigned="true"   
   protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
   <KeyDescriptor use="signing">
   <ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
   <ds:X509Data>
   <ds:X509Certificate>
    this is a cert  blah blah
   </ds:X509Certificate>
   </ds:X509Data>
   </ds:KeyInfo>
   </KeyDescriptor>
   <SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" 
   Location="http:///singleSignOn"/>
   </IDPSSODescriptor>
   </EntityDescriptor>

Note that the "this is a cert" text ends up on a separate line. I have racked my brain over this, but with the same results.

This is not a bug, it is a feature. To insert actual XML, insert DOM objects instead. Text inside an XML tag needs to be entity escaped though to be valid XML.

from xml.dom.minidom import Document
doc = Document()
e = doc.createElement("abc")
eh = doc.createElement("hello")
s1 = 'bhaskar'
text = doc.createTextNode(s1)

eh.appendChild(text)
e.appendChild(eh)

e.toxml()

EDIT: I don't know what Python's API is like, but it looks very similar to C#'s, so you might be able to do something like e.innerXml = s1 to do what you're trying to do... but that could be bad. The better thing to do is parse it and appendChild it as well.

EDIT 2: I just ran this via Python locally, and there's definitely something wrong on your end, not in the libraries. Make sure that your string doesn't have any newlines or whitespace at the start of it. For reference, the test code I used was:

Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.dom.minidom import Document
>>> cert = "---- START CERTIFICATE ----\n   Hello world\n---- END CERTIFICATE ---"
>>> doc = Document()
>>> e = doc.createElement("cert")
>>> certEl = doc.createTextNode(cert)
>>> e.appendChild(certEl)
<DOM Text node "'---- START'...">
>>> print e.toxml()
<cert>---- START CERTIFICATE ----
   Hello world
---- END CERTIFICATE ---</cert>
>>> 

EDIT 3: The final edit. The problem is in your writexml call. Simply using the following fixes this:

doc.writexml(fobj)
# or
doc.writexml(fobj, "", "  ", "")

Unfortunately, it seems that you won't be able to use the newline parameter to get pretty printing, though. It seems that the Python library (or at least minidom) is written rather poorly and will modify text nodes while printing them. Not so much a poor implementation as a naive one. A shame, really.
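As a rough illustration of the point above, the sketch below builds a cut-down version of the certificate element and serializes it both compactly and pretty-printed. The element names are shortened and the helper names build_doc and read_cert are made up for this example. Newer Python releases appear to keep an element whose only child is a text node on one line, so the pretty output may differ between versions; stripping the text when reading it back sidesteps the difference either way.

```python
from xml.dom.minidom import Document, parseString

CERT_TEXT = "this is a cert  blah blah"

def build_doc():
    # Hypothetical, shortened version of the metadata document in the question.
    doc = Document()
    root = doc.createElement("KeyInfo")
    doc.appendChild(root)
    cert = doc.createElement("X509Certificate")
    cert.appendChild(doc.createTextNode(CERT_TEXT))
    root.appendChild(cert)
    return doc

# toxml() adds no indentation or newlines, so the text node is written as-is.
compact = build_doc().toxml()

# toprettyxml() may wrap the text in extra whitespace on older Python versions.
pretty = build_doc().toprettyxml(indent="   ", newl="\n")

def read_cert(xml_text):
    # Read the certificate back and strip any whitespace added by pretty printing.
    node = parseString(xml_text).getElementsByTagName("X509Certificate")[0]
    return "".join(child.data for child in node.childNodes).strip()

print(compact)
print(pretty)
print(read_cert(pretty))
```

If the exact bytes of the text matter, writing with toxml() or writexml(fobj) without indentation arguments is the safer choice.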

If you use "<" as text in XML, you need to escape it; otherwise it is treated as markup. So xml.dom is right to escape it, since you asked for a text node.
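A small sketch of why the escaping is harmless: the escaped form only exists in the serialized document, and any XML parser turns it back into the original characters when reading. The variable names here are illustrative only.

```python
from xml.dom.minidom import Document, parseString

doc = Document()
e = doc.createElement("abc")
original = "<hello>bhaskar</hello>"
e.appendChild(doc.createTextNode(original))

# On the wire, the text is entity-escaped so it cannot be mistaken for markup.
serialized = e.toxml()
print(serialized)

# After parsing, the text node holds the original, unescaped string again.
round_tripped = parseString(serialized).documentElement.firstChild.data
print(round_tripped)
```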

Assuming you really want to insert a piece of XML, I recommend using createElement("hello"). If you have a fragment of XML whose structure you don't know, you should first parse it and then move the nodes of that parse result into the other tree.
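One way to do the parse-and-move step might look like the following minimal sketch. It assumes the fragment is a well-formed document with a single root element; parseString and Document.importNode are standard minidom calls, while the variable names are just for illustration.

```python
from xml.dom.minidom import Document, parseString

doc = Document()
e = doc.createElement("abc")
doc.appendChild(e)

# Parse the fragment into its own small document first.
fragment = parseString("<hello>bhaskar</hello>")

# Deep-copy the parsed root element into the target document, then attach it.
imported = doc.importNode(fragment.documentElement, True)
e.appendChild(imported)

result = e.toxml()
print(result)
```

Because the fragment goes in as real element nodes rather than text, nothing is escaped on output.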

If you want to hack, you can inherit from xml.dom.minidom.Text and override the writexml method. See the source of minidom for details.
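A minimal sketch of that hack, assuming Python 3 and the writexml signature that minidom's Text class currently uses. RawText is a made-up name, and the node is set up by hand in the same way createTextNode appears to do it. Nothing checks that the raw string is well-formed XML, so the output can easily be broken.

```python
from xml.dom.minidom import Document, Text

class RawText(Text):
    """Hypothetical text node that writes its data without entity escaping."""

    def writexml(self, writer, indent="", addindent="", newl=""):
        # Bypass minidom's escaping and emit the data verbatim.
        writer.write("%s%s%s" % (indent, self.data, newl))

doc = Document()
e = doc.createElement("abc")

# Build the node manually, since createTextNode always returns a plain Text.
raw = RawText()
raw.data = "<hello>bhaskar</hello>"
raw.ownerDocument = doc
e.appendChild(raw)

output = e.toxml()
print(output)
```

Unlike replacing md._write_data globally, this only affects nodes that are explicitly created as RawText.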
