I am working with a SOAP API using python-suds.
The API returns a result, which suds parses according to the WSDL. The result contains a field holding XML data:
(MyServiceResult){
errorMsg = "Error Message here..."
sessionId = "..."
outputDataXML = "<![CDATA[<Results>.....<Details>....</Details></Results>]]>"
errorCode = "00"
}
So I planned to use xml.etree.ElementTree to parse the XML in outputDataXML. But since the returned data starts with <![CDATA[, the XML parser fails with
ParseError: syntax error: line 1, column 0
What is the best approach for such a situation, other than using a regex?
Call ET.fromstring once to extract the text from the CDATA section, then call ET.fromstring a second time to parse that text as XML:
import xml.etree.ElementTree as ET
d = '<![CDATA[<Results>.....<Details>....</Details></Results>]]>'
fix = '<root>{}</root>'.format(d)
content = ET.fromstring(fix).text
print(repr(content))
# '<Results>.....<Details>....</Details></Results>'
results = ET.fromstring(content)
print(ET.tostring(results, encoding='unicode'))
# <Results>.....<Details>....</Details></Results>
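The two calls above can be wrapped in a small helper. This is only a sketch: the name parse_cdata_xml and the fallback for payloads that arrive without a CDATA wrapper are assumptions for illustration, not part of suds or ElementTree.

```python
import xml.etree.ElementTree as ET


def parse_cdata_xml(payload):
    """Parse an XML document that may be wrapped in a single CDATA section.

    Hypothetical helper: wraps the payload in a dummy <root> element so the
    parser accepts the CDATA section, then parses the extracted text.
    """
    wrapper = ET.fromstring('<root>{}</root>'.format(payload))
    if len(wrapper):
        # The payload was plain XML (no CDATA), so it was parsed directly
        # as a child of the dummy root element.
        return wrapper[0]
    # The payload was CDATA, so the inner XML is available as plain text.
    return ET.fromstring(wrapper.text)


results = parse_cdata_xml('<![CDATA[<Results><Details>ok</Details></Results>]]>')
print(results.tag)
print(results.find('Details').text)
```

The helper returns the inner root element in both cases, so the calling code does not need to know whether the service wrapped the field in CDATA.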
When reading all kinds of oddly formatted XML-like data, you can always use BeautifulSoup:
>>> from bs4 import BeautifulSoup
>>> d="<![CDATA[<Results>.....<Details>....</Details></Results>]]>"
>>> soup=BeautifulSoup(d)
>>> from xml.etree import ElementTree
>>> tree=ElementTree.fromstring(str(soup))
Otherwise, you can make a quick hack like this:
tree = ElementTree.fromstring(outputDataXML.replace("<![CDATA[", "").replace("]]>", ""))
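One caveat with chained replace calls is that they remove the markers anywhere in the string, including CDATA sections that legitimately belong to the inner document. A slightly more careful variant, sketched below, strips the markers only when they wrap the whole payload. The function name strip_cdata is hypothetical.

```python
import xml.etree.ElementTree as ET

CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def strip_cdata(text):
    """Remove one enclosing CDATA wrapper, leaving any other text untouched."""
    text = text.strip()
    if text.startswith(CDATA_START) and text.endswith(CDATA_END):
        return text[len(CDATA_START):-len(CDATA_END)]
    return text


# A wrapped payload is unwrapped and can then be parsed normally.
tree = ET.fromstring(strip_cdata("<![CDATA[<Results><Details>ok</Details></Results>]]>"))
print(tree.find("Details").text)

# An unwrapped payload with an inner CDATA section is left as-is,
# so the "<" inside it does not break parsing.
inner = ET.fromstring(strip_cdata("<a><![CDATA[1 < 2]]></a>"))
print(inner.text)
```

No regex is involved, and payloads that arrive without the wrapper pass through unchanged.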