My Python script using BeautifulSoup gets None when attempting to find an element in XML read from a local file:
xmlData = None
with open('conf//test2.xml', 'r') as xmlFile:
xmlData = xmlFile.read()
# this creates a soup object out of xmlData,
# which is properly loaded from file above
xmlSoup = BeautifulSoup(xmlData, "html.parser")
# this resolves to None
subElemX = xmlSoup.root.singleelement.find('subElementX', recursive=False)
The file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
<singleElement>
<subElementX>XYZ</subElementX>
</singleElement>
<repeatingElement id="1"/>
<repeatingElement id="2"/>
</root>
I also have a REST GET service that returns the same XML, but when I read it using requests.get, it parses fine:
resp = requests.get(serviceURL, headers=headers)
respXML = resp.content.decode("utf-8")
restSoup = BeautifulSoup(respXML, "html.parser")
Why does it work with the REST response and not with the data read out of a local file?
UPDATE: While I understand that Python is case sensitive and singleelement != singleElement, the case seems to be disregarded when parsing the web service response.
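The behavior can be reproduced without the file or the web service; a minimal sketch, inlining the same XML shown above (no other assumptions):

```python
from bs4 import BeautifulSoup

# Same XML as in conf//test2.xml, inlined so the sketch is self-contained
xml_data = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
<singleElement>
<subElementX>XYZ</subElementX>
</singleElement>
<repeatingElement id="1"/>
<repeatingElement id="2"/>
</root>"""

soup = BeautifulSoup(xml_data, "html.parser")

# html.parser stores every tag name in lower case, so the
# camelCase lookup misses...
camel = soup.find("subElementX")

# ...while the all-lowercase lookup finds the (renamed) element
lower = soup.find("subelementx")

print(camel)  # None
print(lower)  # <subelementx>XYZ</subelementx>
```

This suggests the service response "worked" only where the lookup happened to use lowercase names.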
Two things to make it work:
1. Change html.parser to xml — you are parsing XML data, and XML != HTML. (Note that BeautifulSoup's xml parser requires the lxml package to be installed.)
2. Change singleelement to singleElement.
Changes applied (works for me):
xmlSoup = BeautifulSoup(xmlData, "xml")
subElemX = xmlSoup.root.singleElement.find('subElementX', recursive=False)
print(subElemX) # prints <subElementX>XYZ</subElementX>
Apparently, HTML is a case-insensitive language, so html.parser internally converts all tag names to lower case. Given that, the following line should also work:
subElemX = xmlSoup.root.singleelement.find('subelementx', recursive=False)
But in general, you shouldn't parse XML documents with an HTML parser. XML is quite strict about its syntax, and that's for a good reason.
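To illustrate that strictness, here is a small sketch (the broken snippet below is made up for illustration) comparing the standard library's XML parser with html.parser:

```python
from xml.etree import ElementTree
from bs4 import BeautifulSoup

broken = "<root><singleElement>XYZ</root>"  # closing tag is missing

# A real XML parser rejects the mismatched tags outright
try:
    ElementTree.fromstring(broken)
    parse_error = None
except ElementTree.ParseError as exc:
    parse_error = exc

print(parse_error)

# html.parser silently "repairs" the markup instead, which can
# mask problems such as the lowercased tag names discussed above
repaired = BeautifulSoup(broken, "html.parser").find("singleelement")
print(repaired)
```

The silent repair is convenient for messy real-world HTML, but for XML it hides exactly the kind of mismatch you want a parser to catch.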