Python minidom XML parser - ignore child tags
I have an XML file which looks like:
<tag1>
<tag2>
I am too good <italic>to be true</italic>
</tag2>
</tag1>
Now, when I want to extract the data within the "tag2" tags, assuming the XML file has been read into the "XML_data" variable:
XML_data.getElementsByTagName('tag1')[0].getElementsByTagName('tag2')[0].childNodes[0].data
evaluates to "I am too good"
and
XML_data.getElementsByTagName('tag1')[0].getElementsByTagName('tag2')[0].getElementsByTagName('italic')[0].childNodes[0].data
evaluates to "to be true"
What I want is to be able to extract the whole chunk within tag2, ignoring the italic tags, i.e., I want my output to be
"I am too good <italic>to be true</italic>"
How do I do this? Please help.
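For reference, one way to stay within minidom is to walk the child nodes of tag2: text nodes contribute their data, and element nodes are either descended into (to drop the markup) or serialized with toxml() (to keep it). The helper names inner_text and inner_xml below are hypothetical, and the sample document is inlined as a string without the line breaks; this is a minimal sketch under those assumptions rather than a definitive solution.

```python
from xml.dom import minidom

# Sample document inlined for illustration (whitespace between tags omitted).
SAMPLE = "<tag1><tag2>I am too good <italic>to be true</italic></tag2></tag1>"

def inner_text(node):
    """Concatenate all text beneath node, descending into child elements."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(inner_text(child))
    return "".join(parts)

def inner_xml(node):
    """Serialize the children of node, keeping nested tags such as <italic>."""
    return "".join(child.toxml() for child in node.childNodes)

doc = minidom.parseString(SAMPLE)
tag2 = doc.getElementsByTagName("tag2")[0]
print(inner_text(tag2))  # text only, child tags ignored
print(inner_xml(tag2))   # text with the nested markup preserved
```

The reason childNodes[0].data stops at "I am too good" is that tag2 has two children here: a text node followed by the italic element. Either helper visits every child instead of only the first.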
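I finally used ElementTree:
import xml.etree.ElementTree as ET
import re

def extractTextFromElement(elementName, stringofxml):
    # Parse the XML string and look for the first element named elementName.
    tree = ET.fromstring(stringofxml)
    for child in tree.iter():  # getiterator() was removed in Python 3.9
        if child.tag == elementName:
            # Serialize to str (not bytes), then strip every tag with a regex.
            serialized = ET.tostring(child, encoding='unicode')
            return re.sub(r'<.*?>', '', serialized)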
usage: extractTextFromElement('tag2', XML_data)
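The regular expression step can also be avoided: Element.itertext() yields the text of an element and of all its descendants in document order, so joining its output gives the tag-free text directly. The following is a sketch of that variant; the function name extract_text is hypothetical, and returning None when nothing matches is an assumption about the desired behavior.

```python
import xml.etree.ElementTree as ET

def extract_text(element_name, xml_string):
    """Return the text of the first element named element_name, tags dropped."""
    root = ET.fromstring(xml_string)
    # iter(tag) walks the root and all descendants that carry this tag.
    for element in root.iter(element_name):
        return "".join(element.itertext())
    return None  # assumption: no match yields None

xml_data = "<tag1><tag2>I am too good <italic>to be true</italic></tag2></tag1>"
print(extract_text("tag2", xml_data))
```

Unlike serializing and stripping tags, this never produces markup in the first place, so attribute values containing ">" or escaped entities such as &amp;lt; cannot confuse it.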