Python script to get all text and tags within a specified XML tag
I have to get every tag and value inside a specific tag.
For example:
<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>
Python script:
root = et.fromstring('Xml from path')
target_elements = root.findall('.//post')
If I pass the <post> tag, I need the output to be:
Expected output:
<text>New Text</text>
<category>New Category</category>
For the <specific> tag:
Output:
<line> Line.... </line>
New Line ends ......!!!!
Note: your XML fragment is missing the closing </xml> tag at the end.
content = """\
<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>
</xml>"""
With lxml, there is no real difficulty:
from lxml import etree
root = etree.XML(content)
for elem in root.findall(".//post"):
    for child in iter(elem):
        print(child.tag + ": " + child.text)
If you want to output the XML fragments as strings, just use the tostring function:
for elem in root.findall(".//post"):
    for child in iter(elem):
        print(etree.tostring(child, encoding="unicode", with_tail=False))
You will get:
<text>New Text</text>
<category>New Category</category>
To go further, read the online tutorial: http://lxml.de/tutorial.html
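The question also asks for the content of <specific>, where a child element is followed by loose text. A minimal sketch of one way to get that, using the standard library's xml.etree.ElementTree rather than lxml; the helper name inner_xml is hypothetical and not part of either library. It relies on the fact that ElementTree's tostring() serializes an element together with its tail text, so the text after </line> is kept.

```python
import xml.etree.ElementTree as ET

content = """<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>
</xml>"""


def inner_xml(elem):
    """Hypothetical helper: everything between elem's opening and closing tags."""
    parts = [elem.text or ""]  # text that precedes the first child, if any
    for child in elem:
        # ElementTree's tostring() includes the child's tail text,
        # so loose text that follows a child element is preserved.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


root = ET.fromstring(content)
for tag in ("post", "specific"):
    for elem in root.findall(".//" + tag):
        print(inner_xml(elem))
```

For the sample document this prints the two child elements of <post>, followed by the <line> element and the loose text of <specific>.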
I would go with BeautifulSoup:
from bs4 import BeautifulSoup
xml_doc = '''<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>'''
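# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(xml_doc, "html.parser")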
print(soup.find_all('post'))
Output:
[<post>
<text>New Text</text>
<category>New Category</category>
</post>]
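The output above still includes the enclosing <post> element, while the question asks only for what is inside it. A small sketch of one way to get just the inner markup, assuming the built-in "html.parser" backend is acceptable for this input (it is lenient and lowercases tag names); the helper name inner_markup is made up for illustration. Tag.decode_contents() returns the markup between a tag's opening and closing tags as a string.

```python
from bs4 import BeautifulSoup

xml_doc = """<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>
</xml>"""

# Assumption: "html.parser" (bundled with Python) is good enough for this sample.
soup = BeautifulSoup(xml_doc, "html.parser")


def inner_markup(tag_name):
    """Hypothetical helper: inner markup of every <tag_name> element."""
    return [tag.decode_contents().strip() for tag in soup.find_all(tag_name)]


for name in ("post", "specific"):
    for fragment in inner_markup(name):
        print(fragment)
```

Because decode_contents() keeps both child elements and loose text, the same call covers the <post> and the <specific> cases.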