
Python script to get all text and tags within a specified XML tag

I need to get every tag and value inside a specific tag.

For example:

<xml>
<new>
<post>
<text>New Text</text> 
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line> 
New Line ends ......!!!!
</specific>

Python script:

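import xml.etree.ElementTree as et  # assumed: the question does not show its import

root = et.fromstring('Xml from path')  # placeholder for the XML string read from a file
target_elements = root.findall('.//post')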

If I give the tag <post>, I need the output to be:

Expected output:

<text>New Text</text>
<category>New Category</category>

For the tag <specific>:

Output:

<line> Line.... </line> 
 New Line ends ......!!!!

Note: your XML snippet is missing the closing </xml> tag at the end.

content = """\
<xml>
<new>
<post>
<text>New Text</text> 
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line> 
New Line ends ......!!!!
</specific>
</xml>"""

With lxml this is straightforward:

from lxml import etree

root = etree.XML(content)

for elem in root.findall(".//post"):
    for child in iter(elem):
        print(child.tag + ": " + child.text)

If you want to output the XML fragments as strings, just use the tostring function:

for elem in root.findall(".//post"):
    for child in iter(elem):
        print(etree.tostring(child, encoding="unicode", with_tail=False))

You will get:

<text>New Text</text>
<category>New Category</category>
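
Note that with_tail=False drops any text that follows a child element, so for <specific> the line "New Line ends ......!!!!" would be lost. Below is a minimal sketch of one way to keep it. It assumes the standard library's xml.etree.ElementTree instead of lxml (its tostring serializes the tail text along with the element), and the helper name inner_xml is hypothetical:

```python
import xml.etree.ElementTree as ET

content = """\
<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>
</xml>"""

root = ET.fromstring(content)


def inner_xml(elem):
    """Return everything between elem's start and end tags as a string."""
    # Text that appears before the first child element.
    parts = [elem.text or ""]
    for child in elem:
        # ElementTree's tostring() also serializes the child's tail,
        # i.e. the text between this child and the next tag.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


for tag in ("post", "specific"):
    for elem in root.findall(".//" + tag):
        print(inner_xml(elem))
```

With lxml the same idea should work by leaving with_tail at its default of True.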

To go further, read the online tutorial: http://lxml.de/tutorial.html

I would go with BeautifulSoup:

from bs4 import BeautifulSoup

xml_doc = '''<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>'''

# Name the parser explicitly; otherwise bs4 picks one and emits a warning.
soup = BeautifulSoup(xml_doc, "html.parser")
print(soup.find_all('post'))

Output:

[<post>
<text>New Text</text>
<category>New Category</category>
</post>]
