Python script to get every text and tag inside a specified XML tag

Question

I have to get every tag and value inside a specific tag.

For example:

<xml>
<new>
<post>
<text>New Text</text> 
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line> 
New Line ends ......!!!!
</specific>

Python script:

root = et.fromstring('Xml from path')
target_elements = root.findall('.//post')

If I give the tag <post>, I need the output to be:

Expected output:

<text>New Text</text>
<category>New Category</category>

For the tag <specific>:

Output:

<line> Line.... </line> 
 New Line ends ......!!!!

Answer 1

Note: your XML fragment is missing the closing </xml> tag at the end.

content = """\
<xml>
<new>
<post>
<text>New Text</text> 
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line> 
New Line ends ......!!!!
</specific>
</xml>"""

With lxml there is no real difficulty:

from lxml import etree

root = etree.XML(content)

for elem in root.findall(".//post"):
    for child in iter(elem):
        print(child.tag + ": " + child.text)

If you want to output the XML fragments as strings, just use the tostring function:

for elem in root.findall(".//post"):
    for child in iter(elem):
        print(etree.tostring(child, encoding="unicode", with_tail=False))

You will get:

<text>New Text</text>
<category>New Category</category>
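
The question also asks for the content of <specific>, where plain text follows the <line> child. A minimal sketch of one way to handle that, assuming the standard library's xml.etree.ElementTree (which the question's `et.fromstring` call appears to use) rather than lxml; the helper name `inner_xml` is hypothetical and not part of either library:

```python
import xml.etree.ElementTree as ET

content = """\
<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>
</xml>"""


def inner_xml(elem):
    """Return everything between elem's opening and closing tags as a string.

    Hypothetical helper for illustration only.
    """
    # elem.text is the text before the first child (may be None).
    parts = [elem.text or ""]
    for child in elem:
        # ElementTree's tostring() also serializes child.tail, i.e. the
        # text that follows the child's closing tag.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


root = ET.fromstring(content)
specific_output = inner_xml(root.find(".//specific"))
print(specific_output)
```

The tail text is the reason with_tail=False is passed in the lxml snippet above: there it is unwanted, while for <specific> it is exactly the part that would otherwise be lost.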

To go further, read the online tutorial: http://lxml.de/tutorial.html

Answer 2

I would go with BeautifulSoup:

from bs4 import BeautifulSoup

xml_doc = '''<xml>
<new>
<post>
<text>New Text</text>
<category>New Category</category>
</post>
</new>
<specific>
<line> Line.... </line>
New Line ends ......!!!!
</specific>'''

# Name the parser explicitly; without it bs4 picks one itself and emits a warning.
soup = BeautifulSoup(xml_doc, "html.parser")
print(soup.find_all('post'))

Output:

[<post>
<text>New Text</text>
<category>New Category</category>
</post>]

2 answers

Answer 1
0 2017-01-08 17:35:04

Answer 2
0 2017-01-08 18:46:54
