How to search for a specific tag in an XML file using Python

Question

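I have a very large and complex XML file, and I want to extract the text_body elements from it. I need to skip the other trees and branches and get only the specific parts that look like this: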

<req id="1">
    <text_body>
        Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
    </text_body>
</req>
<req id="2">
    <text_body>
    The system shall be able to handle 1000 customers logged in concurrently at the same time.
    </text_body>
</req>
<req id="CO-1">
    <text_body>
        Must use a SQL based database. SQL standard is the most widely used database format. Restricting to SQL allows easy of use and compatibility for Web Store.
    </text_body>
</req>
<req id="CO-2">
    <text_body>
        Compatibility is only tested and verified for Microsoft Internet Explorer version 6 and 7, Netscape Communicator Version 4 and 5. Other versions may not be 100&#37; compatible. Also other browsers such as Mozilla or Firefox may not be 100&#37; compatible.
    </text_body>
</req>
<req id="3">
    <text_body>
The system shall adhere to the following hardware requirements:
    <itemize>
        <item>4GB Flash ram chip</item>
        <item>128MB SDRAM</item>
        <item>Intel XScale PXA270 520-MHz chipset</item>
        <item>OS: Apache web server</item>
        <item>Database: MySQL</item>
    </itemize>
    </text_body>
</req>

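I need to get the string inside each text_body, but how do I write code that says "return the string for any id"? As you can see, the ids differ. In the last one, text_body also contains an itemize element that I don't need. There are similar questions, such as Q1 and Q2, and I tried to get help from them, but they did not return what I need. How can I do this?

Update: I need output like this:
Requirement 1: first text_body
Requirement 2: second text_body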

Answer 1

Is this what you are looking for?

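from bs4 import BeautifulSoup

# Read the file, then parse it (features='lxml' requires the lxml package).
with open('test.xml') as f:
    soup = BeautifulSoup(f.read(), features='lxml')
# [:2] keeps only the first two text_body elements.
for text_body in soup.find_all('text_body')[:2]:
    print(text_body.get_text().strip())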

Output

Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
The system shall be able to handle 1000 customers logged in concurrently at the same time.

Answer 2

You can use Python's built-in library for working with XML files:

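import xml.etree.ElementTree as ET

tree = ET.parse('your/xml_file.xml')
root = tree.getroot()
# findall('req') matches only <req> elements that are direct children of the root.
# .text is the text before the first child element, so nested tags such as <itemize> are not included.
text_body_strings = [x.find('text_body').text for x in root.findall('req')]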

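You may find that text_body_strings needs some text cleanup (for example, stripping the surrounding whitespace and newlines), but that is a separate topic.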
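As a minimal sketch of that cleanup, and of the "Requirement id: text" output the question asks for, the snippet below collapses whitespace and pairs each string with its id. The `<reqs>` wrapper root, the shortened sample text, and the helper name `requirement_texts` are assumptions made for illustration; they are not part of the original file or answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a wrapper <reqs> root is assumed because the
# snippet in the question has no single root element.
SAMPLE = """<reqs>
<req id="1">
    <text_body>
        Upon the USB being plugged in the system shall be deployed.
    </text_body>
</req>
<req id="3">
    <text_body>
The system shall adhere to the following hardware requirements:
    <itemize>
        <item>4GB Flash ram chip</item>
    </itemize>
    </text_body>
</req>
</reqs>"""


def requirement_texts(root):
    """Map each req id to its text_body text, ignoring nested children."""
    result = {}
    # iter() walks the whole tree, so <req> may sit at any depth.
    for req in root.iter('req'):
        body = req.find('text_body')
        if body is None:
            continue
        # .text is only the text before the first child element, so the
        # <itemize> list is left out; split/join collapses whitespace.
        result[req.get('id')] = ' '.join((body.text or '').split())
    return result


root = ET.fromstring(SAMPLE)
for req_id, text in requirement_texts(root).items():
    print(f'Requirement {req_id}: {text}')
```

Using `root.iter('req')` instead of `root.findall('req')` is a design choice for files where the req elements are buried under other branches; if they are always direct children of the root, `findall` works as well.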

Documentation for this package (xml.etree.ElementTree) is available in the official Python documentation.
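Because the question describes the file as very large, it may be worth streaming it rather than loading the whole tree at once. The sketch below uses `ET.iterparse` from the same package; the generator name `iter_requirements` and the in-memory demo document are hypothetical, introduced only for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_requirements(source):
    """Yield (id, text) pairs one <req> at a time from a file or path."""
    # "end" events fire once an element and all its children are parsed.
    for _event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag != 'req':
            continue
        body = elem.find('text_body')
        text = ' '.join((body.text or '').split()) if body is not None else ''
        yield elem.get('id'), text
        # Drop the contents of the finished <req> to free memory.
        elem.clear()


# Hypothetical in-memory document standing in for the large file.
demo = io.BytesIO(
    b'<doc><other/><req id="CO-1"><text_body>'
    b' Must use a SQL based database. </text_body></req></doc>'
)
for req_id, text in iter_requirements(demo):
    print(f'Requirement {req_id}: {text}')
```

In real use, a path such as 'your/xml_file.xml' could be passed in place of the `BytesIO` object, since `iterparse` accepts either a filename or a file object.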


Solution 1
1, accepted, 2020-06-15 07:06:29

Solution 2
0, 2020-06-15 07:20:32
