How do I search for a tag in an XML file using ElementTree where I have a certain "Parent" tag with a specific value? (Python)
How to search for a specific tag in an XML file with Python
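I have a very large and complex XML file, and I want to get the text_body elements from it. I need to skip the other trees and branches and get only the specific parts that look like this: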
<req id="1">
<text_body>
Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
</text_body>
</req>
<req id="2">
<text_body>
The system shall be able to handle 1000 customers logged in concurrently at the same time.
</text_body>
</req>
<req id="CO-1">
<text_body>
Must use a SQL based database. SQL standard is the most widely used database format. Restricting to SQL allows easy of use and compatibility for Web Store.
</text_body>
</req>
<req id="CO-2">
<text_body>
Compatibility is only tested and verified for Microsoft Internet Explorer version 6 and 7, Netscape Communicator Version 4 and 5. Other versions may not be 100% compatible. Also other browsers such as Mozilla or Firefox may not be 100% compatible.
</text_body>
</req>
<req id="3">
<text_body>
The system shall adhere to the following hardware requirements:
<itemize>
<item>4GB Flash ram chip</item>
<item>128MB SDRAM</item>
<item>Intel XScale PXA270 520-MHz chipset</item>
<item>OS: Apache web server</item>
<item>Database: MySQL</item>
</itemize>
</text_body>
</req>
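I need to get the string inside text_body, but how do I write my code so that it means something like "return the string for any id"? As you can see, there are different ids. In the last one, text_body also contains an itemize element that I do not need. There are similar questions, such as Q1 and Q2, and I tried to get help from them, but they did not return what I need. How can I do this?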
Update: I need output like this:
req 1: first text_body
req 2: second text_body
Is this what you are looking for?
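from bs4 import BeautifulSoup
# Parse the file (the 'lxml' parser requires the lxml package to be installed)
with open('test.xml') as f:
    soup = BeautifulSoup(f.read(), features='lxml')
# Print the stripped text of the first two text_body elements
for text_body in soup.find_all('text_body')[:2]:
    print(text_body.get_text().strip())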
Output
Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
The system shall be able to handle 1000 customers logged in concurrently at the same time.
You can use Python's built-in library to process the XML file:
import xml.etree.ElementTree as ET
tree = ET.parse('your/xml_file.xml')
root = tree.getroot()
# findall('req') matches only direct children of the root; use './/req' if they are nested deeper
text_body_strings = [x.find('text_body').text for x in root.findall('req')]
You may find that you need to do some text cleanup on text_body_strings, but that is another topic.
Documentation for this package can be found here.
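To get the labelled output the question asks for, one possible sketch with the standard library follows. It assumes the req elements may sit at any depth, so it uses iter() instead of findall(), and it treats "text cleanup" as simply collapsing whitespace. The SAMPLE string, its doc and section wrapper elements, and the req_texts helper are made up for illustration and are not part of the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <doc>/<section> wrappers are invented so that
# the <req> elements are nested, as they might be in a larger file.
SAMPLE = """
<doc>
  <section>
    <req id="1">
      <text_body>
        Upon the USB being plugged in the system shall be able to be deployed and operational in less than 1 minute.
      </text_body>
    </req>
    <req id="3">
      <text_body>
        The system shall adhere to the following hardware requirements:
        <itemize>
          <item>4GB Flash ram chip</item>
        </itemize>
      </text_body>
    </req>
  </section>
</doc>
"""

def req_texts(root):
    """Map each req id to its text_body text, ignoring child elements."""
    result = {}
    for req in root.iter('req'):      # iter() searches at any depth
        body = req.find('text_body')
        if body is None:              # skip a req that has no text_body
            continue
        # .text holds only the text before the first child element,
        # so the <itemize> content is left out.
        result[req.get('id')] = ' '.join((body.text or '').split())
    return result

root = ET.fromstring(SAMPLE)
for req_id, text in req_texts(root).items():
    print(f'req {req_id}: {text}')
```

For a real file, ET.parse('your/xml_file.xml').getroot() can be passed to req_texts in place of the parsed sample.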