Not recursive (single node level) getElementsByTagName in Python xml.dom
Is there any way to use getElementsByTagName only at a single node level, rather than recursively?
For example, consider parsing a pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<parent>
<groupId>com.parent</groupId>
<artifactId>parent</artifactId>
<version>1.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<modelVersion>2.0.0</modelVersion>
<groupId>com.parent.somemodule</groupId>
<artifactId>some_module</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<name>Some Module</name>
...
If I want to get groupId at the top level (specifically project->groupId, not project->parent->groupId), I use:
from xml.dom import minidom

xmldoc = minidom.parse('pom.xml')
groupId = xmldoc.getElementsByTagName("groupId")[0].childNodes[0].nodeValue
But unfortunately, that finds the first physical occurrence of groupId in the file regardless of the hierarchy level, which is project->parent->groupId. I actually want to do a non-recursive find ONLY at a specific node level, not within its children. Is there a way to do it in xml.dom?
UPDATE: I switched to BeautifulSoup but still have the same problem with implicit recursive traversal (see "Finding a nonrecursive DOM subnode in Python using BeautifulSoup").
You can iterate over the getElementsByTagName() results and take the first element whose parent is the root element:
group_id_element = next(element for element in xmldoc.getElementsByTagName("groupId")
                        if element.parentNode is xmldoc.documentElement)
print(group_id_element.childNodes[0].nodeValue)
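If you would rather not search the whole tree and then filter, a possible alternative is to scan only the root's childNodes. The sketch below is one way to do that, not the only one. The helper name direct_children_by_tag and the inline POM string are made up for illustration (the POM is a trimmed stand-in for the question's file, with a closing tag added so that it parses).

```python
from xml.dom import minidom

# Trimmed stand-in for the pom.xml in the question (hypothetical sample data).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.parent</groupId>
    <artifactId>parent</artifactId>
  </parent>
  <groupId>com.parent.somemodule</groupId>
  <artifactId>some_module</artifactId>
</project>"""


def direct_children_by_tag(node, tag_name):
    """Return the element children of `node` named `tag_name`.

    Hypothetical helper: it looks at one level only and never
    descends into grandchildren.
    """
    return [
        child
        for child in node.childNodes
        # Skip whitespace text nodes and comments before reading tagName.
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag_name
    ]


xmldoc = minidom.parseString(POM)
matches = direct_children_by_tag(xmldoc.documentElement, "groupId")
group_id = matches[0].firstChild.nodeValue
print(group_id)
```

Because only one level of childNodes is visited, the groupId nested inside parent is never considered.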
Note that it would be easier, shorter and faster to do the same with ElementTree, which is also part of the standard library.
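A minimal sketch of the ElementTree approach, assuming the same document shape as in the question. The inline POM string is a trimmed stand-in with a closing tag added, and the "m" prefix is an arbitrary label chosen here; the namespace URI it maps to comes from the project element. In ElementTree, a path such as "m:groupId" with no leading ".//" matches direct children only.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the pom.xml in the question (hypothetical sample data).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.parent</groupId>
    <artifactId>parent</artifactId>
  </parent>
  <groupId>com.parent.somemodule</groupId>
  <artifactId>some_module</artifactId>
</project>"""

# The prefix "m" is arbitrary; only the URI has to match the document.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}

root = ET.fromstring(POM)

# No leading ".//", so only direct children of <project> are searched.
group_id = root.find("m:groupId", NS).text

# The nested one is still reachable with an explicit path.
parent_group_id = root.find("m:parent/m:groupId", NS).text

print(group_id)
print(parent_group_id)
```

For a file on disk, ET.parse('pom.xml').getroot() would take the place of ET.fromstring(POM).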
Hope that helps.