Python minidom extract text from XML
Python beginner here. I am trying to parse the structure of an XML file using minidom. The XML structure is like this:
...
<Node Precode="1">
    <Text Id="9">sometext 1</Text>
</Node>
...
I am trying to add all node elements to a list using a recursive function (not of my own design; I found it on Stack Overflow and adapted it to my needs). The current status is this:
from xml.dom import minidom

list_to_write = []

def parse_node(root):
    if root.childNodes:
        for node in root.childNodes:
            if node.nodeType == node.ELEMENT_NODE:
                new_node = [node.tagName, node.parentNode.tagName, node.getAttribute('Precode'), node.attributes.items()]
                list_to_write.append(new_node)
                parse_node(node)
    return list_to_write
How can I extract the "sometext" text and add it as an element in the list_to_write list?
I assume you have a file named nodes.xml:
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <Node >
        <Text Id="9">sometext 1</Text>
    </Node>
    <Node >
        <Text Id="9">sometext 2</Text>
    </Node>
    <Node >
        <Text Id="9">sometext 3</Text>
    </Node>
    <Node >
        <Text Id="9">sometext 4</Text>
    </Node>
    <Node >
        <Text Id="9">sometext 5</Text>
    </Node>
    <Node>
        <Text Id="9">sometext 6</Text>
    </Node>
    <Node >
        <Text Id="9">sometext 7</Text>
    </Node>
</root>
You can use the code below to get the texts:
from xml.dom import minidom

list_to_write = []

def parse_node():
    # Parse the file and take its root element
    doc = minidom.parse("nodes.xml")
    root = doc.documentElement
    nodes = root.getElementsByTagName("Node")
    for node in nodes:
        # The first child of each <Text> element is its text node
        list_to_write.append(node.getElementsByTagName("Text")[0].childNodes[0].nodeValue)

parse_node()
print(list_to_write)
The result (Python 2 output; under Python 3 the strings print without the u prefix) is:
[u'sometext 1', u'sometext 2', u'sometext 3', u'sometext 4', u'sometext 5', u'sometext 6', u'sometext 7']
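If you would rather keep the recursive function from the question, one possible adaptation is to append each element's text as an extra list item. The following is only a sketch under some assumptions: the inline SAMPLE_XML string, the get_text helper, and the rows parameter are hypothetical names introduced for illustration and are not part of the original code. Joining all direct text-node children avoids the IndexError that childNodes[0] would raise on an empty element.

```python
from xml.dom import minidom

# Hypothetical inline sample, modeled on the structure shown in the question.
SAMPLE_XML = """<root>
<Node Precode="1">
<Text Id="9">sometext 1</Text>
</Node>
<Node Precode="2">
<Text Id="10">sometext 2</Text>
</Node>
</root>"""


def get_text(element):
    """Join the direct text-node children of an element and strip whitespace."""
    parts = [child.data for child in element.childNodes
             if child.nodeType == child.TEXT_NODE]
    return "".join(parts).strip()


def parse_node(root, rows=None):
    """Recursively collect [tag, parent tag, Precode, attributes, text] rows."""
    if rows is None:
        rows = []
    for node in root.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            rows.append([
                node.tagName,
                node.parentNode.tagName,
                node.getAttribute("Precode"),  # empty string if the attribute is missing
                list(node.attributes.items()),
                get_text(node),                # empty string for container elements
            ])
            parse_node(node, rows)
    return rows


doc = minidom.parseString(SAMPLE_XML)
rows = parse_node(doc.documentElement)
for row in rows:
    print(row)
```

Passing the list as a parameter instead of using a module-level list_to_write means repeated calls do not accumulate results from earlier runs. To read from a file instead, minidom.parse("nodes.xml") can replace the parseString call.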