Python minidom parser: Ignore tag with identical attribute values

I'm trying to parse XML to text, and I got something like this:

INPUT FILE

<Item id="1"></Item>
<Item id="2"></Item>
<Item id="1"></Item>
<Item id="2"></Item>

CURRENT OUTPUT

 Item ->1
 Item ->2
 Item ->2
 Item ->1

My DESIRED OUTPUT would be:

Item ->1
Item ->2

(Ignoring repeated id values)

The current code I'm using to obtain my CURRENT OUTPUT is:

item_nodes = node.getElementsByTagName('Item')  # avoid shadowing the built-in list
for item in item_nodes:
  output_id = item.getAttribute('id')
  print("Item ->", output_id)

I've tried thousands of list-removal methods, but they all still output the duplicate ids. Help would be greatly appreciated. Thanks.

First, any DOM parser will return the duplicated ids, because they belong to different elements. To avoid this, walk the DOM tree and store the results in a dict keyed by id. This keeps only the last item for each id.

UPD:

item_nodes = node.getElementsByTagName('Item')
items = {}
for item in item_nodes:
  output_id = item.getAttribute('id')
  items[output_id] = item # Put items into a dict to use them later.
for item_id in items:
  # getAttribute returns a string, so format the id with %s rather than %d.
  print("Item[%s] -> %s" % (item_id, items[item_id])) # Only a single item per id is left.

And a more 'pythonic' way:

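item_nodes = node.getElementsByTagName('Item')
items = {item.getAttribute('id'): item for item in item_nodes}
for item_id in items:
  # getAttribute returns a string, so format the id with %s rather than %d.
  print("Item[%s] -> %s" % (item_id, items[item_id])) # Only a single item per id is left.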
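
For anyone who wants to run the dict approach end to end, here is a minimal sketch. It assumes Python 3.7 or later (where dicts keep insertion order) and assumes the Item elements sit under a single root element, since minidom only parses well-formed XML. The Items root and the helper name last_item_per_id are made up for illustration and are not part of the original question.

```python
from xml.dom import minidom

# Assumed sample input: the <Items> root is hypothetical, added so the XML is well formed.
XML = """<Items>
<Item id="1"></Item>
<Item id="2"></Item>
<Item id="1"></Item>
<Item id="2"></Item>
</Items>"""


def last_item_per_id(node):
    """Map each id to the last Item element carrying it (hypothetical helper)."""
    items = {}
    for item in node.getElementsByTagName('Item'):
        # A later element with the same id overwrites the earlier one.
        items[item.getAttribute('id')] = item
    return items


if __name__ == "__main__":
    doc = minidom.parseString(XML)
    for item_id in last_item_per_id(doc):
        print("Item ->", item_id)
```

The keys come out in the order each id was first seen, while the stored value is the last element with that id.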

Use a dictionary instead, with output_id as the key.

If you want to keep only the last element with each id:

item_nodes = node.getElementsByTagName('Item')
item_dict = {}
for item in item_nodes:
  output_id = item.getAttribute('id')
  item_dict[output_id] = item
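
If the first element with each id is what you need, or only the ids themselves in document order, one possible variation is to track the ids already seen in a set. This is a sketch under assumptions, not something from the answers above: the helper name unique_ids, its tag and attr parameters, and the Items wrapper element are all hypothetical.

```python
from xml.dom import minidom


def unique_ids(node, tag='Item', attr='id'):
    """Return attribute values in document order, skipping repeats (hypothetical helper)."""
    seen = set()
    ordered = []
    for element in node.getElementsByTagName(tag):
        value = element.getAttribute(attr)
        if value not in seen:  # first time this id shows up
            seen.add(value)
            ordered.append(value)
    return ordered


if __name__ == "__main__":
    # Assumed input; the <Items> wrapper is hypothetical.
    doc = minidom.parseString(
        '<Items><Item id="1"/><Item id="2"/><Item id="1"/><Item id="2"/></Items>'
    )
    for value in unique_ids(doc):
        print("Item ->", value)
```

Unlike the dict version, this does not depend on dict ordering, so the ids are printed in the order they first appear regardless of the Python version.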
