Sort XML with Python by tag

I have an XML document:

<root>
 <node1>
  <B>text</B>
  <A>another_text</A>
  <C>one_more_text</C>
 </node1>
 <node2>
  <C>one_more_text</C>
  <B>text</B>
  <A>another_text</A>
 </node2>
</root>

I want to get output like this:

<root>
 <node1>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
 </node1>
 <node2>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
 </node2>
</root>

I tried some code like this:

from xml.etree import ElementTree as et

tr = et.parse(path_in)
root = tr.getroot()
for children in root.getchildren():
    for child in children.getchildren():
        pass  # sort it

tr.write(path_out)

I cannot use the standard functions sort and sorted because they sort the wrong way (not by tag). Thanks in advance.

From a similar question:

from lxml import etree

data = """<X>
    <X03>3</X03>
    <X02>2</X02>
    <A>
        <A02>Y</A02>
        <A01>X</A01>
        <A03>Z</A03>
    </A>
    <X01>1</X01>
    <B>
        <B01>Z</B01>
        <B02>X</B02>
        <B03>C</B03>
    </B>
</X>"""

doc = etree.XML(data, etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):  # select every element that has child elements
    parent[:] = sorted(parent, key=lambda x: x.tag)

print(etree.tostring(doc, pretty_print=True).decode())

Result:

<X>
  <A>
    <A01>X</A01>
    <A02>Y</A02>
    <A03>Z</A03>
  </A>
  <B>
    <B01>Z</B01>
    <B02>X</B02>
    <B03>C</B03>
  </B>
  <X01>1</X01>
  <X02>2</X02>
  <X03>3</X03>
</X>

You can find more information here: http://effbot.org/zone/element-sort.htm
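If lxml is not available, the same idea can be sketched with the standard library. This is only a minimal sketch, not the method from the linked question: the helper name sort_children_by_tag is hypothetical, and because xml.etree.ElementTree has no remove_blank_text option, the sample input is written without whitespace between tags.

```python
from xml.etree import ElementTree as et


def sort_children_by_tag(root):
    """Sort the children of root and of every descendant by tag name."""
    # Materialize the iterator first so reordering cannot disturb the traversal.
    for parent in list(root.iter()):
        parent[:] = sorted(parent, key=lambda child: child.tag)
    return root


data = "<X><X03>3</X03><X02>2</X02><A><A02>Y</A02><A01>X</A01></A><X01>1</X01></X>"
root = sort_children_by_tag(et.fromstring(data))
result = et.tostring(root, encoding="unicode")
print(result)
```

Unlike the top-level-only loop, this visits every element, so nested containers such as A are sorted as well.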

You need to:

  • get the child elements of every top-level "node"
  • sort them by the tag attribute (the node's name)
  • reset the child nodes of each top-level node
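The second step needs an explicit key because Element objects define no ordering of their own. A minimal sketch of that point, assuming Python 3 and the standard library (the variable names are illustrative only):

```python
from operator import attrgetter
from xml.etree import ElementTree as et

node = et.fromstring("<node><C/><B/><A/></node>")

# Element objects cannot be compared with "<", so a plain sorted() call fails.
try:
    sorted(node)
    plain_sort_failed = False
except TypeError:
    plain_sort_failed = True

# Sorting by the tag attribute gives the alphabetical order by node name.
sorted_tags = [child.tag for child in sorted(node, key=attrgetter("tag"))]
print(plain_sort_failed, sorted_tags)
```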

Sample working code:

from operator import attrgetter
from xml.etree import ElementTree as et

data = """<root>
 <node1>
  <B>text</B>
  <A>another_text</A>
  <C>one_more_text</C>
 </node1>
 <node2>
  <C>one_more_text</C>
  <B>text</B>
  <A>another_text</A>
 </node2>
</root>"""


root = et.fromstring(data)
for node in root.findall("*"):  # searching top-level nodes only: node1, node2 ...
    node[:] = sorted(node, key=attrgetter("tag"))

print(et.tostring(root, encoding="unicode"))  # encoding="unicode" returns str rather than bytes

Prints:

<root>
 <node1>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
 </node1>
 <node2>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
  </node2>
</root>
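Because each element carries the whitespace that follows it (its tail) along when it is moved, the indentation can end up uneven after sorting. One possible way to tidy it up, assuming Python 3.9 or newer where ElementTree.indent() is available, is sketched below. The one-space indent is chosen only to match the sample input.

```python
from operator import attrgetter
from xml.etree import ElementTree as et

data = """<root>
 <node2>
  <C>one_more_text</C>
  <B>text</B>
  <A>another_text</A>
 </node2>
</root>"""

root = et.fromstring(data)
for node in root.findall("*"):
    node[:] = sorted(node, key=attrgetter("tag"))

# indent() rewrites whitespace-only text and tails in place (Python 3.9+).
if hasattr(et, "indent"):
    et.indent(root, space=" ")

pretty = et.tostring(root, encoding="unicode")
print(pretty)
```

On older interpreters the hasattr guard skips the re-indentation, so the elements are still sorted but keep their original tails.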

Note that we are not using the getchildren() method here (it has been deprecated since Python 2.7 and was removed in Python 3.9). Instead, we rely on the fact that each Element instance is iterable over its child nodes.
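To illustrate the note above: an Element behaves much like a list of its children, so it can be iterated, measured with len(), and converted with list() where older code called getchildren(). A short sketch using only the standard library:

```python
from xml.etree import ElementTree as et

node = et.fromstring("<node1><B>text</B><A>another_text</A></node1>")

# Iterating the element yields its direct children in document order.
tags = [child.tag for child in node]

# list(node) can stand in for the old node.getchildren() call.
children = list(node)

print(tags, len(node), children[0].text)
```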
