I have an XML file whose elements appear in arbitrary order. I need to compare such files, but because the element order differs between them, the comparison takes manual effort.
I am looking for a way to sort these files. Can someone give me some pointers or an approach? I read the lxml documentation (the ElementTree and Element classes), but there doesn't seem to be a method for sorting child elements by their text.
I can sort the attribute elements by Name, but within an attribute element, how can the children of the legal element be sorted?
Input:
<root>
<attribute Name="attr2">
<v>
<cstat>
<s>nObjDef2</s>
<s>nObjDef1</s>
</cstat>
</v>
<objects>
<legal>
<o>otype2</o>
<o>otype1</o>
</legal>
</objects>
</attribute>
<attribute Name="attr1">
<v>
<cstat>
<s>nObjDef2</s>
<s>nObjDef1</s>
</cstat>
</v>
<objects>
<legal>
<o>otype2</o>
<o>otype1</o>
</legal>
</objects>
</attribute>
</root>
Expected output:
<root>
<attribute Name="attr1">
<v>
<cstat>
<s>nObjDef1</s>
<s>nObjDef2</s>
</cstat>
</v>
<objects>
<legal>
<o>otype1</o>
<o>otype2</o>
</legal>
</objects>
</attribute>
<attribute Name="attr2">
<v>
<cstat>
<s>nObjDef1</s>
<s>nObjDef2</s>
</cstat>
</v>
<objects>
<legal>
<o>otype1</o>
<o>otype2</o>
</legal>
</objects>
</attribute>
</root>
If you want to sort the children by the text, just find the legal nodes and sort the children using child.text as the key:
x = """<root>
<attribute Name="attr2">
<v>
<cstat>
<s>nObjDef2</s>
<s>nObjDef1</s>
</cstat>
</v>
<objects>
<legal>
<o>otype2</o>
<o>otype1</o>
</legal>
</objects>
</attribute>
<attribute Name="attr1">
<v>
<cstat>
<s>nObjDef2</s>
<s>nObjDef1</s>
</cstat>
</v>
<objects>
<legal>
<o>otype2</o>
<o>otype1</o>
</legal>
</objects>
</attribute>
</root>
"""
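Then sort the children of each legal node:
from lxml import etree
xml = etree.fromstring(x)
# Slice assignment replaces the children of each <legal> element in sorted order
for node in xml.xpath("//legal"):
    node[:] = sorted(node, key=lambda ch: ch.text)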
That will reorder the children:
print(etree.tostring(xml, pretty_print=True).decode("utf-8"))
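Giving you (only the children of legal are reordered; the attribute and cstat elements keep their original order):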
<root>
<attribute Name="attr2">
<v>
<cstat>
<s>nObjDef2</s>
<s>nObjDef1</s>
</cstat>
</v>
<objects>
<legal>
<o>otype1</o>
<o>otype2</o>
</legal>
</objects>
</attribute>
<attribute Name="attr1">
<v>
<cstat>
<s>nObjDef2</s>
<s>nObjDef1</s>
</cstat>
</v>
<objects>
<legal>
<o>otype1</o>
<o>otype2</o>
</legal>
</objects>
</attribute>
</root>
Alternatively, use operator.attrgetter in place of the lambda, which is typically slightly faster:
from lxml import etree
from operator import attrgetter
xml = etree.fromstring(x)
for node in xml.xpath("//legal"):
    node[:] = sorted(node, key=attrgetter("text"))
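The same slice-assignment idea can be extended to produce the full expected output from the question. The following is a minimal sketch, not a definitive implementation: the helper name sort_tree and the shortened SAMPLE document are made up for illustration, and it assumes the attribute elements are direct children of the root and that every one carries a Name.

```python
from lxml import etree

# Shortened, hypothetical stand-in for the input from the question.
SAMPLE = """<root>
<attribute Name="attr2">
<v><cstat><s>nObjDef2</s><s>nObjDef1</s></cstat></v>
<objects><legal><o>otype2</o><o>otype1</o></legal></objects>
</attribute>
<attribute Name="attr1">
<v><cstat><s>nObjDef2</s><s>nObjDef1</s></cstat></v>
<objects><legal><o>otype2</o><o>otype1</o></legal></objects>
</attribute>
</root>"""


def sort_tree(root):
    """Sort <cstat>/<legal> children by text and <attribute> elements by Name."""
    # Sort the children of every <cstat> and <legal> element by their text.
    # `or ""` guards against empty elements, whose .text is None.
    for parent in root.xpath("//cstat | //legal"):
        parent[:] = sorted(parent, key=lambda child: child.text or "")
    # Sort the direct children of the root by their Name attribute.
    root[:] = sorted(root, key=lambda el: el.get("Name") or "")
    return root


# Dropping whitespace-only text lets pretty_print re-indent the result.
parser = etree.XMLParser(remove_blank_text=True)
root = sort_tree(etree.fromstring(SAMPLE, parser))
print(etree.tostring(root, pretty_print=True).decode("utf-8"))
```

Applying the same function to both files before comparing them should remove differences that are due only to element order.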
Consider XSLT, the special-purpose language designed to transform XML files. Python's lxml can run XSLT 1.0 scripts. In particular, XSLT provides the <xsl:sort> element, which can be used inside templates:
import lxml.etree as et
# LOAD XML (FROM FILE) AND XSL (FROM STRING)
xml = et.parse('Input.xml')
xslstr = '''<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<!-- Identity Transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Sort Children Text of Nodes -->
<xsl:template match="cstat|legal">
<xsl:copy>
<xsl:apply-templates select="*">
<xsl:sort select="." order="ascending" data-type="text"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:transform>'''
xslt = et.fromstring(xslstr)
# TRANSFORM SOURCE TO NEW TREE
transform = et.XSLT(xslt)
newdom = transform(xml)
print(newdom)
# OUTPUT TO FILE
tree_out = et.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
# The context manager closes the file even if the write fails
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)
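The stylesheet above sorts the children of cstat and legal but leaves the attribute elements in document order. A possible extension, sketched here under the assumption that the attribute elements are direct children of root, adds one more template that sorts them by @Name. The SOURCE string is a shortened, hypothetical stand-in for Input.xml so that the snippet runs without any files.

```python
import lxml.etree as et

XSL = '''<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Identity transform: copy everything not matched by a template below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Sort the attribute elements under root by their Name -->
  <xsl:template match="root">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="attribute">
        <xsl:sort select="@Name" order="ascending" data-type="text"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
  <!-- Sort the children of cstat and legal by their text -->
  <xsl:template match="cstat|legal">
    <xsl:copy>
      <xsl:apply-templates select="*">
        <xsl:sort select="." order="ascending" data-type="text"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:transform>'''

# Shortened, hypothetical stand-in for the contents of Input.xml.
SOURCE = """<root>
<attribute Name="attr2">
<v><cstat><s>nObjDef2</s><s>nObjDef1</s></cstat></v>
<objects><legal><o>otype2</o><o>otype1</o></legal></objects>
</attribute>
<attribute Name="attr1">
<v><cstat><s>nObjDef2</s><s>nObjDef1</s></cstat></v>
<objects><legal><o>otype2</o><o>otype1</o></legal></objects>
</attribute>
</root>"""

transform = et.XSLT(et.fromstring(XSL))
result = transform(et.fromstring(SOURCE))
print(result)
```

Note that the root template only applies templates to attribute children, so any other kind of child directly under root would be dropped; widen the select expression if the real files contain more than that.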