I have an XML document like the input below. I want to aggregate the Sales values for the Diesel fuel type.
How can I iterate over all <Tank> elements, read the fuelItem attribute to find fuel types that occur more than once, and then sum up their Sales attribute values?
Input:
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>
Preferred output:
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" netSalesQty="1000" />
    <Tank fuelItem="Diesel" netSalesQty="5000" />
  </FuelTankList>
</EnterpriseDocument>
Since you're using lxml, you could use XSLT and Muenchian Grouping to group Tank elements by their fuelItem attributes.
Example...
XML Input (input.xml)
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:attribute name="Sales">
            <xsl:value-of select="sum(key('tanks',@fuelItem)/@Sales)"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
Python
from lxml import etree
tree = etree.parse("input.xml")
xslt = etree.parse("test.xsl")
new_tree = tree.xslt(xslt)
print(etree.tostring(new_tree, pretty_print=True).decode("utf-8"))
Output (stdout)
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000"/>
    <Tank fuelItem="Diesel" Sales="5000"/>
  </FuelTankList>
</EnterpriseDocument>
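Note that the stylesheet above keeps the attribute name Sales, while the question's preferred output uses netSalesQty. A minimal sketch of one way to get that name, assuming the same Muenchian key and swapping the xsl:copy for a literal result element with attribute value templates (the XML and XSLT are inlined as Python byte strings only to keep the example self-contained):

```python
from lxml import etree

XML = b"""<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>"""

XSLT = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Visit only the first Tank of each fuelItem group. -->
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <!-- Emit a fresh Tank that carries netSalesQty instead of Sales. -->
        <Tank fuelItem="{@fuelItem}"
              netSalesQty="{sum(key('tanks',@fuelItem)/@Sales)}"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(XSLT))
result = transform(etree.fromstring(XML))

# Collect (fuelItem, netSalesQty) pairs from the transformed tree.
tanks = [(t.get("fuelItem"), t.get("netSalesQty"))
         for t in result.getroot().iter("Tank")]
print(tanks)  # [('Petrol', '1000'), ('Diesel', '5000')]
```

Because the key is declared for the whole document, this groups Tank elements across all FuelTankList elements; with a single list, as in the question, that makes no difference.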
Hope this helps. It iterates over every FuelTankList, gets the list of Tank elements in it, reads their values, and deletes them. Once the values have been collected and summed, new Tank elements with the processed values are added to the FuelTankList.
import lxml.etree as le

xml = """<EnterpriseDocument><FuelTankList><Tank fuelItem="Petrol" Sales="1000" />
<Tank fuelItem="Diesel" Sales="2000" />
<Tank fuelItem="Diesel" Sales="3000" />
</FuelTankList>
</EnterpriseDocument>"""

root = le.fromstring(xml)

# Get all the FuelTankList elements from the document
fueltanklist = root.xpath('//FuelTankList')

for fuellist in fueltanklist:
    tankdict = {}
    # Get all the tanks in the current FuelTankList
    tanks = fuellist.xpath('child::Tank')
    for tank in tanks:
        fuelitem = tank.attrib['fuelItem']
        sales = tank.attrib['Sales']
        if fuelitem in tankdict:
            tankdict[fuelitem] += int(sales)
        else:
            tankdict[fuelitem] = int(sales)
        # Once we have retrieved the value of the current tank, delete it from its parent
        tank.getparent().remove(tank)

    for key, value in tankdict.items():
        # Create and add tanks with new values to the parent
        newtank = le.Element("Tank", fuelItem=str(key), netSalesQty=str(value))
        fuellist.append(newtank)

# Store the entire XML in a new string
newxml = le.tostring(root)
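The same collect, delete, and re-add idea can be wrapped in a reusable function. The following is only a sketch: the name aggregate_sales and its out_attr parameter are hypothetical, and the fallback import assumes the standard library's ElementTree is an acceptable substitute when lxml is not installed (only calls common to both APIs are used).

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: the stdlib module offers the same calls used below.
    import xml.etree.ElementTree as etree


def aggregate_sales(root, out_attr="netSalesQty"):
    """Merge <Tank> elements per fuelItem inside each <FuelTankList>.

    Hypothetical helper; modifies the tree in place and returns its root.
    """
    # Snapshot the lists first so the tree is not modified while iterating it.
    for fuel_list in list(root.iter("FuelTankList")):
        totals = {}  # fuelItem -> summed Sales (insertion-ordered on Python 3.7+)
        for tank in list(fuel_list.findall("Tank")):
            item = tank.get("fuelItem")
            totals[item] = totals.get(item, 0) + int(tank.get("Sales"))
            fuel_list.remove(tank)
        for item, total in totals.items():
            etree.SubElement(fuel_list, "Tank",
                             {"fuelItem": item, out_attr: str(total)})
    return root


root = etree.fromstring(
    '<EnterpriseDocument><FuelTankList>'
    '<Tank fuelItem="Petrol" Sales="1000"/>'
    '<Tank fuelItem="Diesel" Sales="2000"/>'
    '<Tank fuelItem="Diesel" Sales="3000"/>'
    '</FuelTankList></EnterpriseDocument>'
)
aggregate_sales(root)
summary = [(t.get("fuelItem"), t.get("netSalesQty")) for t in root.iter("Tank")]
print(summary)  # [('Petrol', '1000'), ('Diesel', '5000')]
```

Keeping one totals dictionary per FuelTankList means tanks in different lists are never merged with each other.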
Try this:
from lxml import etree

# Parse the input XML file.
tree = etree.parse("so-input.xml")

# Collect Tank element attributes here.
tanks = {}

# The FuelTankList element whose children we will change.
fuel_tank_list = None

# Loop over all Tank elements, collect their values, remove them.
for tank in tree.xpath("//Tank"):
    # Get attributes.
    fuel_item = tank.get("fuelItem")
    sales = tank.get("Sales")

    # Add to sales sum.
    existing_sales = tanks.get(fuel_item, 0)
    tanks[fuel_item] = existing_sales + int(sales)

    # Remove <Tank>
    fuel_tank_list = tank.getparent()
    fuel_tank_list.remove(tank)

# Create a new Tank element for each fuelItem value.
for fuel_item, sales in tanks.items():
    new_tank = etree.Element("Tank")
    new_tank.attrib["fuelItem"] = fuel_item
    new_tank.attrib["Sales"] = str(sales)
    fuel_tank_list.append(new_tank)

# Write the modified tree to a new file.
with open("so-output.xml", "wb") as f:
    f.write(etree.tostring(tree, pretty_print=True))
Output of $ xmllint -format so-output.xml:
<?xml version="1.0"?>
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Diesel" Sales="5000"/>
    <Tank fuelItem="Petrol" Sales="1000"/>
  </FuelTankList>
</EnterpriseDocument>
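In the output above Diesel comes before Petrol, which likely reflects the unordered dictionaries of older Python versions. If the original document order matters, one alternative is to keep the first Tank of each fuel type and fold later duplicates into it, which also leaves any other attributes on that first element untouched. This is a sketch of a different technique than the code above; the function name merge_tanks_in_place is hypothetical, and the fallback import assumes the standard library's ElementTree can stand in when lxml is not installed.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: the stdlib module offers the same calls used below.
    import xml.etree.ElementTree as etree


def merge_tanks_in_place(fuel_tank_list):
    """Fold duplicate <Tank> elements into the first one per fuelItem.

    Hypothetical helper; keeps document order and modifies the list in place.
    """
    first_seen = {}  # fuelItem -> first Tank element with that fuel type
    for tank in list(fuel_tank_list.findall("Tank")):
        item = tank.get("fuelItem")
        keeper = first_seen.get(item)
        if keeper is None:
            first_seen[item] = tank
        else:
            # Add this tank's Sales to the kept element, then drop the duplicate.
            total = int(keeper.get("Sales")) + int(tank.get("Sales"))
            keeper.set("Sales", str(total))
            fuel_tank_list.remove(tank)


doc = etree.fromstring(
    '<EnterpriseDocument><FuelTankList>'
    '<Tank fuelItem="Petrol" Sales="1000"/>'
    '<Tank fuelItem="Diesel" Sales="2000"/>'
    '<Tank fuelItem="Diesel" Sales="3000"/>'
    '</FuelTankList></EnterpriseDocument>'
)
for fuel_tank_list in list(doc.iter("FuelTankList")):
    merge_tanks_in_place(fuel_tank_list)

merged = [(t.get("fuelItem"), t.get("Sales")) for t in doc.iter("Tank")]
print(merged)  # [('Petrol', '1000'), ('Diesel', '5000')]
```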