How to find same attribute values in XML using lxml.etree?
I have an XML document like the input below. I want to aggregate the Sales values for the Diesel fuel type.
How can I iterate over all <Tank> elements, read the fuelItem attribute to find fuel types that occur more than once, and then sum their Sales attribute values?
Input:
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>
Preferred output:
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" netSalesQty="1000" />
    <Tank fuelItem="Diesel" netSalesQty="5000" />
  </FuelTankList>
</EnterpriseDocument>
Since you're using lxml, you could use XSLT and Muenchian Grouping to group the Tank elements by their fuelItem attributes.
Example...
XML Input (input.xml)
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:attribute name="Sales">
            <xsl:value-of select="sum(key('tanks',@fuelItem)/@Sales)"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Python
from lxml import etree
tree = etree.parse("input.xml")
xslt = etree.parse("test.xsl")
new_tree = tree.xslt(xslt)
print(etree.tostring(new_tree, pretty_print=True).decode("utf-8"))
Output (stdout)
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000"/>
    <Tank fuelItem="Diesel" Sales="5000"/>
  </FuelTankList>
</EnterpriseDocument>
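The stylesheet above keeps the attribute name Sales, while the question asks for netSalesQty. A minimal sketch of one way to get the renamed attribute, assuming it is acceptable to skip the original Sales attribute when copying and to inline both documents as strings (the names XML, XSL, transform and summary are only for illustration):

```python
from lxml import etree

XML = """<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>"""

# Same Muenchian grouping as test.xsl, but Sales is not copied and the
# sum is written to a new netSalesQty attribute instead.
XSL = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <xsl:copy>
          <xsl:apply-templates select="@*[name()!='Sales']"/>
          <xsl:attribute name="netSalesQty">
            <xsl:value-of select="sum(key('tanks',@fuelItem)/@Sales)"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# Compile the stylesheet once, then apply it to the parsed input.
transform = etree.XSLT(etree.fromstring(XSL))
result = transform(etree.fromstring(XML))

# Collect (fuelItem, netSalesQty) pairs in document order.
summary = [(t.get("fuelItem"), t.get("netSalesQty"))
           for t in result.getroot().iter("Tank")]
print(summary)
print(etree.tostring(result, pretty_print=True).decode("utf-8"))
```

Note that the key is document-wide, so this sketch assumes a single FuelTankList, as in the sample input.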
Hope this helps. It iterates over every FuelTankList, gets the list of Tank elements in it, reads their values, and deletes them. Once we have the values and have summed them, we add new Tank elements with the processed values to the FuelTankList.
import lxml.etree as le

xml = """<EnterpriseDocument><FuelTankList><Tank fuelItem="Petrol" Sales="1000" />
<Tank fuelItem="Diesel" Sales="2000" />
<Tank fuelItem="Diesel" Sales="3000" />
</FuelTankList>
</EnterpriseDocument>"""

root = le.fromstring(xml)

# Get all the FuelTankList elements from the document
fueltanklist = root.xpath('//FuelTankList')
for fuellist in fueltanklist:
    tankdict = {}
    # Get all the tanks in the current FuelTankList
    tanks = fuellist.xpath('child::Tank')
    for tank in tanks:
        fuelitem = tank.attrib['fuelItem']
        sales = tank.attrib['Sales']
        if fuelitem in tankdict:
            tankdict[fuelitem] += int(sales)
        else:
            tankdict[fuelitem] = int(sales)
        # Once we have retrieved the value of the current tank, delete it from its parent
        tank.getparent().remove(tank)
    for key, value in tankdict.items():
        # Create and add tanks with the summed values to the current FuelTankList
        newtank = le.Element("Tank", fuelItem=str(key), netSalesQty=str(value))
        fuellist.append(newtank)

# Store the entire XML in a new string
newxml = le.tostring(root)
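The aggregation step on its own, without modifying the tree, can be sketched with collections.Counter. This is only an illustration that assumes every Tank has an integer Sales attribute; the names occurrences, totals and repeated are made up for the example:

```python
from collections import Counter
from lxml import etree

XML = """<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>"""

root = etree.fromstring(XML)

occurrences = Counter()  # how many Tank elements exist per fuel type
totals = Counter()       # summed Sales per fuel type
for tank in root.iter("Tank"):
    fuel = tank.get("fuelItem")
    occurrences[fuel] += 1
    totals[fuel] += int(tank.get("Sales"))

# Fuel types that appear on more than one Tank element.
repeated = [fuel for fuel, n in occurrences.items() if n > 1]
print(repeated)       # ['Diesel']
print(dict(totals))   # {'Petrol': 1000, 'Diesel': 5000}
```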
Try this:
from lxml import etree

# Parse the input XML file.
tree = etree.parse("so-input.xml")

# Collect Tank element attributes here.
tanks = {}

# The FuelTankList element whose children we will change.
fuel_tank_list = None

# Loop over all Tank elements, collect their values, remove them.
for tank in tree.xpath("//Tank"):
    # Get attributes.
    fuel_item = tank.get("fuelItem")
    sales = tank.get("Sales")
    # Add to sales sum.
    existing_sales = tanks.get(fuel_item, 0)
    tanks[fuel_item] = existing_sales + int(sales)
    # Remove <Tank>
    fuel_tank_list = tank.getparent()
    fuel_tank_list.remove(tank)

# Create a new Tank element for each fuelItem value.
for fuel_item, sales in tanks.items():
    new_tank = etree.Element("Tank")
    new_tank.attrib["fuelItem"] = fuel_item
    new_tank.attrib["Sales"] = str(sales)
    fuel_tank_list.append(new_tank)

# Write the modified tree to a new file.
with open("so-output.xml", "wb") as f:
    f.write(etree.tostring(tree, pretty_print=True))
Output of $ xmllint -format so-output.xml:
<?xml version="1.0"?>
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Diesel" Sales="5000"/>
    <Tank fuelItem="Petrol" Sales="1000"/>
  </FuelTankList>
</EnterpriseDocument>
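In the output above Diesel comes before Petrol, and the code collects tanks from the whole document into one dictionary. If document order and per-list grouping matter, one possible variation is to merge duplicates into the first Tank of each fuel type in place. The following is a sketch under those assumptions; merge_tanks and its src/dst parameters are hypothetical names, and Sales values are assumed to be integers:

```python
from lxml import etree

XML = """<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>"""


def merge_tanks(root, src="Sales", dst="netSalesQty"):
    """Merge Tank elements with the same fuelItem within each FuelTankList."""
    for tank_list in root.iter("FuelTankList"):
        first_seen = {}  # fuelItem -> first Tank element of that type
        # Copy the child list first because elements are removed while looping.
        for tank in list(tank_list.iterchildren("Tank")):
            fuel = tank.get("fuelItem")
            qty = int(tank.attrib.pop(src))  # drop Sales, keep its value
            if fuel in first_seen:
                keeper = first_seen[fuel]
                keeper.set(dst, str(int(keeper.get(dst)) + qty))
                tank_list.remove(tank)
            else:
                tank.set(dst, str(qty))
                first_seen[fuel] = tank
    return root


# remove_blank_text lets pretty_print re-indent after elements are removed.
parser = etree.XMLParser(remove_blank_text=True)
root = merge_tanks(etree.fromstring(XML, parser))
merged = [dict(t.attrib) for t in root.iter("Tank")]
print(etree.tostring(root, pretty_print=True).decode("utf-8"))
```

Because the first Tank of each fuel type is kept rather than recreated, any other attributes on it survive, and the remaining elements stay in their original order.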