How to find same attribute values in XML using lxml.etree?
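I have an XML document like the input below. Here I want to sum the Sales values for the Diesel fuel type.
How can I iterate over all <Tank> elements and read the fuelItem attribute to find repeated occurrences of the same fuel type, and then sum their Sales attribute values?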
Input:
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>
Preferred output:
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" netSalesQty="1000" />
    <Tank fuelItem="Diesel" netSalesQty="5000" />
  </FuelTankList>
</EnterpriseDocument>
Since you're using lxml, you can use XSLT and Muenchian grouping to group the Tank elements by their fuelItem attribute.
Example...
XML Input (input.xml)
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:attribute name="Sales">
            <xsl:value-of select="sum(key('tanks',@fuelItem)/@Sales)"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Python
from lxml import etree
tree = etree.parse("input.xml")
xslt = etree.parse("test.xsl")
new_tree = tree.xslt(xslt)
print(etree.tostring(new_tree, pretty_print=True).decode("utf-8"))
Output (stdout)
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000"/>
    <Tank fuelItem="Diesel" Sales="5000"/>
  </FuelTankList>
</EnterpriseDocument>
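The stylesheet above writes the summed value back to the Sales attribute, whereas the preferred output in the question names it netSalesQty. A minimal sketch of one way to get that name is to emit a literal Tank element with attribute value templates instead of copying the original attributes. The variable names (XML, XSL, summed) are only illustrative, and because the key is document-wide this assumes a single FuelTankList.

```python
from lxml import etree

XML = b"""<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>"""

# Same Muenchian grouping as above, but the grouped Tank is written as a
# literal result element so the total lands in netSalesQty instead of Sales.
XSL = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <Tank fuelItem="{@fuelItem}" netSalesQty="{sum(key('tanks',@fuelItem)/@Sales)}"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.fromstring(XSL))
result = transform(etree.fromstring(XML))

# Collect the grouped attributes to check the transformation.
summed = {t.get("fuelItem"): t.get("netSalesQty") for t in result.getroot().iter("Tank")}
print(summed)
```

Any other attributes on the original Tank elements would be dropped by this variant, since only fuelItem and netSalesQty are written.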
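Hope this helps. It iterates over each fuel tank list, gets the tanks in it, reads their values, and removes them. Once the values have been collected and summed, new tanks carrying the processed values are added to the fuel tank list.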
import lxml.etree as le
xml = """<EnterpriseDocument><FuelTankList><Tank fuelItem="Petrol" Sales="1000" />
<Tank fuelItem="Diesel" Sales="2000" />
<Tank fuelItem="Diesel" Sales="3000" />
</FuelTankList>
</EnterpriseDocument>"""
root = le.fromstring(xml)
# Get all the FuelTankList elements in the document
fueltanklist = root.xpath('//FuelTankList')
for fuellist in fueltanklist:
    tankdict = {}
    # Get all the tanks in the current FuelTankList
    tanks = fuellist.xpath('child::Tank')
    for tank in tanks:
        fuelitem = tank.attrib['fuelItem']
        sales = tank.attrib['Sales']
        if fuelitem in tankdict:
            tankdict[fuelitem] += int(sales)
        else:
            tankdict[fuelitem] = int(sales)
        # Once we have retrieved the value of the current tank, delete it from its parent
        tank.getparent().remove(tank)
    for key, value in tankdict.items():
        # Create and add tanks with the new values to their parent
        newtank = le.Element("Tank", fuelItem=str(key), netSalesQty=str(value))
        fuellist.append(newtank)
# Serialize the entire XML (le.tostring returns bytes)
newxml = le.tostring(root)
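The same per-list aggregation can be wrapped in a small helper so it is easy to apply to every FuelTankList in a document. This is only a sketch: the function name merge_tanks and its total_attr parameter are hypothetical, and it assumes every Tank has an integer Sales attribute.

```python
from lxml import etree


def merge_tanks(fuel_list, total_attr="netSalesQty"):
    """Replace the Tank children of one FuelTankList with one Tank per fuelItem."""
    totals = {}  # dicts keep insertion order, so fuels stay in first-seen order
    # findall() returns a list, so removing elements while looping is safe.
    for tank in fuel_list.findall("Tank"):
        fuel = tank.get("fuelItem")
        totals[fuel] = totals.get(fuel, 0) + int(tank.get("Sales"))
        fuel_list.remove(tank)
    for fuel, total in totals.items():
        etree.SubElement(fuel_list, "Tank", fuelItem=fuel, **{total_attr: str(total)})


root = etree.fromstring(
    b"<EnterpriseDocument><FuelTankList>"
    b'<Tank fuelItem="Petrol" Sales="1000"/>'
    b'<Tank fuelItem="Diesel" Sales="2000"/>'
    b'<Tank fuelItem="Diesel" Sales="3000"/>'
    b"</FuelTankList></EnterpriseDocument>"
)
# Each FuelTankList is aggregated independently of the others.
for fuel_list in root.findall(".//FuelTankList"):
    merge_tanks(fuel_list)

result = [(t.get("fuelItem"), t.get("netSalesQty")) for t in root.iter("Tank")]
print(result)
```

Passing total_attr="Sales" would keep the original attribute name instead.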
Try this:
from lxml import etree
# Parse the input XML file.
tree = etree.parse("so-input.xml")
# Collect Tank element attributes here.
tanks = {}
# The FuelTankList element whose children we will change
# (this assumes the document has a single FuelTankList).
fuel_tank_list = None
# Loop over all Tank elements, collect their values, remove them.
for tank in tree.xpath("//Tank"):
    # Get attributes.
    fuel_item = tank.get("fuelItem")
    sales = tank.get("Sales")
    # Add to sales sum.
    existing_sales = tanks.get(fuel_item, 0)
    tanks[fuel_item] = existing_sales + int(sales)
    # Remove <Tank>
    fuel_tank_list = tank.getparent()
    fuel_tank_list.remove(tank)
# Create a new Tank element for each fuelItem value.
for fuel_item, sales in tanks.items():
    new_tank = etree.Element("Tank")
    new_tank.attrib["fuelItem"] = fuel_item
    new_tank.attrib["Sales"] = str(sales)
    fuel_tank_list.append(new_tank)
# Write the modified tree to a new file.
with open("so-output.xml", "wb") as f:
    f.write(etree.tostring(tree, pretty_print=True))
$ xmllint -format so-output.xml
Output:
<?xml version="1.0"?>
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Diesel" Sales="5000"/>
    <Tank fuelItem="Petrol" Sales="1000"/>
  </FuelTankList>
</EnterpriseDocument>
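If only the total for a single fuel type is needed (the question first asks for the Diesel sum), the XPath sum() function may be enough on its own, without rewriting the tree. A minimal sketch, using the sample input from the question; the variable names are illustrative, and XPath returns the sum as a float.

```python
from lxml import etree

XML = b"""<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>"""

root = etree.fromstring(XML)

# $fuel is an XPath variable bound through a keyword argument,
# which avoids building the expression with string formatting.
diesel_total = root.xpath("sum(//Tank[@fuelItem=$fuel]/@Sales)", fuel="Diesel")
print(diesel_total)  # 5000.0
```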