
How to find same attribute values in XML using lxml.etree?

I have an XML like the input below.

Here I want to aggregate Sales values of the Diesel fuel type.

How can I iterate over all <Tank> elements, read the fuelItem attribute to find fuel types that occur more than once, and then sum up their Sales attribute values?

Input:

 <EnterpriseDocument>
      <FuelTankList>
        <Tank fuelItem="Petrol" Sales="1000" />
        <Tank  fuelItem="Diesel" Sales="2000" />
        <Tank  fuelItem="Diesel" Sales="3000" />
      </FuelTankList>
    </EnterpriseDocument>

Preferred output:

<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" netSalesQty="1000" />
    <Tank  fuelItem="Diesel" netSalesQty="5000" />
  </FuelTankList>
</EnterpriseDocument>

Since you're using lxml, you could use XSLT and Muenchian Grouping to group Tank elements by their fuelItem attributes.

Example...

XML Input (input.xml)

<EnterpriseDocument>
    <FuelTankList>
        <Tank fuelItem="Petrol" Sales="1000" />
        <Tank  fuelItem="Diesel" Sales="2000" />
        <Tank  fuelItem="Diesel" Sales="3000" />
    </FuelTankList>
</EnterpriseDocument>

XSLT 1.0 (test.xsl)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:attribute name="Sales">
            <xsl:value-of select="sum(key('tanks',@fuelItem)/@Sales)"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Python

from lxml import etree

tree = etree.parse("input.xml")
xslt = etree.parse("test.xsl")

new_tree = tree.xslt(xslt)

print(etree.tostring(new_tree, pretty_print=True).decode("utf-8"))

Output (stdout)

<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000"/>
    <Tank fuelItem="Diesel" Sales="5000"/>
  </FuelTankList>
</EnterpriseDocument>
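
The stylesheet above keeps the Sales attribute name, while the preferred output in the question uses netSalesQty. The following is a minimal sketch of one way to adapt it, assuming the summed attribute should simply be emitted under the new name. It uses the same Muenchian grouping key, but builds each Tank with a literal result element and inline strings instead of files. The names XML, XSL, transform and tanks are only illustrative. Note that the key is document-wide, so with several FuelTankList elements the sums would span all of them.

```python
from lxml import etree

XML = """<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>"""

XSL = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Visit only the first Tank of each fuelItem group. -->
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <Tank fuelItem="{@fuelItem}"
              netSalesQty="{sum(key('tanks',@fuelItem)/@Sales)}"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# Compile the stylesheet once, then apply it to the parsed input.
transform = etree.XSLT(etree.fromstring(XSL))
result = transform(etree.fromstring(XML))

# Collect (fuelItem, netSalesQty) pairs from the transformed tree.
tanks = [(t.get("fuelItem"), t.get("netSalesQty"))
         for t in result.getroot().iter("Tank")]
print(tanks)
```

Compiling with etree.XSLT gives a reusable callable, which may be preferable to tree.xslt() when the same stylesheet is applied to many documents.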

Hope this helps. The code iterates over every FuelTankList, gets the list of tanks in it, reads each tank's values and deletes the tank. Once the values have been collected and summed per fuel type, it adds new tanks with the processed values to the FuelTankList.

import lxml.etree as le

xml = """<EnterpriseDocument><FuelTankList><Tank fuelItem="Petrol" Sales="1000" />
        <Tank  fuelItem="Diesel" Sales="2000" />
        <Tank  fuelItem="Diesel" Sales="3000" />
      </FuelTankList>
    </EnterpriseDocument>"""

root = le.fromstring(xml)

# Get all the FuelTankList elements in the document.
fueltanklist = root.xpath('//FuelTankList')
for fuellist in fueltanklist:
    tankdict = {}
    # Get all the tanks in the current FuelTankList.
    tanks = fuellist.xpath('child::Tank')
    for tank in tanks:
        fuelitem = tank.attrib['fuelItem']
        sales = tank.attrib['Sales']
        # Accumulate the sales total for this fuel type.
        if fuelitem in tankdict:
            tankdict[fuelitem] += int(sales)
        else:
            tankdict[fuelitem] = int(sales)

        # Once the value of the current tank has been read, delete it from its parent.
        tank.getparent().remove(tank)
    for key, value in tankdict.items():
        # Create a tank with the summed value and add it to its parent.
        newtank = le.Element("Tank", fuelItem=str(key), netSalesQty=str(value))
        fuellist.append(newtank)

# Serialize the entire modified XML (le.tostring returns bytes).
newxml = le.tostring(root)
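
The same per-list aggregation can be packaged as a reusable function. This is only a sketch under a few assumptions: every Tank carries both fuelItem and an integer Sales attribute, and the function name aggregate_sales and its out_attr parameter are hypothetical, not part of lxml.

```python
from collections import defaultdict
from lxml import etree


def aggregate_sales(root, out_attr="netSalesQty"):
    """Merge sibling <Tank> elements sharing a fuelItem, summing Sales."""
    # Materialize the list first, since the tree is modified in the loop.
    for fuel_list in list(root.iter("FuelTankList")):
        totals = defaultdict(int)
        for tank in fuel_list.findall("Tank"):
            totals[tank.get("fuelItem")] += int(tank.get("Sales"))
            fuel_list.remove(tank)
        # Dicts keep insertion order, so first-seen fuel types come first.
        for fuel_item, total in totals.items():
            etree.SubElement(
                fuel_list, "Tank", {"fuelItem": fuel_item, out_attr: str(total)}
            )
    return root


root = etree.fromstring(
    """<EnterpriseDocument><FuelTankList>
        <Tank fuelItem="Petrol" Sales="1000" />
        <Tank fuelItem="Diesel" Sales="2000" />
        <Tank fuelItem="Diesel" Sales="3000" />
    </FuelTankList></EnterpriseDocument>"""
)
aggregate_sales(root)

# Collect (fuelItem, netSalesQty) pairs to inspect the result.
summary = [(t.get("fuelItem"), t.get("netSalesQty")) for t in root.iter("Tank")]
print(summary)
```

Because each FuelTankList gets its own totals, tanks in different lists are not merged with each other.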

Try this:

from lxml import etree

# Parse the input XML file.
tree = etree.parse("so-input.xml")

# Collect Tank element attributes here.
tanks = {}

# The FuelTankList element whose children we will change
# (this assumes the document contains a single FuelTankList).
fuel_tank_list = None

# Loop over all Tank elements, collect their values, remove them.
for tank in tree.xpath("//Tank"):
    # Get attributes.
    fuel_item = tank.get("fuelItem")
    sales = tank.get("Sales")

    # Add to sales sum.
    existing_sales = tanks.get(fuel_item, 0)
    tanks[fuel_item] = existing_sales + int(sales)

    # Remove <Tank>
    fuel_tank_list = tank.getparent()
    fuel_tank_list.remove(tank)

# Create a new Tank element for each fuelItem value.
for fuel_item, sales in tanks.items():
    new_tank = etree.Element("Tank")
    new_tank.attrib["fuelItem"] = fuel_item
    new_tank.attrib["Sales"] = str(sales)
    fuel_tank_list.append(new_tank)

# Write the modified tree to a new file.
with open("so-output.xml", "wb") as f:
    f.write(etree.tostring(tree, pretty_print=True))

Output of $ xmllint --format so-output.xml:

<?xml version="1.0"?>
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Diesel" Sales="5000"/>
    <Tank fuelItem="Petrol" Sales="1000"/>
  </FuelTankList>
</EnterpriseDocument>
