
How to find same attribute values in XML using lxml.etree?

I have an XML document like the input below.

Here I want to aggregate the Sales values of the Diesel fuel type.

How can I iterate over all <Tank> elements and read the fuelItem attribute to find multiple occurrences of the same fuel type, and then sum up their Sales attribute values?

Input:

<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>

Preferred output:

<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" netSalesQty="1000" />
    <Tank  fuelItem="Diesel" netSalesQty="5000" />
  </FuelTankList>
</EnterpriseDocument>

Since you're using lxml, you could use XSLT and Muenchian grouping to group the Tank elements by their fuelItem attribute.

Example...

XML Input (input.xml)

<EnterpriseDocument>
    <FuelTankList>
        <Tank fuelItem="Petrol" Sales="1000" />
        <Tank  fuelItem="Diesel" Sales="2000" />
        <Tank  fuelItem="Diesel" Sales="3000" />
    </FuelTankList>
</EnterpriseDocument>

XSLT 1.0 (test.xsl)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:attribute name="Sales">
            <xsl:value-of select="sum(key('tanks',@fuelItem)/@Sales)"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
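The select expression Tank[count(.|key('tanks',@fuelItem)[1])=1] is the heart of Muenchian grouping: the union of a Tank with the first Tank returned by the key has a count of 1 only when both are the same node, so exactly one Tank per fuelItem value is kept, and sum(key(...)/@Sales) then totals the whole group. xsl:key only exists inside XSLT, so the rough sketch below reproduces the same "first Tank of each group" selection from Python with a plain XPath 1.0 preceding-sibling test instead (an assumed stand-in that only works within one FuelTankList and is quadratic, which is exactly the cost the key avoids), and sums each group with an lxml XPath variable.

```python
from lxml import etree

root = etree.fromstring(b"""<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>""")

# First Tank of each fuelItem group: no earlier sibling Tank shares its fuelItem value.
# (Plain-XPath stand-in for the Muenchian key test; quadratic in the number of Tanks.)
firsts = root.xpath("//Tank[not(@fuelItem = preceding-sibling::Tank/@fuelItem)]")

group_totals = {}
for tank in firsts:
    fuel = tank.get("fuelItem")
    # XPath sum() over every Tank in the group; $fuel is bound via a keyword argument.
    # Numeric XPath results come back as Python floats.
    group_totals[fuel] = root.xpath("sum(//Tank[@fuelItem = $fuel]/@Sales)", fuel=fuel)

print(group_totals)
```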

Python

from lxml import etree

# Parse the source document and the stylesheet.
tree = etree.parse("input.xml")
xslt = etree.parse("test.xsl")

# Apply the stylesheet to the parsed document.
new_tree = tree.xslt(xslt)

print(etree.tostring(new_tree, pretty_print=True).decode("utf-8"))

Output (stdout)

<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000"/>
    <Tank fuelItem="Diesel" Sales="5000"/>
  </FuelTankList>
</EnterpriseDocument>
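The stylesheet above keeps the attribute name Sales, whereas the preferred output in the question uses netSalesQty. One possible variation, assuming everything else about the grouping stays the same, is to emit a literal Tank element with attribute value templates instead of copying the original attributes. This sketch also compiles the stylesheet once with etree.XSLT and works from in-memory strings rather than input.xml and test.xsl.

```python
from lxml import etree

# Same Muenchian grouping as test.xsl, but the for-each body emits a literal Tank
# with attribute value templates, so the summed attribute is named netSalesQty.
XSL = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="tanks" match="Tank" use="@fuelItem"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="FuelTankList">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="Tank[count(.|key('tanks',@fuelItem)[1])=1]">
        <Tank fuelItem="{@fuelItem}" netSalesQty="{sum(key('tanks',@fuelItem)/@Sales)}"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

doc = etree.fromstring(b"""<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>""")

# Compile the stylesheet once; the XSLT object can then be applied to many documents.
transform = etree.XSLT(etree.fromstring(XSL))
result = transform(doc)

# str() serializes the result tree using the stylesheet's xsl:output settings.
print(str(result))
```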

Hope this helps. The code below iterates over every FuelTankList, collects the Tank children of each one, reads their values and removes them. Once the values have been summed, it appends new Tank elements carrying the processed values (as netSalesQty) to that FuelTankList.

import lxml.etree as le

xml = """<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>"""

root = le.fromstring(xml)

# Get all the FuelTankList elements in the document.
fueltanklist = root.xpath('//FuelTankList')
for fuellist in fueltanklist:
    tankdict = {}
    # Get all the Tank children of the current FuelTankList.
    tanks = fuellist.xpath('child::Tank')
    for tank in tanks:
        fuelitem = tank.attrib['fuelItem']
        sales = int(tank.attrib['Sales'])
        # Add this tank's sales to the running total for its fuel type.
        tankdict[fuelitem] = tankdict.get(fuelitem, 0) + sales

        # Once we have read the current tank's values, remove it from its parent.
        fuellist.remove(tank)
    for key, value in tankdict.items():
        # Create a new Tank with the summed value and add it to the parent.
        newtank = le.Element("Tank", fuelItem=key, netSalesQty=str(value))
        fuellist.append(newtank)

# Serialize the entire modified document.
newxml = le.tostring(root)
print(newxml.decode())
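If you only need the first half of the question, i.e. which fuelItem values occur more than once, without rewriting the tree, a minimal sketch (not taken from the answers here) is to count the attribute values with collections.Counter and then sum Sales for each repeated value with an XPath variable. Note that it counts across the whole document, not per FuelTankList.

```python
from collections import Counter

from lxml import etree

root = etree.fromstring(b"""<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>""")

# Count how often each fuelItem value appears (attribute XPath results are strings).
counts = Counter(str(value) for value in root.xpath("//Tank/@fuelItem"))

# Fuel types that appear on more than one Tank.
duplicates = sorted(fuel for fuel, n in counts.items() if n > 1)

# Sum the Sales attributes of every Tank sharing each repeated fuelItem value.
totals = {
    fuel: sum(int(s) for s in root.xpath("//Tank[@fuelItem = $fuel]/@Sales", fuel=fuel))
    for fuel in duplicates
}

print(duplicates, totals)
```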

Try this:

from lxml import etree

# Parse the input XML file.
tree = etree.parse("so-input.xml")

# Collect Tank element attributes here.
tanks = {}

# The FuelTankList element whose children we will change.
# Note: all Tanks are pooled into one dict, so this assumes a single FuelTankList.
fuel_tank_list = None

# Loop over all Tank elements, collect their values, remove them.
for tank in tree.xpath("//Tank"):
    # Get attributes.
    fuel_item = tank.get("fuelItem")
    sales = tank.get("Sales")

    # Add to sales sum.
    existing_sales = tanks.get(fuel_item, 0)
    tanks[fuel_item] = existing_sales + int(sales)

    # Remove <Tank>
    fuel_tank_list = tank.getparent()
    fuel_tank_list.remove(tank)

# Create a new Tank element for each fuelItem value.
for fuel_item, sales in tanks.items():
    new_tank = etree.Element("Tank")
    new_tank.attrib["fuelItem"] = fuel_item
    new_tank.attrib["Sales"] = str(sales)
    fuel_tank_list.append(new_tank)

# Write the modified tree to a new file.
with open("so-output.xml", "wb") as f:
    f.write(etree.tostring(tree, pretty_print=True))

Output of $ xmllint -format so-output.xml (the Tank order follows dict iteration order; on Python 3.7+, where dicts keep insertion order, Petrol would be listed first):

<?xml version="1.0"?>
<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Diesel" Sales="5000"/>
    <Tank fuelItem="Petrol" Sales="1000"/>
  </FuelTankList>
</EnterpriseDocument>
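The script above pools every Tank in the document into one dict and appends the results to the last FuelTankList it saw, which is fine for the sample but would merge tanks from different lists in a larger file. A sketch of an alternative (a variation, not taken from the answers above) that works per FuelTankList, keeps the first Tank of each fuel type in its original position, drops the later duplicates, and stores the total as netSalesQty as in the preferred output:

```python
from lxml import etree

root = etree.fromstring(b"""<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="2000" />
    <Tank fuelItem="Diesel" Sales="3000" />
  </FuelTankList>
</EnterpriseDocument>""")

# findall() returns a list, so removing Tanks below does not disturb this loop.
for tank_list in root.findall(".//FuelTankList"):
    # fuelItem value -> [first Tank with that value, running Sales total]
    groups = {}
    for tank in tank_list.findall("Tank"):
        fuel = tank.get("fuelItem")
        sales = int(tank.get("Sales", "0"))
        if fuel in groups:
            groups[fuel][1] += sales
            # Later duplicate: its Sales is already folded into the first Tank's total.
            tank_list.remove(tank)
        else:
            groups[fuel] = [tank, sales]
    # Rewrite each surviving Tank: drop Sales and store the group total as netSalesQty.
    for first, total in groups.values():
        if "Sales" in first.attrib:
            del first.attrib["Sales"]
        first.set("netSalesQty", str(total))

print(etree.tostring(root, encoding="unicode"))
```

Updating the first Tank in place rather than building new ones keeps any other attributes it may carry and preserves the original order of the fuel types.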

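Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM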