
How to parse an XML tree using lxml iterparse

This is an excerpt from my XML file:

<bigchapter>
     <chapter id="a" name="x">
      <valueimportant v="valuetoget1"/>
      <TimeSeries>
        <TimeSeriesIdentification v="1"/>
        <type v="a1"/>
        <Period>
          <Interval>
            <Pos v="1"/>
            <Qty v="26"/>
          </Interval>
          <Interval>
            <Pos v="2"/>
            <Qty v="26"/>
          </Interval>
        </Period>
      </TimeSeries>
      <TimeSeries>
        <type v="b1"/>
        <Period>
          <Interval>
            <Pos v="1"/>
            <Qty v="26"/>
          </Interval>
          <Interval>
            <Pos v="2"/>
            <Qty v="26"/>
          </Interval>
        </Period>
      </TimeSeries>
     </chapter>
     <chapter id="a" name="x">
      <valueimportant v="valuetoget2"/>
      <TimeSeries>
        <TimeSeriesIdentification v="1"/>
        <type v="a1"/>
        <Period>
          <Interval>
            <Pos v="1"/>
            <Qty v="154"/>
          </Interval>
          <Interval>
            <Pos v="2"/>
            <Qty v="126"/>
          </Interval>
        </Period>
      </TimeSeries>
      <TimeSeries>
        <type v="b1"/>
        <Period>
          <Interval>
            <Pos v="1"/>
            <Qty v="137"/>
          </Interval>
          <Interval>
            <Pos v="2"/>
            <Qty v="148"/>
          </Interval>
        </Period>
      </TimeSeries>
     </chapter>
</bigchapter>

I want to build a dictionary whose keys are the valueimportant values. Each value is another dictionary keyed by type, and each of those maps Pos to Qty.

The expected result is:

{'valuetoget1': {'a1': {1: 26, 2: 26}, 'b1': {1: 26, 2: 26}}, 'valuetoget2': {'a1': {1: 154, 2: 126}, 'b1': {1: 137, 2: 148}}}

There is also some XML before this part, which is irrelevant. With the attempt below I get the first level of the dictionary (the keys), but I do not know how to proceed. I would prefer a solution that uses lxml etree.

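from lxml import etree

result = {}
# file_obj: an open binary file object (or a path) for the XML document
context = etree.iterparse(file_obj, events=("end",))
for event, elem in context:
    try:
        if elem.tag == 'chapter':
            valueimportant = elem.find('valueimportant')
            if valueimportant.attrib['v'] not in result:
                result[valueimportant.attrib['v']] = {}
    # several exception types must be given as a tuple, not joined with "or"
    except (IndexError, KeyError, ValueError):
        print('error')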

I'm not sure you need lxml here; the built-in ElementTree module should be enough. The main task is to collect data: iterate over the <chapter> elements and process each one separately, find its <valueimportant> child with a v attribute, then iterate over the <Interval> elements inside <Period> and read the v attributes of <Pos> and <Qty>.

Code:

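import xml.etree.ElementTree as ET

xml = ET.parse("file.xml")
root = xml.getroot()

result = {}
# search the whole tree, since <bigchapter> may not be the root element
for chapter in root.iterfind(".//chapter"):
    valueimportant = chapter.find("./valueimportant[@v]")
    if valueimportant is None:
        continue
    series = {}
    # one inner dictionary per <TimeSeries>, keyed by its <type v="...">
    for timeseries in chapter.iterfind("./TimeSeries"):
        type_node = timeseries.find("./type[@v]")
        if type_node is None:
            continue
        values = {}
        for interval in timeseries.iterfind("./Period/Interval"):
            pos = interval.find("./Pos[@v]")
            qty = interval.find("./Qty[@v]")
            if pos is not None and qty is not None:
                # attribute values are strings; convert to match the desired output
                values[int(pos.attrib["v"])] = int(qty.attrib["v"])
        series[type_node.attrib["v"]] = values
    result[valueimportant.attrib["v"]] = series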
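Since the question asks about iterparse specifically, here is a minimal streaming sketch of the same idea. It uses the standard library's xml.etree.ElementTree.iterparse so that it runs without extra packages; lxml's etree.iterparse accepts the same events=("end",) call for this kind of loop, so swapping the import should work, though that is untested here. The function name parse_chapters and the inline SAMPLE document are made up for illustration, and the code assumes every Pos and Qty carries an integer v attribute.

```python
import io
import xml.etree.ElementTree as ET


def parse_chapters(source):
    """Stream an XML source and build {valueimportant: {type: {Pos: Qty}}}."""
    result = {}
    for event, elem in ET.iterparse(source, events=("end",)):
        # An "end" event fires once an element and all its children are parsed.
        if elem.tag != "chapter":
            continue
        key = elem.find("valueimportant")
        if key is not None and key.get("v") is not None:
            series = result.setdefault(key.get("v"), {})
            for ts in elem.iter("TimeSeries"):
                type_node = ts.find("type")
                if type_node is None:
                    continue
                values = series.setdefault(type_node.get("v"), {})
                for interval in ts.iter("Interval"):
                    pos, qty = interval.find("Pos"), interval.find("Qty")
                    if pos is not None and qty is not None:
                        values[int(pos.get("v"))] = int(qty.get("v"))
        # Drop the processed chapter's children to keep memory use low.
        elem.clear()
    return result


# Hypothetical sample: unrelated XML first, then the part of interest.
SAMPLE = b"""<root>
  <other>ignored</other>
  <bigchapter>
    <chapter id="a" name="x">
      <valueimportant v="valuetoget1"/>
      <TimeSeries>
        <type v="a1"/>
        <Period>
          <Interval><Pos v="1"/><Qty v="26"/></Interval>
          <Interval><Pos v="2"/><Qty v="26"/></Interval>
        </Period>
      </TimeSeries>
      <TimeSeries>
        <type v="b1"/>
        <Period>
          <Interval><Pos v="1"/><Qty v="137"/></Interval>
        </Period>
      </TimeSeries>
    </chapter>
  </bigchapter>
</root>"""

print(parse_chapters(io.BytesIO(SAMPLE)))
# prints {'valuetoget1': {'a1': {1: 26, 2: 26}, 'b1': {1: 137}}}
```

Handling only the end event of each <chapter> means its whole subtree is available at that point, so the nested lookups stay simple while the rest of the file is never held in memory for long.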
