
Get XML parent ID and all children underneath using ElementTree and Python

I would like to extract the following information from an XML file:

  1. The zone id, if the zone has type-v2="text".
  2. All children under that particular zone, contained in its run elements.

This is the XML code:

<dashboards>
  <dashboard name="1 off">
   <style/>
   <size maxheight="800" maxwidth="1000" minheight="800" minwidth="1000"/>
   <zones>
    <zone h="100000" id="2" type-v2="layout-basic" w="100000" x="0" y="0">
     <zone forceUpdate="true" h="98000" id="1" type-v2="text" w="49200" x="800" y="1000">
      <formatted-text>
       <run fontsize="1">
        row 1
       </run>
       <run>
        Æ
       </run>
       <run fontsize="2">
        row 2
       </run>
      </formatted-text>
      <zone-style>
       <format attr="border-color" value="#000000"/>
       <format attr="border-style" value="none"/>
       <format attr="border-width" value="0"/>
       <format attr="margin" value="4"/>
      </zone-style>
     </zone>
     <zone h="98000" id="3" type-v2="text" w="49200" x="50000" y="1000">
      <formatted-text>
       <run>
        id2
       </run>
      </formatted-text>
      <zone-style>
       <format attr="border-color" value="#000000"/>
       <format attr="border-style" value="none"/>
       <format attr="border-width" value="0"/>
       <format attr="margin" value="4"/>
      </zone-style>
     </zone>
     <zone-style>
      <format attr="border-color" value="#000000"/>
      <format attr="border-style" value="none"/>
      <format attr="border-width" value="0"/>
      <format attr="margin" value="8"/>
     </zone-style>
    </zone>
   </zones>
  </dashboard>
</dashboards>

I am able to obtain all the ids and all the run details separately, but I would like a different output: every run element should have the id of its parent zone assigned to it.

This is my code, which gets the information separately:

import xml.etree.ElementTree as et

tree = et.parse(r'C:\book3.twb')
root = tree.getroot()
dbnameRC=[]
fontRC = []
sizeRC = []
weightRC = []
f_styleRC = []
decoRC = []
colorRC= []
alignRC = []
textcontRC=[]
idRC=[]

for db in root.iter("dashboard"):
    root1 = db    
    
    for z in root1.findall(".//zone[@type-v2='text']"):
        idRC.append(z.get('id'))

    for m in root1.findall(".//zone[@type-v2='text']/formatted-text//run"):
        weightRC.append(m.get('bold'))
        alignRC.append(m.get('fontalignment'))
        colorRC.append(m.get('fontcolor'))
        fontRC.append(m.get('fontname'))
        sizeRC.append(m.get('fontsize'))
        f_styleRC.append(m.get('italic'))
        decoRC.append(m.get('underline'))
        dbnameRC.append(db.attrib['name'])
        textcontRC.append(m.text)
        

  1. The output for idRC is:

 ['1', '3']

which is correct, because only two zones have type-v2='text'.

  2. The output for sizeRC is:

 ['1', None, '2', None]

which is also correct.
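The mismatch between the two lists can be reproduced with a trimmed copy of the XML. This is only a sketch: the inline `xml_text` string is an assumption standing in for the .twb file, and it keeps just the zones and runs. It shows that two ids and four runs come back with nothing linking them.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the .twb file (assumption: same zone/run layout).
xml_text = """
<dashboards>
  <dashboard name="1 off">
    <zones>
      <zone id="2" type-v2="layout-basic">
        <zone id="1" type-v2="text">
          <formatted-text>
            <run fontsize="1">row 1</run>
            <run>Æ</run>
            <run fontsize="2">row 2</run>
          </formatted-text>
        </zone>
        <zone id="3" type-v2="text">
          <formatted-text>
            <run>id2</run>
          </formatted-text>
        </zone>
      </zone>
    </zones>
  </dashboard>
</dashboards>
"""

root = ET.fromstring(xml_text)

# Two independent searches: one over text zones, one over all their runs.
ids = [z.get("id") for z in root.findall(".//zone[@type-v2='text']")]
sizes = [
    r.get("fontsize")
    for r in root.findall(".//zone[@type-v2='text']/formatted-text//run")
]

print(ids)                   # ['1', '3']
print(sizes)                 # ['1', None, '2', None]
print(len(ids), len(sizes))  # 2 4
```

Because the lists are built by separate searches, their lengths differ and there is no way to tell afterwards which run belonged to which zone.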

The question is how to write code that produces output like this:

[image: enter image description here]

Basically, all I want to do is enter a zone with type-v2="text", take its id, collect all the runs underneath and assign them to that id, then move on to the next zone with type-v2="text" and a different id and collect all the runs under that zone.

Instead of the second root1.findall(), you can call zone.findall() inside the loop over zones, which lets you group the id with each run.

runs = []

for db in root.iter("dashboard"):
    root1 = db

    for zone in root1.findall(".//zone[@type-v2='text']"):
        idrc = zone.get('id')

        for run in zone.findall("./formatted-text//run"):
            runs.append([
                idrc,
                run.get("bold"),
                run.get("fontalignment"),
                run.get("fontcolor"),
                run.get("fontname"),
                run.get("fontsize"),
                run.get("italic"),
                run.get("underline"),
                db.attrib["name"],
                run.text
            ])

Output:

>>> runs
[['1', None, None, None, None, '1', None, None, '1 off', 'row 1'],
 ['1', None, None, None, None, None, None, None, '1 off', 'Æ'],
 ['1', None, None, None, None, '2', None, None, '1 off', 'row 2'],
 ['3', None, None, None, None, None, None, None, '1 off', 'id2']]
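If named fields are easier to work with than positional lists, the same nested loop can build one dictionary per run. The following is a minimal, self-contained sketch rather than the only way to do it: `xml_text`, `RUN_ATTRS` and `collect_runs` are illustrative names, and the inline XML is a trimmed stand-in for the .twb file. It also strips the whitespace around each run's text, which appears when the XML is pretty-printed as in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed, pretty-printed stand-in for the .twb file (assumption).
xml_text = """
<dashboards>
  <dashboard name="1 off">
    <zones>
      <zone id="2" type-v2="layout-basic">
        <zone id="1" type-v2="text">
          <formatted-text>
            <run fontsize="1">
              row 1
            </run>
            <run>
              Æ
            </run>
            <run fontsize="2">
              row 2
            </run>
          </formatted-text>
        </zone>
        <zone id="3" type-v2="text">
          <formatted-text>
            <run>
              id2
            </run>
          </formatted-text>
        </zone>
      </zone>
    </zones>
  </dashboard>
</dashboards>
"""

# Run attributes read in the question's code.
RUN_ATTRS = ("bold", "fontalignment", "fontcolor", "fontname",
             "fontsize", "italic", "underline")


def collect_runs(root):
    """Return one dict per run, tagged with its parent zone id."""
    rows = []
    for db in root.iter("dashboard"):
        for zone in db.findall(".//zone[@type-v2='text']"):
            # Search relative to the zone, so every run found belongs to it.
            for run in zone.findall("./formatted-text//run"):
                row = {"zone_id": zone.get("id"), "dashboard": db.get("name")}
                row.update({attr: run.get(attr) for attr in RUN_ATTRS})
                # run.text may be None or padded with newlines and indentation.
                row["text"] = (run.text or "").strip()
                rows.append(row)
    return rows


rows = collect_runs(ET.fromstring(xml_text))
for row in rows:
    print(row["zone_id"], row["fontsize"], row["text"])
# 1 1 row 1
# 1 None Æ
# 1 2 row 2
# 3 None id2
```

A list of dictionaries like this can be passed to csv.DictWriter or to a pandas DataFrame if a table is the end goal.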
