Parsing an arbitrary XML file with ElementTree

I have a template XML file, and based on inputs given to my program I have to generate a new XML file. The template has sections that need to be repeated based on the input data, but I don't necessarily know the structure of these sections or how many levels of nesting they have. I cannot figure out how to read in the template file in an arbitrary way that will let me populate it and then output it. Here is a section of the template file:

<Target_Table>
  <Target_Name>SF1_T1</Target_Name>
  <Target_Mode>
    <REP>
      <Target_Location_To_Repeat>
        <XLocation>nextXREL</XLocation>
        <YLocation>nextYREL</YLocation>
      </Target_Location_To_Repeat>
      <Target_Location_To_Repeat>
        <XLocation>nextXREL</XLocation>
        <YLocation>nextYREL</YLocation>
      </Target_Location_To_Repeat>
    </REP>
  </Target_Mode>
  <Target_Repetitions>1</Target_Repetitions>
  <Meas_Window>
    <Window_Size>
      <XLocation>FOV</XLocation>
      <YLocation>FOV</YLocation>
    </Window_Size>
    <Window_Location>
      <XLocation>firstXREL</XLocation>
      <YLocation>firstYREL</YLocation>
    </Window_Location>
  </Meas_Window>
  <Box_Orientation>90</Box_Orientation>
  <First_Feature Value="Space" />
  <Meas_Params_Definition>
    <Number_Of_Lines Value="Auto" />
    <Number_Of_Pixels_Per_Line Value="Auto" />
    <Averaging_Factor Value="1" />
  </Meas_Params_Definition>
  <Number_Of_Edges>1</Number_Of_Edges>
  <Edge_Pair>
    <Edge_Pair_Couple>
      <First_Edge>1</First_Edge>
      <Second_Edge>1</Second_Edge>
    </Edge_Pair_Couple>
    <Nominal_Corrected_Value>0</Nominal_Corrected_Value>
  </Edge_Pair>
  <Categories>
    <Material_Type />
    <Meas_Type />
    <Category_Type />
    <Other_Type />
  </Categories>
  <Bias>0</Bias>
  <Template_Target_Name>SF_IMAQ_Template_Target</Template_Target_Name>
  <Template_Target_PPL>
    <Process>PC2</Process>
    <Product>PD2</Product>
    <Layer>L2</Layer>
  </Template_Target_PPL>
  <Meas_Auto_Box>
    <Error_Code>0</Error_Code>
    <Measured_CD>0</Measured_CD>
    <Constant_NM2Pix>true</Constant_NM2Pix>
  </Meas_Auto_Box>
  <Meas_Box_Pix_Size_X>PixelSize</Meas_Box_Pix_Size_X>
  <Macro_CD>0</Macro_CD>
</Target_Table>

I need to repeat the entire Target_Table section multiple times, and within each Target_Table I need to repeat the REP section multiple times. I want to write my program so that if the template changes (e.g., more levels of nesting are added) I don't have to change my program. But it seems to me that I have to know the complete structure of the file in order to read it in and write it back out. Is that true, or am I missing something here? Is there a way to write a program that will read in a file with unknown tags and unknown levels of nesting?

Using ElementTree:

import xml.etree.ElementTree as et

# parse() accepts a filename directly, so no explicit open/close is needed
raw_data = et.parse("file.xml")
data_root = raw_data.getroot()

# Print each grandchild of the root alongside its parent element
for parent in data_root:
    for child in parent:
        print(child.tag, child.text, parent.tag, parent.text)

That will give you an overview of the XML tags and the text inside them. You can add more loops to step further into the tree, and check whether any of the children contain further levels. I find this method useful when the names of the XML tags vary and do not follow an already known standard.
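Instead of adding one loop per level, the same traversal can be written recursively so that it handles any depth. The following is a minimal sketch, not a definitive implementation: the helper name walk and the inline sample string are assumptions made for illustration, and the sample is a shortened version of the template from the question.

```python
import xml.etree.ElementTree as ET


def walk(element, depth=0):
    """Yield (depth, tag, text, attributes) for element and every descendant."""
    text = (element.text or "").strip()
    yield depth, element.tag, text, dict(element.attrib)
    for child in element:
        # Recurse into each child, whatever its tag or nesting level
        yield from walk(child, depth + 1)


# Shortened stand-in for the template file (hypothetical sample data)
sample = """<Target_Table>
  <Target_Name>SF1_T1</Target_Name>
  <Target_Mode>
    <REP>
      <Target_Location_To_Repeat>
        <XLocation>nextXREL</XLocation>
      </Target_Location_To_Repeat>
    </REP>
  </Target_Mode>
  <First_Feature Value="Space" />
</Target_Table>"""

root = ET.fromstring(sample)
rows = list(walk(root))

# Indent each tag according to its depth in the tree
for depth, tag, text, attrib in rows:
    print("  " * depth + tag, text, attrib)
```

Because walk never names a specific tag, it should keep working if the template gains new elements or extra levels of nesting.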

An example using BeautifulSoup:

import sys
from bs4 import BeautifulSoup

# The path to the XML file is taken from the command line
path = sys.argv[1]
with open(path) as handle:
    # html.parser lowercases tag names, hence the lowercase lookups below
    soup = BeautifulSoup(handle, "html.parser")

for table in soup.find_all("target_table"):
    for loc in table.find_all("rep"):
        print(loc.xlocation.string + ", " + loc.ylocation.string)

Output

nextXREL, nextYREL
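For the generation step described in the question, one possible approach with ElementTree is to deep-copy the subtree that has to be repeated and fill in its placeholder text. This is only a sketch under stated assumptions: the helper names fill and repeat, the shortened template string, and the numeric input values are all hypothetical. The program still has to be told which tag to repeat, but nothing about what is nested inside it.

```python
import copy
import xml.etree.ElementTree as ET

# Shortened stand-in for the template file (hypothetical sample data)
template = """<Target_Table>
  <Target_Name>SF1_T1</Target_Name>
  <Target_Mode>
    <REP>
      <Target_Location_To_Repeat>
        <XLocation>nextXREL</XLocation>
        <YLocation>nextYREL</YLocation>
      </Target_Location_To_Repeat>
    </REP>
  </Target_Mode>
</Target_Table>"""


def fill(element, values):
    """Replace placeholder text in element and its descendants, at any depth."""
    for node in element.iter():
        key = (node.text or "").strip()
        if key in values:
            node.text = str(values[key])


def repeat(parent, tag, value_sets):
    """Replace the children named tag with one filled copy per value set."""
    original = parent.find(tag)           # first matching child is the pattern
    index = list(parent).index(original)  # remember where it sat
    for old in parent.findall(tag):
        parent.remove(old)
    for offset, values in enumerate(value_sets):
        clone = copy.deepcopy(original)   # copies the whole subtree
        fill(clone, values)
        parent.insert(index + offset, clone)


root = ET.fromstring(template)
rep = root.find("./Target_Mode/REP")

# Hypothetical input data: one dict of placeholder values per repetition
repeat(rep, "Target_Location_To_Repeat", [
    {"nextXREL": 10, "nextYREL": 20},
    {"nextXREL": 30, "nextYREL": 40},
    {"nextXREL": 50, "nextYREL": 60},
])

result = ET.tostring(root, encoding="unicode")
print(result)
```

The same repeat helper could be applied one level up to duplicate whole Target_Table elements inside their parent. Indentation of the copied elements is not normalized here, so the output may look uneven even though it is well-formed.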
