I have been trying to parse data from an XML file for several days now and I can't get it to work. From the example below I need, per layer, the status and index, and, from under foreground/producer, the type and filename. The problem is that the structure differs depending on the content. Look at index 2, where the filename is under foreground/producer/fill/producer (I do not need the filename under foreground/producer/key/producer). I'm looking for a simple solution (I have been trying with xml.etree.ElementTree, but parsing seems so difficult).
<?xml version="1.0" encoding="utf-8"?>
<channel>
  <video-mode>1080i5000</video-mode>
  <stage>
    <layers>
      <layer>
        <status>stopped</status>
        <auto_delta>-1</auto_delta>
        <frame-number>1829997</frame-number>
        <nb_frames>0</nb_frames>
        <frames-left>-1829996</frames-left>
        <foreground>
          <producer>
            <type>empty-producer</type>
          </producer>
        </foreground>
        <background>
          <producer>
            <type>transition-producer</type>
            <source>
              <producer>
                <type>empty-producer</type>
              </producer>
            </source>
            <destination>
              <producer>
                <type>ffmpeg-producer</type>
                <filename>media\\MULTI\testfile2.mpg</filename>
                <width>1920</width>
                <height>1080</height>
                <progressive>true</progressive>
                <fps>25</fps>
                <loop>false</loop>
                <frame-number>0</frame-number>
                <nb-frames>4396</nb-frames>
                <file-frame-number>0</file-frame-number>
                <file-nb-frames>4396</file-nb-frames>
              </producer>
            </destination>
          </producer>
        </background>
        <index>0</index>
      </layer>
      <layer>
        <status>playing</status>
        <auto_delta>-1</auto_delta>
        <frame-number>1830920</frame-number>
        <nb_frames>4294967295</nb_frames>
        <frames-left>4293136376</frames-left>
        <foreground>
          <producer>
            <type>ffmpeg-producer</type>
            <filename>media\AMB.mp4</filename>
            <width>720</width>
            <height>576</height>
            <progressive>true</progressive>
            <fps>25</fps>
            <loop>true</loop>
            <frame-number>1830920</frame-number>
            <nb-frames>4294967295</nb-frames>
            <file-frame-number>520</file-frame-number>
            <file-nb-frames>1600</file-nb-frames>
          </producer>
        </foreground>
        <background>
          <producer>
            <type>empty-producer</type>
          </producer>
        </background>
        <index>1</index>
      </layer>
      <layer>
        <status>playing</status>
        <auto_delta>-1</auto_delta>
        <frame-number>1830758</frame-number>
        <nb_frames>4294967295</nb_frames>
        <frames-left>4293136538</frames-left>
        <foreground>
          <producer>
            <type>separated-producer</type>
            <fill>
              <producer>
                <type>ffmpeg-producer</type>
                <filename>media\action.mpg</filename>
                <width>1920</width>
                <height>1080</height>
                <progressive>false</progressive>
                <fps>25</fps>
                <loop>true</loop>
                <frame-number>1830758</frame-number>
                <nb-frames>4294967295</nb-frames>
                <file-frame-number>22</file-frame-number>
                <file-nb-frames>247</file-nb-frames>
              </producer>
            </fill>
            <key>
              <producer>
                <type>ffmpeg-producer</type>
                <filename>media\action_a.mpg</filename>
                <width>1920</width>
                <height>1080</height>
                <progressive>false</progressive>
                <fps>25</fps>
                <loop>true</loop>
                <frame-number>1830758</frame-number>
                <nb-frames>4294967295</nb-frames>
                <file-frame-number>22</file-frame-number>
                <file-nb-frames>247</file-nb-frames>
              </producer>
            </key>
          </producer>
        </foreground>
        <background>
          <producer>
            <type>empty-producer</type>
          </producer>
        </background>
        <index>2</index>
      </layer>
    </layers>
  </stage>
  <mixer/>
  <output>
    <consumers>
      <consumer>
        <type>oal-consumer</type>
        <index>500</index>
      </consumer>
      <consumer>
        <type>ogl-consumer</type>
        <key-only>false</key-only>
        <windowed>true</windowed>
        <auto-deinterlace>true</auto-deinterlace>
        <index>600</index>
      </consumer>
    </consumers>
  </output>
  <index>0</index>
</channel>
import xml.etree.ElementTree as ET

tree = ET.parse('x.xml')
root = tree.getroot()
# Print the root's children and, indented, their children (two levels only)
for child in root:
    print(child.tag)
    for child2 in child:
        print('> ', child2.tag)
'''
====
output
====
video-mode
stage
> layers
mixer
output
> consumers
index
'''
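The loop above only descends two levels (the root's children and their children), which is why nothing below layers or consumers shows up. A minimal sketch, assuming Python 3, that walks the whole tree recursively makes the varying nesting visible. The inlined XML is a trimmed copy of one layer from the file above, and `dump` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of one layer from the question's XML, inlined so this
# sketch runs without x.xml.
XML = r"""<channel>
  <stage>
    <layers>
      <layer>
        <status>playing</status>
        <foreground>
          <producer>
            <type>ffmpeg-producer</type>
            <filename>media\AMB.mp4</filename>
          </producer>
        </foreground>
        <index>1</index>
      </layer>
    </layers>
  </stage>
</channel>"""


def dump(element, depth=0, out=None):
    """Walk the tree recursively, collecting one indented line per element."""
    if out is None:
        out = []
    text = (element.text or "").strip()
    out.append("  " * depth + element.tag + (" = " + text if text else ""))
    for child in element:
        dump(child, depth + 1, out)
    return out


for line in dump(ET.fromstring(XML)):
    print(line)
```

With the real file, call `dump(ET.parse('x.xml').getroot())` instead.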
With regard to the problem "that the structure is different depending on the content": XML documents are often described by a schema such as a DTD, which fixes which elements may appear where, so the structure is not arbitrary. Here, for instance, a separated-producer simply nests its files one level deeper, under fill and key. If what you mean is that you want to parse parts of the tree depending on the nodes above them, you will have to write some if/then/else statements and functions, for example:
import xml.etree.ElementTree as ET

tree = ET.parse('x.xml')
root = tree.getroot()

def parseStageTag(element):
    print('parsing Stage')
    for child in element:
        if child.tag == 'layers':
            parseLayersTag(child)

def parseOutputTag(element):
    pass

def parseLayersTag(element):
    print('parsing Layers')
    for child in element:
        print(child)

# Dispatch on the tag name; only <stage> is handled in detail here
for child in root:
    if child.tag == 'stage':
        parseStageTag(child)
    for child2 in child:
        print('> ', child2.tag)
'''
output
parsing Stage
parsing Layers
<Element 'layer' at 0x1079e4250>
<Element 'layer' at 0x1079e4f10>
<Element 'layer' at 0x1079e6510>
> layers
> consumers
'''
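Applied to the question's goal, the same if/else idea can branch on each foreground producer's type. This is a sketch, assuming Python 3 and a trimmed inline copy of the XML above; `foreground_filename` and `parse_layers` are hypothetical helper names. It also assumes that a separated-producer always keeps the wanted file under fill/producer, which holds for the sample but may not for other files:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's three layers (extra fields dropped),
# inlined so the sketch runs without x.xml.
XML = r"""<channel><stage><layers>
<layer><status>stopped</status>
  <foreground><producer><type>empty-producer</type></producer></foreground>
  <background><producer><type>transition-producer</type>
    <destination><producer><type>ffmpeg-producer</type>
      <filename>media\\MULTI\testfile2.mpg</filename></producer></destination>
  </producer></background>
  <index>0</index></layer>
<layer><status>playing</status>
  <foreground><producer><type>ffmpeg-producer</type>
    <filename>media\AMB.mp4</filename></producer></foreground>
  <index>1</index></layer>
<layer><status>playing</status>
  <foreground><producer><type>separated-producer</type>
    <fill><producer><type>ffmpeg-producer</type>
      <filename>media\action.mpg</filename></producer></fill>
    <key><producer><type>ffmpeg-producer</type>
      <filename>media\action_a.mpg</filename></producer></key>
  </producer></foreground>
  <index>2</index></layer>
</layers></stage></channel>"""


def foreground_filename(producer):
    """Return the file of a foreground <producer>, or None if it has none."""
    ptype = producer.findtext("type")
    if ptype == "separated-producer":
        # Assumption: the wanted file is the fill; key/producer is ignored.
        return producer.findtext("fill/producer/filename")
    elif ptype == "empty-producer":
        return None
    else:
        return producer.findtext("filename")


def parse_layers(root):
    """Collect index, status, foreground type and filename for each layer."""
    rows = []
    for layer in root.iter("layer"):
        producer = layer.find("foreground/producer")
        rows.append({
            "index": layer.findtext("index"),
            "status": layer.findtext("status"),
            "type": None if producer is None else producer.findtext("type"),
            "filename": None if producer is None else foreground_filename(producer),
        })
    return rows


for row in parse_layers(ET.fromstring(XML)):
    print(row)
```

Adding a branch per producer type keeps each special case in one place; a new type that nests its file elsewhere only needs another `elif`.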
I had similar trouble parsing XML files until I discovered ElementTree's (limited) XPath support.
For example, the following code:
import os
import xml.etree.ElementTree

os.chdir('C:/temp/blah')
et = xml.etree.ElementTree.parse('file.xml')
layerTagList = et.findall("./stage/layers/layer")
for curLayerTag in layerTagList:
    indexTag = curLayerTag.find("./index")
    print("Layer[%s]" % indexTag.text)
    # ".//" matches at any depth, so nested fill/key producers are found too
    fgFiles = curLayerTag.findall(".//foreground//filename")
    for fileTag in fgFiles:
        print(" FG - %s" % fileTag.text)
    bgFiles = curLayerTag.findall(".//background//filename")
    for fileTag in bgFiles:
        print(" BG - %s" % fileTag.text)
gives the output:
Layer[0]
BG - media\\MULTI\testfile2.mpg
Layer[1]
FG - media\AMB.mp4
Layer[2]
FG - media\action.mpg
FG - media\action_a.mpg
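Note that the descendant search `.//foreground//filename` also returns the key file for layer 2 (media\action_a.mpg), which the question says is not needed. Listing the allowed paths explicitly avoids that. A sketch, assuming Python 3 and a trimmed inline copy of the XML; `FG_FILE_PATHS`, `first_text` and `summarize` are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's three layers, inlined so this runs standalone.
XML = r"""<channel><stage><layers>
<layer><status>stopped</status>
  <foreground><producer><type>empty-producer</type></producer></foreground>
  <background><producer><type>transition-producer</type>
    <destination><producer><type>ffmpeg-producer</type>
      <filename>media\\MULTI\testfile2.mpg</filename></producer></destination>
  </producer></background>
  <index>0</index></layer>
<layer><status>playing</status>
  <foreground><producer><type>ffmpeg-producer</type>
    <filename>media\AMB.mp4</filename></producer></foreground>
  <index>1</index></layer>
<layer><status>playing</status>
  <foreground><producer><type>separated-producer</type>
    <fill><producer><type>ffmpeg-producer</type>
      <filename>media\action.mpg</filename></producer></fill>
    <key><producer><type>ffmpeg-producer</type>
      <filename>media\action_a.mpg</filename></producer></key>
  </producer></foreground>
  <index>2</index></layer>
</layers></stage></channel>"""

# Candidate paths for the foreground file, tried in order. Spelling them out
# (instead of ".//foreground//filename") keeps key/producer out of the result.
FG_FILE_PATHS = (
    "foreground/producer/filename",
    "foreground/producer/fill/producer/filename",
)


def first_text(element, paths):
    """Return the text of the first path that matches, else None.

    Compares against None explicitly: an Element with no children is falsy,
    so `find(a) or find(b)` would silently skip a real match.
    """
    for path in paths:
        found = element.find(path)
        if found is not None:
            return found.text
    return None


def summarize(root):
    """One line per layer: index, status, foreground type and file."""
    lines = []
    for layer in root.findall("./stage/layers/layer"):
        lines.append("Layer[%s] %s %s %s" % (
            layer.findtext("index"),
            layer.findtext("status"),
            layer.findtext("foreground/producer/type"),
            first_text(layer, FG_FILE_PATHS),
        ))
    return lines


for line in summarize(ET.fromstring(XML)):
    print(line)
```

With the real file, pass `ET.parse('file.xml').getroot()` to `summarize`. Layers whose foreground has no file (empty-producer) print `None` in the last column.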