Python ElementTree : partially parsing a large file

Question

I have a large XML file, roughly structured as follows (in this order):

<document>
   <interesting_part>
     ...
   </interesting_part>
   <foo>
     ...
     60000 lines
     ...
   </foo>
</document>

My program is:

from xml.etree import ElementTree as et

with open(path_f) as f:
    tree = et.parse(f)  # parses the entire file, including <foo>
# retrieve info from tree...

Only the first part of the file interests me, but performance is poor because et.parse() parses the whole file.

How can I parse the file only up to </interesting_part>?

I thought of something like:

class My_Parser(et.XMLParser):
    ...  # ???? (how to stop once </interesting_part> is reached?)

my_parser = My_Parser()
tree = et.parse(f, my_parser)
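As an illustrative sketch (not the asker's code), the "custom parser" idea can be approximated without subclassing by using the standard library's `et.XMLPullParser` (Python 3.4+): the file is fed to the parser in chunks, and feeding stops as soon as the end event for the wanted tag is reported. The helper name `read_until_end_of`, the `chunk_size` values and the in-memory sample are hypothetical.

```python
import io
from xml.etree import ElementTree as et


def read_until_end_of(source, tag, chunk_size=1024):
    """Feed a binary file object to XMLPullParser until </tag> is parsed.

    Returns the completed element, or None if the tag never closes.
    """
    parser = et.XMLPullParser(events=("end",))
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return None  # reached end of input without seeing </tag>
        parser.feed(chunk)
        for event, element in parser.read_events():
            if element.tag == tag:
                # Stop immediately: later chunks are never read from `source`.
                return element


# Hypothetical sample: a small <interesting_part> followed by a bulky <foo>.
SAMPLE = (
    b"<document><interesting_part><x>1</x></interesting_part><foo>"
    + b"<bar/>" * 10000
    + b"</foo></document>"
)
buf = io.BytesIO(SAMPLE)
part = read_until_end_of(buf, "interesting_part", chunk_size=64)
print(part.tag, part[0].text)    # interesting_part 1
print(buf.tell() < len(SAMPLE))  # True: most of the input was never read
```

The design choice here is to control the reads directly: nothing after the chunk in which the end event is reported is read from `source`.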

Thanks in advance, Eric.

Answer 1

Use the iterparse() function instead, and simply stop iterating when you have what you want:

for event, element in et.iterparse(f):
    if element.tag == 'interesting_part':
        # `element` is the complete <interesting_part> element, with children
        # process it
        break  # ends parsing.
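A self-contained version of the answer's loop, assuming a small in-memory document in place of the large file (the sample and the helper name `read_interesting_part` are hypothetical). With the default `events=("end",)`, `iterparse()` yields an element only after its closing tag has been parsed, so the returned element already contains all of its children.

```python
import io
from xml.etree import ElementTree as et

# Hypothetical sample standing in for the large file: the wanted part
# comes first, followed by a <foo> section that is never needed.
SAMPLE = b"""<document>
  <interesting_part>
    <item>a</item>
    <item>b</item>
  </interesting_part>
  <foo>
    <bar>lots of data</bar>
  </foo>
</document>"""


def read_interesting_part(source):
    """Return the first complete <interesting_part> element, or None.

    `source` is a filename or a binary file object, as accepted by iterparse().
    """
    for event, element in et.iterparse(source):  # default: "end" events only
        if element.tag == "interesting_part":
            # Returning ends the loop, so no further blocks of `source` are read.
            return element
    return None


part = read_interesting_part(io.BytesIO(SAMPLE))
print([item.text for item in part])  # ['a', 'b']
```

`iterparse()` reads its input in fixed-size blocks, so some data after `</interesting_part>` may already have been parsed when the loop exits; the saving comes from not reading the rest of a large file.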

(Answer 1: accepted, score 6, 2013-05-28 12:59:52)

Python ElementTree : partially parsing a large file

Question

1 answers

solution1 6 ACCPTED 2013-05-28 12:59:52

solution1
6 ACCPTED 2013-05-28 12:59:52