
Python ElementTree: partially parsing a large file

I have a large XML file with roughly the following structure (in this order):

<document>
   <interesting_part>
     ...
   </interesting_part>
   <foo>
     ...
     60000 lines
     ...
   </foo>
</document>

My program is:

from xml.etree import ElementTree as et
# path_f is the path to the XML file
with open(path_f) as f:
    tree = et.parse(f)  # parses the whole document
# retrieve info from tree...

Only the first few blocks of the file interest me, but performance is poor because et.parse() loads the whole file.

How can I load the file only up to </interesting_part>?

I thought of something like:

class My_Parser(et.XMLParser):
    ????
my_parser = My_Parser()
tree=et.parse(f, my_parser)

Thanks in advance, Eric.

Use the iterparse() function instead, and simply stop iterating when you have what you want:

for event, element in et.iterparse(f):
    if element.tag == 'interesting_part':
        # `element` is the complete <interesting_part> element, with children
        # process it
        break  # ends parsing.
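
For a version you can run as-is, here is a minimal sketch of the same idea. The helper name read_first and the in-memory sample document are made up for illustration; only iterparse() comes from the answer above. By default iterparse() reports "end" events, so the element is complete (children included) by the time the loop sees it.

```python
import io
from xml.etree import ElementTree as et


def read_first(source, tag):
    """Return the first complete element named `tag`, or None if absent.

    `source` is a filename or a file object opened in binary mode.
    (Hypothetical helper, not part of ElementTree.)
    """
    # Ask only for "end" events: the element and its children are fully built.
    for event, element in et.iterparse(source, events=("end",)):
        if element.tag == tag:
            # Returning abandons the iterator, so no further input is consumed.
            return element
    return None


# Tiny stand-in for the large file described in the question.
sample = io.BytesIO(
    b"<document>"
    b"<interesting_part><title>hello</title></interesting_part>"
    b"<foo><line/><line/></foo>"
    b"</document>"
)

part = read_first(sample, "interesting_part")
print(part.find("title").text)
```

Note that iterparse() reads its input in chunks, so somewhat more than the closing tag may be read before the loop stops; the saving comes from never reading the bulk of a large <foo> section.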
