
Python lxml iterparse fails on large files containing namespaces

I'm trying to parse a large file (>100 MB) as described at http://effbot.org/zone/element-iterparse.htm#incremental-parsing

But if the file contains namespaces, lxml fails with this error:

lxml.etree.XMLSyntaxError: Namespace default prefix was not found

It works fine if I remove elem.clear(), but then it uses a lot of memory. Example XML file:

<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="NS">
  <offer>
    <type>type1</type>
    <name>name1</name>
  </offer>
</feed>

The lxml version is 3.2.0, because newer versions segfault after the end of parsing.
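
Roughly what my parsing loop looks like (a minimal sketch; the file name "feed.xml" and the print call are just placeholders, and the tag filter uses the default namespace from the example above):

from lxml import etree

# Incremental parsing as in the linked article; with the default
# namespace the element to match is "{NS}offer".
context = etree.iterparse("feed.xml", events=("end",), tag="{NS}offer")

for event, elem in context:
    print(elem.findtext("{NS}type"), elem.findtext("{NS}name"))
    # Free the element we just processed -- removing this clear()
    # makes the error go away, but memory usage keeps growing.
    elem.clear()
    # Also drop already-processed siblings still referenced by the root.
    while elem.getprevious() is not None:
        del elem.getparent()[0]

del context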

Did you read this? In my experience with 100 MB+ files you end up with over 2 GB of RAM usage (e.g. with my 160 MB files I'm up to 4.5 GB). Are you using 64-bit Python?
