python lxml iterparse fails on a large file containing namespaces

Question

I'm trying to parse a large file (>100 MB) as described at http://effbot.org/zone/element-iterparse.htm#incremental-parsing

But if the file contains namespaces, lxml fails with this error:

lxml.etree.XMLSyntaxError: Namespace default prefix was not found

It works fine if I remove elem.clear(), but then it uses a lot of memory. Example XML file:

<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="NS">
  <offer>
    <type>type1</type>
    <name>name1</name>
  </offer>
</feed>

The lxml version is 3.2.0, because newer versions segfault after parsing finishes.

Answer 1

Did you read this? In my experience, parsing 100 MB+ files can take over 2 GB of RAM (e.g. with my 160 MB files I'm up to 4.5 GB). Are you using 64-bit Python?
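For reference, here is a minimal sketch of namespace-aware incremental parsing that keeps memory bounded by clearing only the fully processed offer elements, rather than every element as it closes. It is not the asker's code and has not been checked against lxml 3.2.0. The helper name iter_offers and the dict output are illustrative assumptions, and the stdlib fallback is only there so the sketch runs without lxml installed.

```python
import io

try:
    from lxml import etree  # the library discussed in the question
except ImportError:
    # Assumption: the stdlib ElementTree offers the same iterparse subset used here.
    import xml.etree.ElementTree as etree

NS = "NS"  # namespace URI taken from the sample feed
OFFER_TAG = "{%s}offer" % NS  # qualified tag name in Clark notation


def iter_offers(source):
    """Yield one dict per <offer> element (hypothetical helper)."""
    # Only "end" events are requested, so every element is complete when seen.
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag != OFFER_TAG:
            continue  # leave other elements (including the root) untouched
        # Strip the "{NS}" prefix from the child tags to get plain keys.
        offer = {child.tag.split("}", 1)[-1]: child.text for child in elem}
        yield offer
        # Free the children and text of this offer once it has been consumed.
        elem.clear()


SAMPLE = b"""<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="NS">
  <offer>
    <type>type1</type>
    <name>name1</name>
  </offer>
</feed>
"""

offers = list(iter_offers(io.BytesIO(SAMPLE)))
```

With a real file, pass the file path or an open binary file object instead of the BytesIO sample. The cleared offer elements stay attached to the root as empty shells, so very large feeds still grow slowly; whether that matters depends on the number of offers.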
