Using python lxml.etree for huge XML files
I would like to parse a huge XML file (>200 MB) using lxml.etree in Python. I tried to use etree.parse to load the XML file, but this does not work due to the file size:
etree.parse('file.xml')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)
File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)
File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)
File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)
File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)
File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521)
lxml.etree.XMLSyntaxError: Excessive depth in document: 256 use XML_PARSE_HUGE option, line 1276, column 7
As I want to use XPath expressions, I have to parse the file first. How can I parse the XML file? How do I use XML_PARSE_HUGE in connection with lxml.etree?
Thanks!
Try creating a custom XMLParser instance:
from lxml.etree import XMLParser, parse
p = XMLParser(huge_tree=True)
tree = parse('file.xml', parser=p)
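Since the error in the question complains about nesting depth (256) rather than file size as such, the effect of huge_tree=True can be checked on a small synthetic document. The following is only a minimal sketch: the nesting depth of 300, the element names a and leaf, and the in-memory BytesIO source are made up for illustration and are not taken from the original file. Whether the default parser rejects this input depends on the libxml2 version in use, so the sketch records that outcome instead of relying on it.

```python
from io import BytesIO

from lxml import etree

# Hypothetical test input: 300 nested <a> elements around a single <leaf>.
# The depth and element names are assumptions for illustration only.
depth = 300
xml = b"<a>" * depth + b"<leaf>x</leaf>" + b"</a>" * depth

# Try the default parser first; it may raise XMLSyntaxError
# ("Excessive depth in document") depending on the libxml2 build.
try:
    etree.parse(BytesIO(xml))
    default_parser_failed = False
except etree.XMLSyntaxError:
    default_parser_failed = True

# A parser created with huge_tree=True lifts libxml2's security limits
# (this corresponds to the XML_PARSE_HUGE option named in the error).
parser = etree.XMLParser(huge_tree=True)
tree = etree.parse(BytesIO(xml), parser=parser)

# Once the document is parsed, XPath expressions work as usual.
leaf_texts = tree.xpath("//leaf/text()")
print(default_parser_failed, leaf_texts)
```

Note that huge_tree=True disables protections against maliciously constructed documents, so it is best reserved for input from a trusted source.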