
Parsing a partial XML with python lxml

I'm trying to parse, in Python, a large XML file that is being received over the network.

To do that, I read the data and pass it to lxml.etree.iterparse.

However, the XML may not have been fully sent yet, like so:

<MyXML>
    <MyNode foo="bar">
    <MyNode foo="ba

If I run etree.iterparse(f, tag='MyNode').next(), I get an XMLSyntaxError at the point where the data was cut off.
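A minimal sketch reproducing this, under stated assumptions: it uses the standard library's xml.etree.ElementTree.iterparse rather than lxml (the stdlib version has no tag filter, and its error messages differ from lxml's), the Python 3 spelling next(...) instead of .next(), and an in-memory io.BytesIO standing in for the network stream. With the default "end" events, an element is reported only after its closing tag. Neither MyXML nor the first MyNode in the snippet is ever closed, so the parser hits the truncation before it has anything to yield. The helper name first_end_event is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# The truncated document from the question, as raw bytes.
PARTIAL = b'<MyXML>\n    <MyNode foo="bar">\n    <MyNode foo="ba'


def first_end_event(data):
    """Hypothetical helper: return the first (event, element) pair from iterparse."""
    # Default events are ("end",): an element is reported only once it is closed.
    return next(ET.iterparse(io.BytesIO(data)))


try:
    first_end_event(PARTIAL)
except ET.ParseError as exc:
    # No element was closed before the cut-off, so the first event the
    # iterator can produce is the error raised at end of input.
    print("ParseError:", exc)
```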

Is there any way to receive the first tag (i.e. the first MyNode) and only get an exception when the parser reaches the truncated part of the document? (In other words, to make lxml really 'stream' the contents instead of reading the whole thing at the beginning.)

XMLPullParser and HTMLPullParser may better suit your needs. They receive their data through repeated calls to parser.feed(data). You still have to wait until all of the data has arrived before the tree is usable.
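A minimal sketch of the pull-parser approach, under stated assumptions: it uses the standard library's xml.etree.ElementTree.XMLPullParser so it runs without extra packages (lxml.etree.XMLPullParser offers the same feed(), read_events() and close() methods); stream_nodes is a hypothetical helper, and the byte chunks stand in for reads from the network. Requesting "start" events exposes each MyNode's attributes as soon as its opening tag has been parsed, while truncation is only reported when close() is called, after every complete element has been yielded.

```python
import xml.etree.ElementTree as ET


def stream_nodes(chunks, tag="MyNode"):
    """Hypothetical helper: yield (event, element) for `tag` as chunks arrive.

    `chunks` is any iterable of bytes, e.g. pieces read from a socket.
    """
    # "start" fires once an opening tag is complete, so attributes are
    # available; children and text are only complete at the matching "end".
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)
        # Drain whatever the parser has produced so far before reading more.
        for event, elem in parser.read_events():
            if elem.tag == tag:
                yield event, elem
    # Signal end of input; a truncated document raises ParseError here.
    parser.close()
    for event, elem in parser.read_events():
        if elem.tag == tag:
            yield event, elem


# The question's data, split into two simulated network reads.
received = [b'<MyXML>\n    <MyNode foo="bar">\n', b'    <MyNode foo="ba']
nodes = stream_nodes(received)
event, node = next(nodes)
print(event, node.get("foo"))  # prints: start bar
```

Draining read_events() after every feed() is what produces the streaming behaviour. With "end" events alone, the unclosed first MyNode from the question would never be reported.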

Try to learn from the answers to two questions related to your problem, and look through other related answers as well. Your problem is very common; you may only need to adapt it a little to fit a proven solution. Prefer that approach, since it leads to a more stable solution.
