
Parsing a partial XML with python lxml

I'm trying to parse a large XML file which is being received from the network in Python.

In order to do that, I get the data and pass it to lxml.etree.iterparse

However, if the XML has yet to fully be sent, like so:

<MyXML>
    <MyNode foo="bar">
    <MyNode foo="ba

If I run etree.iterparse(f, tag='MyNode').next(), I get an XMLSyntaxError at the point where the input was cut off.

Is there any way I can receive the first tag (i.e. the first MyNode) and only get an exception when I reach the truncated part of the document? (That is, to make lxml really 'stream' the contents rather than read the whole thing at the beginning.)

XMLPullParser and HTMLPullParser may better suit your needs. They get their data through repeated calls to parser.feed(data). You still have to wait until all of the data has come in before the complete tree is usable.
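
As a rough illustration of the feed-based approach, here is a minimal sketch using lxml's XMLPullParser with read_events(). The helper name iter_started and the two hard-coded chunks are assumptions made for this example; in practice the chunks would come from the network. Because the MyNode elements in the question's sample are never closed, the sketch listens for 'start' events, so each element is reported as soon as its opening tag has been parsed; use 'end' events instead if you need the element's children and text.

```python
from lxml import etree


def iter_started(chunks, tag="MyNode"):
    """Yield matching elements as soon as their start tag has been parsed.

    `chunks` is any iterable of bytes objects (hypothetical stand-in for
    data read from a socket).
    """
    parser = etree.XMLPullParser(events=("start",), tag=tag)
    for chunk in chunks:
        parser.feed(chunk)
        # read_events() returns only the events collected so far.
        for _event, element in parser.read_events():
            yield element
    # close() signals end of input; a truncated document is expected
    # to be reported here rather than on the first element.
    parser.close()


# Simulated network data, cut off in the middle of the second MyNode.
chunks = [b'<MyXML>\n    <MyNode foo="bar">\n', b'    <MyNode foo="ba']

seen = []
try:
    for node in iter_started(chunks):
        seen.append(node.get("foo"))
except etree.XMLSyntaxError as exc:
    print("document ended early:", exc)

print(seen)
```

With this arrangement the first MyNode is handed to the caller before the parser ever sees the truncated tag, and any syntax error surfaces only once the incomplete part is reached or the input is closed.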


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 