
Memory-efficient way to parse and change a large XML file in Python

I want to parse a large XML file (25 GB) in Python and change some of its elements.

I tried ElementTree from xml.etree, but the very first step (ElementTree.parse) takes too much time.

I read somewhere that SAX is fast and does not load the entire file into memory, but it is only for parsing, not modifying.

'iterparse' also seems to be only for parsing, not modifying.

Is there any other option that is fast and memory efficient?

What is important for you here is that you need a streaming parser, which is what SAX is. (There is a built-in SAX implementation in Python, and lxml provides one as well.) The problem is that, since you are trying to modify the XML file, you will have to rewrite it as you read it.

An XML file is a text file: you can't change some data in the middle of a text file without rewriting the entire file (unless the new data is exactly the same size, which is unlikely).

You can use SAX to read each element and register an event to write each element back out after it has been read and modified. If your changes are really simple, it may be even faster to skip the XML parsing entirely and just match the text you are looking for. Rough sketches of both follow.
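
As a minimal sketch of the SAX approach: the standard library's XMLGenerator already echoes every SAX event back out as XML, so you can subclass it, intervene on the events you care about, and let everything else pass through unchanged. The element name "price" and the doubling rule below are made up for illustration; substitute your own logic.

```python
import xml.sax
from xml.sax.saxutils import XMLGenerator

class Rewriter(XMLGenerator):
    """Echo every SAX event to the output file, modifying <price> text.

    Only the current event is held in memory, so this works on files
    of any size. ("price" and the doubling rule are placeholders.)
    """

    def __init__(self, out):
        super().__init__(out, encoding="utf-8")
        self._in_price = False

    def startElement(self, name, attrs):
        self._in_price = (name == "price")
        super().startElement(name, attrs)

    def endElement(self, name):
        self._in_price = False
        super().endElement(name)

    def characters(self, content):
        # Note: SAX may split one text node across several calls;
        # robust code would buffer until endElement.
        if self._in_price and content.strip():
            content = str(float(content) * 2)  # example modification
        super().characters(content)

with open("output.xml", "w", encoding="utf-8") as out:
    xml.sax.parse("input.xml", Rewriter(out))
```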
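
And if the edit really is a fixed textual substitution, a plain line-by-line pass avoids the XML machinery entirely. This sketch assumes the pattern never straddles a line boundary, which you would need to verify for your file.

```python
# Constant-memory text replacement, no XML parsing at all.
# The <status> strings are hypothetical examples.
with open("input.xml", encoding="utf-8") as src, \
     open("output.xml", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.replace("<status>old</status>",
                               "<status>new</status>"))
```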

If you are doing any significant work with an XML file this large, then I would say you shouldn't be using an XML file at all; you should be using a database.
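
One possible way to make that move, assuming a flat record structure: stream the file once with iterparse (which, unlike ElementTree.parse, never holds the whole tree if you clear elements as you go) and load the records into SQLite. The <record> element and its id/value fields below are hypothetical stand-ins for your actual structure.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("data.db")
conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT, value TEXT)")

# Stream the 25 GB file element by element; clear() each subtree
# after use so memory stays bounded.
for event, elem in ET.iterparse("input.xml", events=("end",)):
    if elem.tag == "record":
        conn.execute(
            "INSERT INTO records VALUES (?, ?)",
            (elem.findtext("id"), elem.findtext("value")),
        )
        elem.clear()  # free the parsed subtree

conn.commit()
conn.close()
```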

The problem you have run into here is the same issue that COBOL programmers on mainframes had when they were working with file-based data.
