
Python and Memory Consumption

I am looking for a way to keep a memory-heavy program from overloading RAM and CPU. I need to process a LARGE amount of data contained in files: I read the files and process the data in them. The problem is that there are many nested for loops, and a root XML file is being created from all the data processed. The program easily consumes a couple of gigs of RAM after half an hour or so of run-time. Is there something I can do to keep RAM usage from growing so large, or a way to work around it?

Do you really need to keep all of the data from the XML file in memory at once?

Most (all?) XML libraries out there allow you to do iterative parsing, meaning that you keep in memory just a few nodes of the XML file, not the whole file. That is, unless you are building a string containing the XML file yourself without any library, but that is a bit insane. If that is the case, use a library ASAP.
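For example, here is a minimal sketch of iterative parsing with the standard library's xml.etree.ElementTree (lxml.etree exposes the same iterparse API). The filename "huge.xml", the tag name "record", and the process() function are placeholders for your own data and logic:

    import xml.etree.ElementTree as ET

    def process(elem):
        # Placeholder for whatever per-record work your nested loops do.
        pass

    # Stream the document instead of loading it all into memory at once.
    context = ET.iterparse("huge.xml", events=("start", "end"))
    _, root = next(context)  # grab the root element as soon as it opens
    for event, elem in context:
        if event == "end" and elem.tag == "record":
            process(elem)
            root.clear()  # drop already-processed children so memory stays flat

Because each record is discarded right after it is processed, memory use stays roughly constant regardless of how large the input file is.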

The specific code samples presented here might not apply to your project, but consider a few principles—borne out by testing and the lxml documentation—when faced with XML data measured in gigabytes or more:

  • Use an iterative parsing strategy to incrementally process large documents.
  • If searching the entire document in random order is required, move to an indexed XML database.
  • Be extremely conservative in the data that you select. If you are only interested in particular nodes, use methods that select by those names. If you require predicate syntax, try one of the XPath classes and methods available.
  • Consider the task at hand and the comfort level of the developer. Object models such as lxml's objectify or Amara might be more natural for Python developers when speed is not a consideration. cElementTree is faster when only parsing is required.
  • Take the time to do even simple benchmarking. When processing millions of records, small differences add up, and it is not always obvious which methods are the most efficient (a rough benchmarking sketch follows this list).
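As a rough benchmarking sketch, you could time a full parse against an iterparse pass with the standard library's timeit module. The file "sample.xml" and the tag name "record" are assumptions; swap in your own file and whichever libraries you are comparing:

    import timeit

    setup = "import xml.etree.ElementTree as ET"

    # Build the whole tree in memory.
    full_parse = timeit.timeit(
        "ET.parse('sample.xml')", setup=setup, number=10)

    # Stream the file, only counting matching elements.
    iter_parse = timeit.timeit(
        "sum(1 for _, e in ET.iterparse('sample.xml') if e.tag == 'record')",
        setup=setup, number=10)

    print(f"full parse: {full_parse:.3f}s")
    print(f"iterparse:  {iter_parse:.3f}s")

Even a crude comparison like this, run on a representative slice of your data, tells you far more than guessing which approach will be faster.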

If you need to do complex operations on the data, why don't you just put it in a relational database and operate on the data from there? That will usually give you better performance.
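As a minimal sketch of that idea, the records extracted from your files could be loaded into SQLite so that joins, filters and aggregations happen in the database instead of in nested Python loops. The table layout and the records() generator are assumptions standing in for your own data:

    import sqlite3

    def records():
        # Hypothetical generator yielding (name, value) tuples parsed from the files.
        yield ("example", 42.0)

    conn = sqlite3.connect("data.db")
    conn.execute("CREATE TABLE IF NOT EXISTS records (name TEXT, value REAL)")
    conn.executemany("INSERT INTO records VALUES (?, ?)", records())
    conn.commit()

    # The heavy lifting now runs inside SQLite rather than in Python memory.
    for name, total in conn.execute(
            "SELECT name, SUM(value) FROM records GROUP BY name"):
        print(name, total)
    conn.close()

Once the data is in the database, only the rows you query come back into Python, which keeps the working set small even for very large inputs.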
