
Python: make a huge file persist in memory

I have a Python script that needs to read a huge file into a variable, then search it and perform other operations. The problem is that the web server calls this script multiple times, and every time I incur a latency of around 8 seconds while the file loads. Is it possible to make the file persist in memory so I have faster access to it on later calls? I know I could run the script as a service using supervisor, but I can't do that in this case.

Any other suggestions, please? PS: I am already loading the file with var = pickle.load(open(file)).

You should take a look at h5py (http://docs.h5py.org/en/latest/). It lets you perform various operations on huge files without loading them entirely into memory. It's what NASA uses.
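As a rough sketch of the idea (the file name, dataset name, and shape here are all made up), a one-off conversion to HDF5 lets each later call read only the slice it needs:

```python
import numpy as np
import h5py

# One-off: convert the data into an HDF5 file (names and shape are illustrative)
with h5py.File("data.h5", "w") as f:
    f.create_dataset("values", data=np.random.rand(1_000_000))

# On each web-server call: opening is cheap, and indexing reads
# only the requested slice from disk instead of the whole file
with h5py.File("data.h5", "r") as f:
    chunk = f["values"][:1000]
```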

Not an easy problem. I assume you can do nothing about the fact that your web server calls your application multiple times. In that case I see two solutions:

(1) Write TWO separate applications. The first, A, loads the large file and then just sits there, waiting for the other application to access the data. A provides access on demand, so it is basically a custom server. The second application, B, is the one the web server calls multiple times. On each call, it extracts the necessary data from A using some form of interprocess communication; this ought to be relatively fast. The Python standard library offers some tools for interprocess communication (socket, http.server), but they are rather low-level. Alternatives are almost certainly going to be operating-system dependent.
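A minimal sketch of this arrangement using multiprocessing.connection from the standard library (the file name, port, authkey, and dict-style lookup are assumptions about your data, not part of the question):

```python
# application A: load the big pickle once, then answer lookups forever
import pickle
from multiprocessing.connection import Listener

with open("data.pkl", "rb") as f:        # hypothetical data file
    data = pickle.load(f)                 # the slow ~8 s load happens only once

with Listener(("localhost", 6000), authkey=b"change-me") as listener:
    while True:
        with listener.accept() as conn:
            key = conn.recv()             # B sends a query
            conn.send(data.get(key))      # assumes data is dict-like
```

```python
# application B: called by the web server; fetches from A in milliseconds
from multiprocessing.connection import Client

with Client(("localhost", 6000), authkey=b"change-me") as conn:
    conn.send("some_key")                 # hypothetical query
    result = conn.recv()
```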

(2) Perhaps you can pre-digest or pre-analyze the large file, writing out a more compact file that can be loaded quickly. A similar idea is suggested by tdelaney in his comment (some sort of database arrangement).
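For instance, if the file happens to hold key/value pairs, a one-off conversion into an indexed SQLite database (file and table names here are assumptions) would replace the 8-second bulk load with a per-call indexed lookup:

```python
# One-off pre-digestion: pickle -> indexed SQLite file
import pickle
import sqlite3

with open("data.pkl", "rb") as f:         # hypothetical source file
    data = pickle.load(f)                  # assumes a dict of str -> str

con = sqlite3.connect("data.db")
con.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
con.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)", data.items())
con.commit()
con.close()

# On each web-server call: an indexed lookup, no bulk load
con = sqlite3.connect("data.db")
row = con.execute("SELECT value FROM kv WHERE key = ?", ("some_key",)).fetchone()
con.close()
```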

You are talking about memory-caching a large array, essentially…?

There are three fairly viable options for large arrays:

  1. use memory-mapped arrays
  2. use h5py or pytables as a back-end
  3. use an array caching-aware package like klepto or joblib.

Memory-mapped arrays index the array on disk as if it were in memory. h5py or pytables give you fast access to arrays on disk, and also let you avoid loading the entire array into memory. klepto and joblib can store arrays as a collection of "database" entries (typically a directory tree of files on disk), so you can load portions of the array into memory easily. Each has a different use case, so the best choice for you depends on what you want to do. (I'm the klepto author, and it can use SQL database tables as a backend instead of files.)
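A sketch of the memory-mapped option using numpy (the shape and file name are illustrative assumptions):

```python
import numpy as np

# One-off: store the array in numpy's binary format
big = np.random.rand(100_000, 100)
np.save("big.npy", big)

# On each call: mmap_mode maps the file lazily instead of reading it,
# so this returns almost instantly regardless of file size
arr = np.load("big.npy", mmap_mode="r")
row = arr[42]          # only the pages backing this row are read from disk
```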
