
What's the fastest way to save/load a large list in Python 2.7?

I apologize if this has already been asked; I couldn't find an answer to this exact question when I searched.

More specifically, I'm testing out methods for simulating something, and I need to compare the result from each method I test out to an exact solution. I have a Python script that produces a list of values representing the exact solution, and I don't want to re-compute it every time I run a new simulation. Thus, I want to save it somewhere and just load the solution instead of re-computing it every time I want to see how good my simulation results are.

I also don't need the saved file to be human-readable. I just need to be able to load it in Python.

In my timings, saving with np.save and loading with np.load followed by tolist() was significantly faster than the pickle and cPickle alternatives:

In [77]: outfile = open("test.pkl","w")   
In [78]: l = list(range(1000000))   

In [79]:  timeit np.save("test",l)
10 loops, best of 3: 122 ms per loop

In [80]:  timeit np.load("test.npy").tolist()
10 loops, best of 3: 20.9 ms per loop

In [81]: timeit pickle.dump(l, outfile)
1 loops, best of 3: 1.86 s per loop

In [82]: outfile = open("test.pkl","r")

In [83]: timeit pickle.load(outfile)
1 loops, best of 3: 1.88 s per loop

In [84]: %%timeit
   ....: cPickle.dump(l,outfile)
   ....: 
1 loops, best of 3: 273 ms per loop

In [85]: outfile = open("test.pkl","r")

In [72]: %%timeit
   ....: cPickle.load(outfile)
   ....: 
1 loops, best of 3: 539 ms per loop

In Python 3, pickle itself is much faster, but loading a NumPy array with np.load is faster still (saving with np.save is slower than pickling, though):

In [24]: %%timeit                  
out = open("test.pkl","wb")
pickle.dump(l, out)
   ....: 
10 loops, best of 3: 27.3 ms per loop

In [25]: %%timeit
out = open("test.pkl","rb")
pickle.load(out)
   ....: 
10 loops, best of 3: 52.2 ms per loop

In [26]: timeit np.save("test",l)
10 loops, best of 3: 115 ms per loop

In [27]: timeit np.load("test.npy")
100 loops, best of 3: 2.35 ms per loop

If you need a list back, np.load followed by tolist() is still faster than unpickling:

In [29]: timeit np.load("test.npy").tolist()
10 loops, best of 3: 37 ms per loop
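To reproduce this kind of comparison on your own data, here is a minimal Python 3 sketch using the standard timeit module. It times pickle with HIGHEST_PROTOCOL and, only if NumPy happens to be installed, np.save and np.load(...).tolist(). The function name time_round_trips and the temporary file names are hypothetical, and absolute numbers will vary by machine and Python version.

```python
import os
import pickle
import tempfile
import timeit

try:
    import numpy as np  # optional: the NumPy timings are skipped if it is missing
except ImportError:
    np = None


def time_round_trips(data, repeat=3):
    """Return {method name: best time in seconds} for saving and loading data."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        pkl_path = os.path.join(tmpdir, "test.pkl")
        npy_path = os.path.join(tmpdir, "test.npy")

        def pickle_save():
            with open(pkl_path, "wb") as f:
                pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

        def pickle_load():
            with open(pkl_path, "rb") as f:
                return pickle.load(f)

        # Each function runs once per repeat; the best (minimum) run is kept.
        # Saving is timed first so the file exists when loading is timed.
        results["pickle save"] = min(timeit.repeat(pickle_save, number=1, repeat=repeat))
        results["pickle load"] = min(timeit.repeat(pickle_load, number=1, repeat=repeat))

        if np is not None:
            results["np.save"] = min(
                timeit.repeat(lambda: np.save(npy_path, data), number=1, repeat=repeat))
            results["np.load().tolist()"] = min(
                timeit.repeat(lambda: np.load(npy_path).tolist(), number=1, repeat=repeat))
    return results


if __name__ == "__main__":
    for name, seconds in time_round_trips(list(range(1000000))).items():
        print("%-20s %7.1f ms" % (name, seconds * 1000))
```

Taking the minimum of several runs, as IPython's "best of 3" does, reduces noise from other processes.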

As PadraicCunningham has mentioned, you can pickle the list.

import pickle

lst = [1,2,3,4,5]

with open('file.pkl', 'wb') as pickle_file:
    pickle.dump(lst, pickle_file, protocol=pickle.HIGHEST_PROTOCOL)

This saves the list to a file.

And to extract it:

import pickle

with open('file.pkl', 'rb') as pickle_load:
    lst = pickle.load(pickle_load)
print(lst)  # prints [1, 2, 3, 4, 5]

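The HIGHEST_PROTOCOL argument is optional, but normally recommended. Protocols define how pickle serialises the object: lower protocols tend to be compatible with older versions of Python, while higher ones are binary and more compact. In Python 2 the default is protocol 0, the original text-based format, so passing HIGHEST_PROTOCOL can make a large difference for big lists.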
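To see the difference the protocol makes, here is a small Python 3 sketch that pickles the same list with protocol 0 and with HIGHEST_PROTOCOL and compares the sizes. The list size is arbitrary, and the exact byte counts depend on your Python version, so no specific numbers are claimed here.

```python
import pickle

data = list(range(1000))

# Protocol 0 is the original text-based format (the Python 2 default);
# HIGHEST_PROTOCOL selects the newest binary format this Python supports.
text_pickle = pickle.dumps(data, protocol=0)
binary_pickle = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print("protocol 0:", len(text_pickle), "bytes")
print("protocol %d:" % pickle.HIGHEST_PROTOCOL, len(binary_pickle), "bytes")

# Both decode to the same list; only the encoding differs.
assert pickle.loads(text_pickle) == pickle.loads(binary_pickle) == data
```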

It's worth noting two more things:

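There is also the cPickle module (Python 2 only), written in C for speed. You use it the same way as above. In Python 3, cPickle no longer exists; the standard pickle module uses its C implementation automatically.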
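A common way to get the C implementation on Python 2 while still running on Python 3 is to try cPickle first and fall back to pickle. Below is a minimal sketch; save_list and load_list are hypothetical helper names, not part of either module.

```python
import os
import tempfile

# Python 2: use the C implementation if available.
# Python 3: cPickle is gone, and pickle already uses its C accelerator.
try:
    import cPickle as pickle
except ImportError:
    import pickle


def save_list(lst, path):
    """Pickle lst to path using the highest protocol available."""
    with open(path, "wb") as f:
        pickle.dump(lst, f, pickle.HIGHEST_PROTOCOL)


def load_list(path):
    """Load a list previously written by save_list."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "solution.pkl")
    save_list([1, 2, 3, 4, 5], path)
    print(load_list(path))
```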

Pickle is also insecure: a maliciously crafted pickle can make the unpickler do more or less whatever the attacker wants, including running arbitrary code. As a result, pickle shouldn't be used to load data from an untrusted source. In extreme cases you can try a safer variant such as sPickle: https://github.com/ershov/sPickle
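A different mitigation from sPickle, described in the Python 3 pickle documentation under "Restricting Globals", is to subclass pickle.Unpickler and override find_class. The sketch below forbids every global lookup, which is enough for plain lists of numbers and strings; the class and function names are hypothetical. Treat it as a mitigation rather than a guarantee; for untrusted input, a data-only format such as json is the safer choice.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """An Unpickler that refuses every class/function lookup.

    Plain lists of ints, floats and str never need a global lookup, so they
    still load; a pickle that references something like os.system does not.
    """

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "refusing to load global %s.%s" % (module, name))


def restricted_loads(data):
    """Like pickle.loads, but with all globals forbidden."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    print(restricted_loads(pickle.dumps([1, 2.5, "three"])))
    try:
        restricted_loads(pickle.dumps(len))  # pickles a reference to builtins.len
    except pickle.UnpicklingError as exc:
        print("rejected:", exc)
```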

Other libraries worth looking up are json and marshal.
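For reference, here is a minimal sketch saving and loading the same list with json and with marshal. The file names and the values in solution are placeholders standing in for the exact solution from the question.

```python
import json
import marshal

solution = [0.5, 1.25, 2.0, 3.75]  # placeholder for the exact solution

# json: portable text format; handles lists of numbers and strings.
with open("solution.json", "w") as f:
    json.dump(solution, f)
with open("solution.json") as f:
    from_json = json.load(f)

# marshal: Python's internal binary format. Its details may change between
# Python versions, and it is not meant for untrusted data either.
with open("solution.marshal", "wb") as f:
    marshal.dump(solution, f)
with open("solution.marshal", "rb") as f:
    from_marshal = marshal.load(f)

print(from_json == solution, from_marshal == solution)
```

Because marshal files are tied to the Python version that wrote them, json is the safer choice if the saved solution has to outlive a Python upgrade.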

I've profiled many methods (except the NumPy one), and pickle/cPickle is very slow on simple data sets. The fastest way depends on the type of data you are saving. If you are saving a list of strings and/or integers, the fastest way I've seen is to write it directly to a file using a for loop and ','.join(...), then read it back with a similar for loop and .split(',').
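Here is a minimal sketch of the join/split approach for a list of integers, assuming everything fits on one line; save_ints and load_ints are hypothetical names. For strings it only works if no string contains a comma. For floats on Python 2.7, write repr(v) rather than str(v), because str() there rounds floats to 12 significant digits.

```python
def save_ints(path, values):
    """Write integers as one comma-separated line (str(int) is exact)."""
    with open(path, "w") as f:
        f.write(",".join(str(v) for v in values))


def load_ints(path):
    """Read back a file written by save_ints."""
    with open(path) as f:
        text = f.read()
    if not text:
        return []  # "".split(",") gives [""], which int() cannot parse
    return [int(s) for s in text.split(",")]


if __name__ == "__main__":
    save_ints("ints.txt", [3, -1, 4, 1, 5])
    print(load_ints("ints.txt"))
```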

You may also want to look at this overview of Python object serialization with pickle and cPickle: http://pymotw.com/2/pickle/

In Python 2, pickle.dumps(obj[, protocol]) behaves as follows: if the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
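A quick Python 3 check of the negative-protocol shorthand: pickling with -1 should produce exactly the same bytes as pickling with HIGHEST_PROTOCOL. The sketch also reads the protocol number back out of the pickle header; indexing bytes this way assumes Python 3.

```python
import pickle

data = list(range(10))

# A negative protocol is shorthand for "the highest protocol available".
a = pickle.dumps(data, -1)
b = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# Binary pickles (protocol 2 and up) begin with the PROTO opcode b"\x80"
# followed by the protocol number, so the header shows what was chosen.
print(a == b, a[:1] == b"\x80", a[1] == pickle.HIGHEST_PROTOCOL)
```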

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM