What's the fastest way to save/load a large list in Python 2.7?
I apologize if this has already been asked; I couldn't find an answer to this exact question when I searched.
More specifically, I'm testing out methods for simulating something, and I need to compare the result from each method I test to an exact solution. I have a Python script that produces a list of values representing the exact solution, and I don't want to re-compute it every time I run a new simulation. Thus, I want to save it somewhere and just load the solution instead of re-computing it every time I want to see how good my simulation results are.
I also don't need the saved file to be human-readable. I just need to be able to load it in Python.
In the timings below, using np.load and tolist is significantly faster than pickle or cPickle:
In [77]: outfile = open("test.pkl","w")
In [78]: l = list(range(1000000))
In [79]: timeit np.save("test",l)
10 loops, best of 3: 122 ms per loop
In [80]: timeit np.load("test.npy").tolist()
10 loops, best of 3: 20.9 ms per loop
In [81]: timeit pickle.dump(l, outfile)
1 loops, best of 3: 1.86 s per loop
In [82]: outfile = open("test.pkl","r")
In [83]: timeit pickle.load(outfile)
1 loops, best of 3: 1.88 s per loop
In [84]: cPickle.dump(l,outfile)
....:
1 loops, best of 3: 273 ms per loop
In [85]: outfile = open("test.pkl","r")
In [72]: %%timeit
cPickle.load(outfile)
....:
1 loops, best of 3: 539 ms per loop
In Python 3, numpy is far more efficient if you use a numpy array:
In [24]: %%timeit
out = open("test.pkl","wb")
pickle.dump(l, out)
....:
10 loops, best of 3: 27.3 ms per loop
In [25]: %%timeit
out = open("test.pkl","rb")
pickle.load(out)
....:
10 loops, best of 3: 52.2 ms per loop
In [26]: timeit np.save("test",l)
10 loops, best of 3: 115 ms per loop
In [27]: timeit np.load("test.npy")
100 loops, best of 3: 2.35 ms per loop
If you want a list, it is again faster to use np.load and call tolist:
In [29]: timeit np.load("test.npy").tolist()
10 loops, best of 3: 37 ms per loop
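The round trip timed above can be wrapped in two small helpers. This is only a minimal sketch, assuming NumPy is installed and the list holds plain numbers; the helper names and the file name exact_solution.npy are made up for illustration.

```python
import os
import tempfile

import numpy as np


def save_list(path, values):
    """Store a list of numbers as a binary .npy file."""
    np.save(path, np.asarray(values))


def load_list(path):
    """Load the .npy file and convert the array back to a Python list."""
    return np.load(path).tolist()


# Hypothetical file name, written to a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "exact_solution.npy")
exact = [0.0, 0.5, 1.0, 1.5]
save_list(path, exact)
restored = load_list(path)
```

If the comparison code can work on arrays directly, dropping the tolist() call avoids the conversion cost visible in the timings.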
As PadraicCunningham has mentioned, you can pickle the list.
import pickle
lst = [1, 2, 3, 4, 5]
with open('file.pkl', 'wb') as pickle_file:
    pickle.dump(lst, pickle_file, protocol=pickle.HIGHEST_PROTOCOL)
This saves the list to a file.
To load it back:
import pickle
with open('file.pkl', 'rb') as pickle_load:
    lst = pickle.load(pickle_load)
print(lst)  # prints [1, 2, 3, 4, 5]
The HIGHEST_PROTOCOL bit is optional, but is normally recommended. Protocols define how pickle will serialise the object, with lower protocols tending to be compatible with older versions of Python.
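To see what the protocol choice changes in practice, one can compare the serialised size under the oldest and the newest protocol. This is a small sketch, not a benchmark; the list of 1000 integers is an arbitrary example, and the exact byte counts depend on the Python version.

```python
import pickle

lst = list(range(1000))

# Protocol 0 is the original text-based format; the highest protocol is binary.
old_format = pickle.dumps(lst, protocol=0)
new_format = pickle.dumps(lst, protocol=pickle.HIGHEST_PROTOCOL)

print(len(old_format), len(new_format))

# Both byte strings decode to the same list.
same = pickle.loads(old_format) == pickle.loads(new_format) == lst
```

For a list of small integers like this one, the binary format comes out noticeably smaller, which is part of why it tends to be faster to write and read.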
It's worth noting two more things:
There is also the cPickle module, written in C to optimise speed. You use this in the same way as above.
Pickle is also known to be insecure: a crafted pickle can manipulate how the object is deserialised and make Python do more or less whatever the attacker wants. As a result, this library shouldn't be used to open untrusted data. In extreme cases you can try out a safer version like spickle: https://github.com/ershov/sPickle
Other libraries I'd recommend looking up are json and marshal.
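For a flat list of numbers, both modules are used in much the same way as pickle. The snippet below is a minimal sketch with an arbitrary example list. Note that the marshal format is meant for Python's own internal use and is not guaranteed to stay compatible across Python versions.

```python
import json
import marshal

values = [1, 2, 3, 4, 5]

# json: a text format that is safe to load from untrusted sources.
json_text = json.dumps(values)
from_json = json.loads(json_text)

# marshal: a binary format; fast, but tied to the Python version.
marshal_bytes = marshal.dumps(values)
from_marshal = marshal.loads(marshal_bytes)
```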
I've done some profiling of many methods (except the numpy method) and pickle/cPickle is very slow on simple data sets. The fastest way depends on what type of data you are saving. If you are saving a list of strings and/or integers, the fastest way that I've seen is to just write it directly to a file using a for loop and ','.join(...); read it back in using a similar for loop with .split(',').
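A rough sketch of that approach for a list of integers follows. The helper names and the file name values.csv are invented for illustration, and the code assumes every item is an int; for strings you would drop the int() conversion and make sure no item contains a comma.

```python
import os
import tempfile


def save_ints(path, values):
    """Write integers as a single comma-separated line."""
    with open(path, "w") as f:
        f.write(",".join(str(v) for v in values))


def load_ints(path):
    """Read the comma-separated line back into a list of ints."""
    with open(path) as f:
        text = f.read()
    # An empty file means an empty list; "".split(",") would yield [""].
    return [int(tok) for tok in text.split(",")] if text else []


# Hypothetical file name, written to a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "values.csv")
values = list(range(10))
save_ints(path, values)
restored = load_ints(path)
```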
You may want to take a look at Python object serialization with pickle and cPickle: http://pymotw.com/2/pickle/
pickle.dumps(obj[, protocol])
If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.
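The negative-value shortcut can be checked directly. This is a minimal sketch with an arbitrary example list; the quoted default of protocol 0 applies to Python 2, while Python 3 uses pickle.DEFAULT_PROTOCOL when the argument is omitted.

```python
import pickle

obj = [1, 2, 3]

highest = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
negative = pickle.dumps(obj, -1)  # any negative value selects the highest protocol

# Both calls use the same protocol, so the serialised bytes match.
identical = highest == negative
```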