Best way to store numpy arrays in ASCII files
I often have processed numpy arrays that result from lengthy computations, and I need to use them in other calculations. I currently pickle them and unpickle the files into variables as and when I need them.
I noticed that for large data sizes (~1M data points) this is slow. I read elsewhere that pickling is not the best way to store huge files. I would like to store and read them as ASCII files efficiently, loading them directly into a numpy array. What is the best way to do this?
Say I have a 100k x 3 2D array in a variable 'a'. I want to store it in an ASCII file and load it into a numpy array variable 'b'.
If you want efficiency, ASCII is not the way to go. The problem with pickle is that it depends on the Python version, so it is not a good idea for long-term storage. You can try other binary formats; the most straightforward solution is to use numpy.save, as described in the NumPy documentation.
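To make the trade-off concrete, here is a minimal sketch that writes a 100k x 3 array like the one in the question both as text (numpy.savetxt) and in NumPy's binary .npy format (numpy.save), then reloads each. The random test data, the file names and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the array 'a' from the question (100k x 3 floats).
rng = np.random.default_rng(0)
a = rng.random((100_000, 3))

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "a.txt")  # ASCII file
    npy_path = os.path.join(tmp, "a.npy")  # binary .npy file

    # ASCII round trip: human-readable, but the file has to be parsed as text.
    np.savetxt(txt_path, a)
    b_txt = np.loadtxt(txt_path)

    # Binary round trip: the raw array bytes plus a small header.
    np.save(npy_path, a, allow_pickle=False)
    b = np.load(npy_path, allow_pickle=False)

    txt_size = os.path.getsize(txt_path)
    npy_size = os.path.getsize(npy_path)

print("text file bytes:", txt_size)
print(".npy file bytes:", npy_size)
print("binary round trip exact:", np.array_equal(a, b))
print("text round trip close:", np.allclose(a, b_txt))
```

With the default savetxt format, each value is written as a long decimal string, so the text file should come out noticeably larger than the .npy file. Timings depend on the machine, so they are left out here.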
NumPy has a range of input and output methods that will do exactly what you are after.
One option would be numpy.save:
import numpy as np

my_array = np.array([1, 2, 3, 4])

# np.save writes NumPy's binary .npy format, so the file is opened in binary mode.
with open('data.txt', 'wb') as f:
    np.save(f, my_array, allow_pickle=False)
To load your data again:
# Read the array back from the same binary file.
with open('data.txt', 'rb') as f:
    my_loaded_array = np.load(f)
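If several result arrays belong together, another of NumPy's output functions, numpy.savez_compressed, can bundle them into a single .npz file. The following is only a sketch: the array names `positions` and `weights` and the file name `results.npz` are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical result arrays, used only for illustration.
positions = np.arange(12, dtype=np.float64).reshape(4, 3)
weights = np.array([0.1, 0.2, 0.3, 0.4])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.npz")

    # The keyword names become the keys used to look the arrays up again.
    np.savez_compressed(path, positions=positions, weights=weights)

    # np.load returns a dict-like NpzFile; the context manager closes it.
    with np.load(path, allow_pickle=False) as data:
        keys = sorted(data.files)
        loaded_positions = data["positions"]
        loaded_weights = data["weights"]

print(keys)
print(np.array_equal(positions, loaded_positions))
```

Compression trades some CPU time for a smaller file; numpy.savez takes the same arguments without compressing.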
The problem you pose is directly related to the size of the dataset.
Specialized libraries offer several solutions to this quite common problem.
Here is an example with h5py. To write the data:
import h5py

# 'a' is the numpy array from the question; it is stored as the dataset 'a'.
with h5py.File('data.h5', 'w') as f:
    f.create_dataset('a', data=a)
To read the data:
import h5py

# Slicing with [:] reads the whole dataset into memory as a numpy array.
with h5py.File('data.h5', 'r') as f:
    b = f['a'][:]