Best way to store numpy arrays in ASCII files

Question

I often have processed numpy arrays that result from lengthy computations, and I need to reuse them elsewhere in calculations. I currently pickle them and unpickle the files into variables as and when I need them.

I noticed that for large data sizes (~1M data points), this is slow. I read elsewhere that pickling is not the best way to store huge files. I would like to store and read them efficiently as ASCII files that load directly into a numpy array. What is the best way to do this?

Say I have a 100k x 3 2D array in a variable 'a'. I want to store it in an ASCII file and load it into a numpy array variable 'b'.

Answer 1

If you want efficiency, ASCII is not the way to go. The problem with pickle is that it depends on the Python version, so it's not a good idea for long-term storage. You can try other binary formats; the most straightforward solution is to use the numpy.save method, as described in the NumPy documentation.
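
To make the trade-off concrete, here is a minimal sketch that writes the same array once as ASCII with numpy.savetxt and once as binary with numpy.save, then compares file sizes. The random array and the temporary file names are assumptions standing in for the 100k x 3 array 'a' from the question; exact sizes and timings will vary by machine.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the 100k x 3 array 'a' from the question.
rng = np.random.default_rng(0)
a = rng.random((100_000, 3))

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "a.txt")
    npy_path = os.path.join(tmp, "a.npy")

    # ASCII route: one row per line, values written with the default '%.18e' format.
    np.savetxt(txt_path, a)
    # Binary route: NumPy's own .npy format.
    np.save(npy_path, a)

    txt_size = os.path.getsize(txt_path)
    npy_size = os.path.getsize(npy_path)

    # Read both files back before the temporary directory is removed.
    b_txt = np.loadtxt(txt_path)
    b_npy = np.load(npy_path)

print(f"ASCII file:  {txt_size} bytes")
print(f"binary file: {npy_size} bytes")
```

With the default text format the ASCII file should come out several times larger than the binary one, and parsing it back is correspondingly more work. If a human-readable file is a hard requirement, savetxt/loadtxt still does the job.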

Answer 2

Numpy has a range of input and output methods that will do exactly what you are after.

One option would be numpy.save:

import numpy as np

my_array = np.array([1, 2, 3, 4])
# np.save writes NumPy's binary .npy format, so the file is opened in binary mode.
with open('data.npy', 'wb') as f:
    np.save(f, my_array, allow_pickle=False)

To load your data again:

with open('data.npy', 'rb') as f:
    my_loaded_array = np.load(f)
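
If several arrays belong together, another method from the same family, numpy.savez, stores them in a single .npz file. The sketch below is only an illustration: the array names, their contents, and the file name are made up for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical arrays standing in for the results of separate computations.
positions = np.arange(12, dtype=float).reshape(4, 3)
labels = np.array([0, 1, 1, 0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.npz")

    # The keyword names become the keys used to look the arrays up later.
    np.savez(path, positions=positions, labels=labels)

    # np.load on an .npz file returns a dict-like object that reads arrays on access.
    with np.load(path) as data:
        loaded_positions = data["positions"]
        loaded_labels = data["labels"]

print(loaded_positions.shape, loaded_labels.shape)
```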

Answer 3

The problem you pose is directly related to the size of the dataset.

Specialized libraries offer several solutions to this quite common problem.

Python-only persistence: joblib offers an alternative to pickle, aimed specifically at objects that are too large to pickle conveniently.
HDF5 is a file format specifically targeted at storing arrays. The format is multi-language and multi-platform, and a very good Python library exists for it: h5py.
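
For the joblib option, a minimal sketch could look like the following. It assumes joblib is installed, and the random array and file name are placeholders for the array 'a' from the question.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical stand-in for the 100k x 3 array 'a' from the question.
a = np.random.default_rng(0).random((100_000, 3))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.joblib")

    # Write the array to disk, then read it back into a new variable.
    joblib.dump(a, path)
    b = joblib.load(path)

print(b.shape)
```

Like pickle, this stays a Python-only format, so it suits caching intermediate results better than exchanging data with other tools.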

An example with h5py. To write the data:

import h5py
with h5py.File('data.h5', 'w') as f:
    f.create_dataset('a', data=a)

To read the data:

import h5py
with h5py.File('data.h5', 'r') as f:
    b = f['a'][:]

3 answers

Answer 1: 3, 2017-09-28 09:32:10
Answer 2: 3, accepted, 2017-09-28 09:32:48
Answer 3: 2, 2017-09-28 09:38:01
