
Handling extremely large Numpy arrays

I want to create a Numpy kernel matrix of dimensions 25000 x 25000. What is the most efficient way to save such a large matrix to disk and load it back? I tried dumping it with Pickle, but it threw an error saying it cannot serialize objects larger than 4 GiB.

You could try saving it to an HDF5 (.h5) file with pandas.HDFStore():

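import numpy as np
import pandas as pd

# float16 keeps the 25000 x 25000 matrix at roughly 1.2 GiB in memory
df = pd.DataFrame(np.random.rand(25000, 25000).astype('float16'))
# Convert bytes to GiB (1024**3 bytes per GiB)
memory_use = round(df.memory_usage(deep=True).sum() / 1024**3, 2)
print('use {}G'.format(memory_use))
# Write the frame to an HDF5 file, then close the store to flush it to disk
store = pd.HDFStore('test.h5', 'w')
store['data'] = df
store.close()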
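The memory estimate in the snippet above can also be worked out before allocating anything, which shows why the dtype matters here. This is only a rough sketch: array_size_gib is a hypothetical helper name, and it counts raw element storage only, ignoring any DataFrame or file-format overhead.

```python
import numpy as np

def array_size_gib(shape, dtype):
    """Return the size of a dense array's element storage in GiB (no allocation)."""
    n_items = int(np.prod(shape, dtype=np.int64))
    return n_items * np.dtype(dtype).itemsize / 1024**3

# Compare a few candidate dtypes for the 25000 x 25000 matrix from the question.
for dtype in ("float16", "float32", "float64"):
    print(dtype, round(array_size_gib((25000, 25000), dtype), 2), "GiB")
```

With the default float64 the matrix takes about 4.66 GiB, which is above the 4 GiB limit mentioned in the question, while float16 brings it down to about 1.16 GiB at the cost of precision.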

Why not save the array to a file directly instead of using pickle?

np.savetxt("filename", array)

It can then be read back with:

np.genfromtxt("filename")
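A text file can get large and slow to parse for a matrix of this size, so a binary alternative may be worth trying. The sketch below is not part of the answer above: it swaps in np.save and np.load with mmap_mode, uses a small stand-in array so it runs quickly, and the file name kernel.npy is only a placeholder.

```python
import os
import tempfile
import numpy as np

# Small stand-in for the 25000 x 25000 matrix so the sketch runs quickly.
array = np.random.rand(1000, 1000).astype("float32")

# Placeholder path in a temporary directory (assumption for illustration).
path = os.path.join(tempfile.mkdtemp(), "kernel.npy")

# Write the array in NumPy's binary .npy format (no text conversion).
np.save(path, array)

# mmap_mode="r" maps the file read-only instead of loading it all into RAM.
loaded = np.load(path, mmap_mode="r")

# Copy only the slice that is actually needed into memory.
block = np.asarray(loaded[:10, :10])
print(loaded.shape, loaded.dtype, block.shape)
```

Reading through a memory map means only the rows and columns that are indexed get pulled from disk, which can help when the full matrix does not fit comfortably in memory.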
