Fastest way to create and fill huge numpy 2D-array?
I have to create and fill a huge array (e.g. 96 GB, 72000 rows * 72000 columns) with floats, where each cell's value comes from mathematical formulas. The array will be computed afterwards.
import itertools, operator, time, copy, os, sys
import numpy
from multiprocessing import Pool

def f2(x):
    # stand-in for more complex formulas that change according to values in i and x
    temp = []
    for i in combine:
        temp.append(0.2 * x[1] * i[1] / 64.23)
    return temp

def combinations_with_replacement_counts(n, r):
    # provide all combinations of r balls in n boxes
    size = n + r - 1
    for indices in itertools.combinations(range(size), n - 1):
        starts = [0] + [index + 1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))

combine = list(combinations_with_replacement_counts(3, 60))  # 60 here, but I need 350 instead
print(len(combine))

if __name__ == '__main__':
    t1 = time.time()
    pool = Pool()  # start worker processes
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print(roots[0:3])
    pool.close()
    pool.join()
    print(time.time() - t1)
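For reference, here is a quick self-contained check of what the `combinations_with_replacement_counts` helper above produces; the small parameters (3 boxes, 2 balls) are illustrative, not the real workload:

```python
import itertools, operator

def combinations_with_replacement_counts(n, r):
    # yield every way to distribute r indistinguishable balls over n boxes
    size = n + r - 1
    for indices in itertools.combinations(range(size), n - 1):
        starts = [0] + [index + 1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))

counts = list(combinations_with_replacement_counts(3, 2))
# six tuples of three non-negative integers, each summing to 2
```

For the question's actual case (3 boxes, 60 balls) this yields C(62, 2) = 1891 tuples, which matches the printed `len(combine)`.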
I know that you can create shared numpy arrays that can be changed from different processes (assuming that the changed areas don't overlap). Here is a sketch of the code you can use to do that (I saw the original idea somewhere on Stack Overflow; edit: here it is https://stackoverflow.com/a/5550156/1269140 )
import multiprocessing as mp
import numpy as np
import ctypes

def shared_zeros(n1, n2):
    # create a 2D numpy array backed by shared memory,
    # which can then be changed from different processes
    shared_array_base = mp.Array(ctypes.c_double, n1 * n2)
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(n1, n2)
    return shared_array

class singleton:
    arr = None

def dosomething(i):
    # do something with singleton.arr
    singleton.arr[i, :] = i
    return i

def main():
    # note: the workers see singleton.arr because they inherit it
    # when processes are forked (the default on Linux)
    singleton.arr = shared_zeros(1000, 1000)
    pool = mp.Pool(16)
    pool.map(dosomething, range(1000))

if __name__ == '__main__':
    main()
You can create an empty numpy.memmap array with the desired shape, and then use multiprocessing.Pool to populate its values. Doing it correctly also keeps the memory footprint of each process in your pool relatively small.