Fastest way to create and fill huge numpy 2D-array?
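I have to create and fill huge arrays (e.g. 96 GB, 72000 rows * 72000 columns) in which every cell holds a float computed from mathematical formulas. The array will be used in further computations afterwards.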
import itertools, operator, time, copy, os, sys
import numpy
from multiprocessing import Pool

def f2(x):  # more complex mathematical formulas that change according to values in *i* and *x*
    temp = []
    for i in combine:
        temp.append(0.2*x[1]*i[1]/64.23)
    return temp

def combinations_with_replacement_counts(n, r):  # provide all combinations of r balls in n boxes
    size = n + r - 1
    for indices in itertools.combinations(range(size), n-1):
        starts = [0] + [index+1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))

# module-level name, read by f2 in the worker processes
combine = list(combinations_with_replacement_counts(3, 60))  # set to 60 here, but 350 is needed
print(len(combine))

if __name__ == '__main__':
    t1 = time.time()
    pool = Pool()  # start worker processes
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print(roots[0:3])
    pool.close()
    pool.join()
    print(time.time() - t1)
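I know that you can create shared numpy arrays that can be changed from different processes (assuming the changed areas don't overlap). Here is a sketch of the code you can use for that (I saw the original idea somewhere on stackoverflow, edit: here it is https://stackoverflow.com/a/5550156/1269140 )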
import multiprocessing as mp, numpy as np, ctypes

def shared_zeros(n1, n2):
    # create a 2D numpy array which can then be changed from different processes
    shared_array_base = mp.Array(ctypes.c_double, n1 * n2)
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(n1, n2)
    return shared_array

class singleton:
    arr = None

def dosomething(i):
    # do something with singleton.arr
    singleton.arr[i,:] = i
    return i

def main():
    singleton.arr = shared_zeros(1000, 1000)
    pool = mp.Pool(16)
    pool.map(dosomething, range(1000))

if __name__ == '__main__':
    main()
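The sketch above depends on the workers inheriting singleton.arr from the parent, which is what happens with the fork start method. With other start methods (for example spawn on Windows) each worker re-imports the module and would see arr = None. A possible variant, offered only as a sketch, hands the shared buffer to every worker through the Pool initializer. The names _shared, init_worker and fill_row are hypothetical, and mp.RawArray is used here instead of mp.Array on the assumption that rows never overlap and no lock is needed.

```python
import ctypes
import multiprocessing as mp

import numpy as np

_shared = {}  # per-process storage, filled in by init_worker (hypothetical name)


def init_worker(raw, shape):
    # Runs once in each worker: wrap the shared buffer as a numpy view (no copy).
    _shared["arr"] = np.frombuffer(raw, dtype=np.float64).reshape(shape)


def fill_row(i):
    # Each task writes only its own row, so writes from different workers never overlap.
    _shared["arr"][i, :] = i
    return i


def main(n1=100, n2=50):
    # Zero-initialised shared memory without a lock (assumes non-overlapping writes).
    raw = mp.RawArray(ctypes.c_double, n1 * n2)
    pool = mp.Pool(4, initializer=init_worker, initargs=(raw, (n1, n2)))
    try:
        pool.map(fill_row, range(n1))
    finally:
        pool.close()
        pool.join()
    # The parent wraps the same buffer to read the results.
    return np.frombuffer(raw, dtype=np.float64).reshape(n1, n2)


if __name__ == "__main__":
    arr = main()
    print(arr[3, :5])
```

The array still has to fit in RAM with this approach, so it addresses the parallel fill rather than the total memory requirement.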
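You can create an empty numpy.memmap array with the desired shape, and then use multiprocessing.Pool to populate its values. Doing this correctly also keeps the memory footprint of each process in the pool relatively small.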
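A minimal sketch of that idea, under stated assumptions: the file path, the chunk size, and the names formula, fill_rows and fill_memmap are hypothetical, and formula is only a stand-in for the real per-cell computation. Each worker opens the memmap itself and writes a disjoint block of rows, so only the rows being written need to be resident in that worker's memory.

```python
import os
import tempfile
from multiprocessing import Pool

import numpy as np


def formula(i, j):
    # Hypothetical placeholder for the real per-cell formula.
    return 0.2 * i * j / 64.23


def fill_rows(args):
    """Open the memmap inside the worker and fill rows [start, stop)."""
    path, shape, start, stop = args
    arr = np.memmap(path, dtype=np.float64, mode="r+", shape=shape)
    cols = np.arange(shape[1])
    for i in range(start, stop):
        arr[i, :] = formula(i, cols)  # one vectorised row at a time
    arr.flush()  # write this worker's changes back to the file
    return start, stop


def fill_memmap(path, shape, chunk=50, processes=2):
    # mode="w+" creates (or overwrites) the backing file at its full size.
    arr = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
    del arr
    # Split the rows into disjoint blocks, one task per block.
    tasks = [(path, shape, s, min(s + chunk, shape[0]))
             for s in range(0, shape[0], chunk)]
    pool = Pool(processes)
    try:
        pool.map(fill_rows, tasks)
    finally:
        pool.close()
        pool.join()
    # Reopen read-only for the later computation.
    return np.memmap(path, dtype=np.float64, mode="r", shape=shape)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "big.dat")
    result = fill_memmap(path, (200, 300))
    print(result[2, 3])
```

Passing only the path and row range to each task, rather than the array itself, avoids pickling large data between processes. The backing file needs as much disk space as the full array.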