Memory error when using float32 in a dask array

Question

I am trying to import a 1.25 GB dataset into Python using dask.array.

The file is a 1312*2500*196 array of uint16 values. I need to convert it to a float32 array for later processing.

I have managed to stitch this Dask array together in uint16, but when I try to convert it to float32 I get a memory error.

No matter what I do to the chunk size, I always get a memory error.

I create the array by concatenating pieces of 100 lines (breaking the 2500 dimension up into blocks of 100 lines). Since dask can't natively read .RAW imaging files, I have to use numpy.memmap() to read the file and then create the array. Below is a code snippet that is as short as possible.

I have tried two methods:

1) Create the full uint16 array and then try to convert it to float32:

(Note: the memmap is a 1312x100x196 array and lines ranges from 0 to 24.)

for i in range(lines):
    NewArray = da.concatenate([OldArray,Memmap],axis=0)
    OldArray = NewArray
return NewArray

and then I use:

Float32Array = FinalArray.map_blocks(lambda FinalArray: FinalArray * 1.,dtype=np.float32)

2) Convert each memmap piece to float32 before concatenating:

for i in range(lines):
    NewArray = da.concatenate([OldArray,np.float32(Memmap)],axis=0)
    OldArray = NewArray
return NewArray

Both methods result in a memory error.

Is there any reason for this?

I read that dask array is capable of running calculations on datasets of up to 100 GB.

I tried all chunk sizes (from as small as 10x10x10 up to a single line).

Answer 1

You can create a dask.array directly from a numpy memmap array with the da.from_array function:

x = load_memmap_numpy_array_from_raw_file(filename)
d = da.from_array(x, chunks=...)
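As a rough illustration of the two lines above, the sketch below maps a raw file with numpy.memmap and wraps it with da.from_array. It assumes the file is headerless, C-ordered uint16 data; the tiny shape, the temporary file name, and the chunk size are all stand-ins chosen for the example, not values from the question.

```python
import os
import tempfile

import numpy as np
import dask.array as da

# Hypothetical small stand-in for the real 1312 x 2500 x 196 file.
shape = (8, 50, 4)
path = os.path.join(tempfile.mkdtemp(), "example.raw")
np.arange(np.prod(shape), dtype=np.uint16).reshape(shape).tofile(path)

# Map the raw file; the data stays on disk until it is actually read.
# Assumes no header and C (row-major) ordering.
x = np.memmap(path, dtype=np.uint16, mode="r", shape=shape)

# Wrap the memmap lazily. Each chunk covers 10 "lines" of the middle axis
# (an assumed chunk size, picked only for this example).
d = da.from_array(x, chunks=(8, 10, 4))

print(d.dtype, d.shape, d.numblocks)
```

With this approach there is no need to concatenate 25 separate pieces by hand, since the chunking is declared once on the whole memmap.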

You can change the dtype with the astype method:

d = d.astype(np.float32)
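The following is a minimal sketch of what can be done after astype, not the answerer's own code. The astype call is lazy, so memory is only needed when results are materialized. Calling compute() on the entire float32 array would need roughly twice the uint16 size in RAM, so the sketch instead computes a reduction and streams the converted chunks to disk with da.store. The small in-memory array, the chunk size, and the output file name are assumptions for the example.

```python
import os
import tempfile

import numpy as np
import dask.array as da

# Small in-memory stand-in for the memmap-backed uint16 data (assumed shape).
x = np.arange(8 * 50 * 4, dtype=np.uint16).reshape(8, 50, 4)
d = da.from_array(x, chunks=(8, 10, 4))

# astype only extends the task graph; nothing is converted yet.
f = d.astype(np.float32)

# Size of the full dataset from the question, before and after conversion.
n_values = 1312 * 2500 * 196
uint16_bytes = n_values * 2
float32_bytes = n_values * 4
print(uint16_bytes, float32_bytes)

# A reduction is evaluated chunk by chunk and returns a single number.
mean = float(f.mean().compute())

# Stream the converted chunks into a preallocated .npy file on disk
# (hypothetical output path) instead of materializing the whole array.
out_path = os.path.join(tempfile.mkdtemp(), "converted.npy")
out = np.lib.format.open_memmap(
    out_path, mode="w+", dtype=np.float32, shape=f.shape
)
da.store(f, out)
out.flush()
```

Whether this avoids the memory error in the original setup depends on the available RAM and the chunk size, so treat it as a starting point rather than a guaranteed fix.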

Solution 1
1 2015-10-05 05:07:00
