Memory Error when using float32 in dask array
I am trying to import a 1.25 GB dataset into Python using dask.array.
The file is a 1312*2500*196 array of uint16 values. I need to convert this to a float32 array for later processing.
I have managed to stitch together this Dask array in uint16; however, when I try to convert it to float32 I get a memory error.
No matter what I do to the chunk size, I always get a memory error.
I create the array by concatenating it in blocks of 100 lines (breaking the 2500 dimension up into pieces of 100 lines). Since dask can't natively read .RAW imaging files, I have to use numpy.memmap() to read the file and then create the array. Below I will supply an "as short as possible" code snippet:
I have tried two methods:
1) Create the full uint16 array and then try to convert to float32:
(note: the memmap is a 1312x100x196 array and lines ranges from 0 to 24)
for i in range(lines):
    NewArray = da.concatenate([OldArray,Memmap],axis=0)
    OldArray = NewArray
return NewArray
and then I use
Float32Array = FinalArray.map_blocks(lambda FinalArray: FinalArray * 1.,dtype=np.float32)
2) In method 2, convert each piece to float32 before concatenating:
for i in range(lines):
    NewArray = da.concatenate([OldArray,np.float32(Memmap)],axis=0)
    OldArray = NewArray
return NewArray
Both methods result in a memory error.
Is there any reason for this?
I read that dask array is capable of doing calculations on datasets of up to 100 GB.
I tried a range of chunk sizes (from as small as 10x10x10 to a single line).
You can create a dask.array from a numpy memmap array directly with the da.from_array function:
x = load_memmap_numpy_array_from_raw_file(filename)
d = da.from_array(x, chunks=...)
You can change the dtype with the astype method:
d = d.astype(np.float32)
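As a rough end-to-end sketch of the two steps above, the following writes a small stand-in file and then loads it lazily. It assumes the .RAW file is a headerless, C-ordered dump of uint16 values; the file name, the reduced shape and the chunk sizes are hypothetical and only chosen so the example runs quickly.

```python
import os
import tempfile

import dask.array as da
import numpy as np

# Hypothetical small stand-in for the .RAW file
# (the real data in the question is 1312 x 2500 x 196).
shape = (16, 50, 8)
path = os.path.join(tempfile.mkdtemp(), "example.raw")
np.arange(np.prod(shape), dtype=np.uint16).reshape(shape).tofile(path)

# Memory-map the raw file; the data stays on disk until it is accessed.
mm = np.memmap(path, dtype=np.uint16, mode="r", shape=shape)

# Wrap the whole memmap lazily, chunked along the second axis
# in pieces of 10 lines (illustrative chunk size).
d = da.from_array(mm, chunks=(16, 10, 8))

# Lazy dtype conversion: each chunk is cast only when it is computed.
f = d.astype(np.float32)

# Compute just one piece instead of materialising the full float32 array.
first_block = f[:, :10, :].compute()
print(f.dtype, f.shape, first_block.shape)
```

Calling compute() on the whole float32 array would still need room for the full result in memory, so it is usually better to compute reductions or slices, or to write the result to disk.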