
Memory Error when using float32 in dask array

I am trying to import a 1.25 GB dataset into Python using dask.array.

The file is a 1312*2500*196 array of uint16 values. I need to convert this to a float32 array for later processing.

I have managed to stitch together this Dask array in uint16; however, when I try to convert it to float32 I get a memory error.

No matter what I do to the chunk size, I always get a memory error.

I build the array by concatenating pieces of 100 lines each (breaking the 2500 dimension up into pieces of 100 lines). Since dask can't natively read .RAW imaging files, I have to use numpy.memmap() to read the file and then create the array. Below is a code snippet, kept as short as possible.

I have tried two methods:

1) Create the full uint16 array and then try to convert it to float32:

(Note: the memmap is a 1312x100x196 array, and lines ranges from 0 to 24.)

for i in range(lines):
    NewArray = da.concatenate([OldArray,Memmap],axis=0)
    OldArray = NewArray
return NewArray

and then I use

Float32Array = FinalArray.map_blocks(lambda FinalArray: FinalArray * 1.,dtype=np.float32)

2) Convert each memmap piece to float32 before concatenating:

for i in range(lines):
    NewArray = da.concatenate([OldArray,np.float32(Memmap)],axis=0)
    OldArray = NewArray
return NewArray

Both methods result in a memory error.

Is there any reason for this?

I read that dask array is capable of doing calculations on datasets of up to 100 GB.

I tried many chunk sizes (from as small as 10x10x10 up to a single line).

You can create a dask.array from a numpy memmap array directly with the da.from_array function:

x = load_memmap_numpy_array_from_raw_file(filename)
d = da.from_array(x, chunks=...)
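
As a rough illustration of the two lines above, here is a minimal sketch that stands in for `load_memmap_numpy_array_from_raw_file` with a plain `np.memmap` call. The file name, the tiny stand-in shape, and the assumptions that the file has no header and is stored in C order are all hypothetical; the real file layout may differ.

```python
import os
import tempfile

import numpy as np
import dask.array as da

# Small stand-in for the real 1312 x 2500 x 196 file so the sketch runs quickly.
shape = (13, 250, 4)

# Write a headerless uint16 file to play the role of the .RAW image
# (assumption: raw C-ordered values with no header).
tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, "example.raw")  # hypothetical file name
reference = np.arange(13 * 250 * 4, dtype=np.uint16).reshape(shape)
reference.tofile(filename)

# np.memmap maps the file lazily instead of reading it all into memory.
x = np.memmap(filename, dtype=np.uint16, mode="r", shape=shape)

# Chunk along the second axis in pieces of 100 lines; nothing is loaded yet.
d = da.from_array(x, chunks=(shape[0], 100, shape[2]))

print(d.shape, d.dtype, d.chunks)
```

This replaces the loop of repeated `da.concatenate` calls with a single lazy array whose chunks play the role of the 100-line pieces.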

You can change the dtype with the astype method:

d = d.astype(np.float32)
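
To make the effect of `astype` concrete, the following sketch uses a small in-memory stand-in array (an assumption, chosen so it runs anywhere) and shows that the conversion is only recorded until something is computed. It also works out the byte counts for the 1312 x 2500 x 196 shape from the question; a full float32 copy is twice the size of the uint16 data, which is why it helps to keep results as reductions or per-chunk outputs rather than materialising the whole array.

```python
import math

import numpy as np
import dask.array as da

# Small stand-in array; the real data would be 1312 x 2500 x 196 uint16.
x = np.arange(13 * 250 * 4, dtype=np.uint16).reshape(13, 250, 4)
d = da.from_array(x, chunks=(13, 100, 4))

# astype only adds the conversion to the task graph; no data is converted here.
f = d.astype(np.float32)
print(f.dtype)              # float32
print(f.nbytes / d.nbytes)  # 2.0

# A reduction is evaluated chunk by chunk, so only a few float32 chunks
# need to exist in memory at any one time.
mean = f.mean().compute()

# Size arithmetic for the full dataset described in the question.
n_elements = math.prod((1312, 2500, 196))
uint16_bytes = n_elements * np.dtype(np.uint16).itemsize
float32_bytes = n_elements * np.dtype(np.float32).itemsize
print(uint16_bytes / 1e9, "GB as uint16")
print(float32_bytes / 1e9, "GB as float32")
```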

