Reduce Memory Usage when Running NumPy Array Operations
I have a fairly large NumPy array that I need to perform an operation on, but when I do so, my ~2GB array requires ~30GB of RAM in order to perform the operation. I've read that NumPy can be fairly clumsy with memory usage, but this seems excessive.

Does anyone know of an alternative way to apply these operations to limit the RAM load? Perhaps row-by-row, in place, etc.?
Code below (ignore the meaningless calculation; in my code the coefficients vary):
import xarray as xr
import numpy as np

def optimise(data):
    data_scaled_offset = (((data - 1000) * (1 / 1)) + 1).round(0)
    return data_scaled_offset.astype(np.uint16)

# This could also be float32 but I'm using uint16 here to reduce memory load for demo purposes
ds = np.random.randint(0, 12000, size=(40000, 30000), dtype=np.uint16)
ds = optimise(ds)  # Results in ~30GB RAM usage
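For context on where the memory goes: each arithmetic step in the expression materialises a brand-new full-size temporary array, and the multiplication by the Python float `(1 / 1)` promotes the result to a floating-point dtype several times wider than the 2-byte `uint16` input (the exact dtypes depend on the NumPy version). A tiny stand-in sketch:

```python
import numpy as np

# Tiny stand-in for the 40000 x 30000 uint16 array
data = np.zeros((4, 4), dtype=np.uint16)

# Each step below allocates a brand-new temporary array;
# the float multiply promotes to a wider floating dtype
step1 = data - 1000
step2 = step1 * (1 / 1)
step3 = step2 + 1
print(step2.dtype, step2.itemsize)
```

At the question's full 40000 x 30000 size, a single 8-byte-per-element temporary is already ~9.6GB, so a handful of them accounts for the observed ~30GB.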
By default, operations like multiplication, addition and many others allocate a new array for their result. You can use numpy.multiply, numpy.add, etc. with the out parameter to store the result in an existing array instead. That will significantly reduce the memory usage. Please see the demo below and translate your code to use those functions.
import numpy as np

arr = np.random.rand(100)
arr2 = np.random.rand(100)

arr3 = np.subtract(arr, 100, out=arr)  # in place: reuses arr's memory
arr4 = arr + 100                       # allocates a new array
arr5 = np.add(arr, arr2, out=arr2)     # in place: reuses arr2's memory
arr6 = arr + arr2                      # allocates a new array

print(arr is arr3)   # True
print(arr is arr4)   # False
print(arr2 is arr5)  # True
print(arr2 is arr6)  # False
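Applied to the function from the question, this could look roughly like the sketch below (optimise_low_mem is a name chosen here for illustration, and it assumes a single float32 working buffer is acceptable precision-wise):

```python
import numpy as np

def optimise_low_mem(data):
    # One float32 working buffer instead of several float64 temporaries
    buf = data.astype(np.float32)
    np.subtract(buf, 1000, out=buf)   # (data - 1000), in place
    np.multiply(buf, 1 / 1, out=buf)  # * (1 / 1), in place
    np.add(buf, 1, out=buf)           # + 1, in place
    np.round(buf, 0, out=buf)         # round, in place
    return buf.astype(np.uint16)

small = np.random.randint(0, 12000, size=(4, 4), dtype=np.uint16)
print(optimise_low_mem(small))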
You could use e.g. Numba or Cython to reduce memory usage. Of course, a simple Python loop would also be possible, but very slow.
With allocated output array
import numpy as np
import numba as nb

@nb.njit()
def optimise(data):
    data_scaled_offset = np.empty_like(data)
    # Inversely apply scale and offset for this product
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            data_scaled_offset[i, j] = round(((data[i, j] - 1000) * (1 / 1)) + 1)
    return data_scaled_offset
In-place
@nb.njit()
def optimise_in_place(data):
    # Inversely apply scale and offset for this product
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            data[i, j] = round(((data[i, j] - 1000) * (1 / 1)) + 1)
    return data
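The row-by-row idea from the question also works in plain NumPy, without Numba: process the array in blocks of rows so that only one small float temporary exists at any time. A minimal sketch under that assumption (optimise_chunked and the chunk size are illustrative choices, not from the original post):

```python
import numpy as np

def optimise_chunked(data, chunk_rows=1000):
    # Write results into a preallocated uint16 array, one block of rows at a time
    result = np.empty_like(data)
    for start in range(0, data.shape[0], chunk_rows):
        block = data[start:start + chunk_rows].astype(np.float32)
        block = ((block - 1000) * (1 / 1) + 1).round(0)
        result[start:start + chunk_rows] = block.astype(np.uint16)
    return result

small = np.random.randint(0, 12000, size=(10, 6), dtype=np.uint16)
print(optimise_chunked(small, chunk_rows=4).dtype)
```

The peak extra memory is then roughly chunk_rows x 30000 x 4 bytes for the float block, which can be tuned against available RAM.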