
Reduce Memory Usage when Running Numpy Array Operations

I have a fairly large NumPy array that I need to perform an operation on, but when I do so, my ~2GB array requires ~30GB of RAM in order to perform the operation. I've read that NumPy can be fairly clumsy with memory usage, but this seems excessive.

Does anyone know of an alternative way to apply these operations to limit the RAM load? Perhaps row-by-row/in place, etc.?

Code below (ignore the meaningless calculation; in my code the coefficients vary):

import xarray as xr 
import numpy as np

def optimise(data):

    data_scaled_offset = (((data - 1000) * (1 / 1)) + 1).round(0)
    return data_scaled_offset.astype(np.uint16)

# This could also be float32 but I'm using uint16 here to reduce memory load for demo purposes
ds = np.random.randint(0, 12000, size=(40000,30000), dtype=np.uint16)

ds = optimise(ds) # Results in ~30GB RAM usage
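As an aside on why this blows up (my own observation, not part of the original question): each step of the expression allocates a full-size temporary array, and multiplying by the Python float `(1 / 1)` promotes those temporaries to a floating-point dtype, so a 2GB `uint16` input needs several much larger float intermediates. A minimal sketch showing the promotion on a small array:

```python
import numpy as np

# Tiny stand-in for the real 40000x30000 array
data = np.random.randint(0, 12000, size=(4, 3), dtype=np.uint16)

# Multiplying by a Python float promotes the intermediate to a float dtype,
# so each temporary of the full-size array is several times larger than the input.
step = (data - 1000) * (1 / 1)
print(step.dtype)
```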

By default, operations like multiplication, addition and many others allocate a new array for the result. You can instead use numpy.multiply, numpy.add, etc. with the out parameter to store the result in an existing array, which significantly reduces memory usage. Please see the demo below and translate your code to use those functions instead.

arr = np.random.rand(100)
arr2 = np.random.rand(100)

arr3 = np.subtract(arr, 100, out=arr)
arr4 = arr+100
arr5 = np.add(arr, arr2, out=arr2)
arr6 = arr+arr2

print(arr is arr3) # True
print(arr is arr4) # False
print(arr2 is arr5) # True
print(arr2 is arr6) # False
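As a sketch of how this applies to the calculation in the question (the function name and the up-front `float32` cast are my own choices, not from the original answer; this assumes `float32` precision is acceptable), every step can reuse a single buffer via `out=` instead of allocating a new temporary:

```python
import numpy as np

def optimise_lowmem(data):
    # One float32 working buffer (half the size of a float64 temporary);
    # all subsequent steps write into it in place.
    buf = data.astype(np.float32)
    np.subtract(buf, 1000, out=buf)   # (data - 1000), in place
    np.multiply(buf, 1 / 1, out=buf)  # * (1 / 1), in place
    np.add(buf, 1, out=buf)           # + 1, in place
    np.round(buf, 0, out=buf)         # round, in place
    return buf.astype(np.uint16)

ds = np.random.randint(0, 12000, size=(400, 300), dtype=np.uint16)
out = optimise_lowmem(ds)
print(out.dtype, out.shape)  # uint16 (400, 300)
```

Peak usage is then roughly the input, one float32 buffer, and the uint16 result, rather than a chain of full-size float temporaries.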

You could use e.g. Numba or Cython to reduce memory usage. Of course a simple Python loop would also be possible, but it would be very slow.

With a preallocated output array

import numpy as np
import numba as nb

@nb.njit()
def optimise(data):
    data_scaled_offset = np.empty_like(data)
    # Inversely apply scale and offset for this product
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            data_scaled_offset[i, j] = np.round((data[i, j] - 1000) * (1 / 1) + 1)

    return data_scaled_offset

In place

@nb.njit()
def optimise_in_place(data):
    # Inversely apply scale and offset for this product
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            data[i, j] = np.round((data[i, j] - 1000) * (1 / 1) + 1)

    return data
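The question also asks about row-by-row processing. As a further sketch in plain NumPy (the function and parameter names are hypothetical, not from either answer), the same calculation can be applied over blocks of rows so the floating-point temporaries only ever cover one block at a time:

```python
import numpy as np

def optimise_chunked(data, rows_per_chunk=1000):
    # uint16 output buffer, the same size as the input
    result = np.empty_like(data)
    for start in range(0, data.shape[0], rows_per_chunk):
        chunk = data[start:start + rows_per_chunk]
        # Float temporaries exist only for this slab of rows
        block = ((chunk.astype(np.float64) - 1000) * (1 / 1)) + 1
        result[start:start + rows_per_chunk] = block.round(0).astype(np.uint16)
    return result

ds = np.random.randint(0, 12000, size=(4000, 300), dtype=np.uint16)
res = optimise_chunked(ds)
```

Smaller `rows_per_chunk` values trade a little loop overhead for a lower peak, since the temporaries scale with the block size rather than the full array.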
