Most memory-efficient way to compute abs()**2 of a complex numpy ndarray

Question

I'm looking for the most memory-efficient way to compute the absolute squared value of a complex numpy ndarray:

arr = np.empty((250000, 150), dtype='complex128')  # common size

I haven't found a ufunc that does exactly np.abs()**2.

As an array of that size and type takes up around half a GB, I'm primarily looking for a memory-efficient way.

I would also like it to be portable, so ideally some combination of ufuncs.

So far my understanding is that this should be about the best:

result = np.abs(arr)
result **= 2

It will needlessly compute (**0.5)**2, but should compute **2 in-place. Altogether the peak memory requirement is only the original array size + result array size, which should be 1.5 * original array size as the result is real.
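As a rough check of the sizes involved, here is a minimal sketch, scaled down by a factor of 100 so it runs quickly (the variable names are only illustrative). It confirms that np.abs returns a float64 array with half the bytes of the complex input:

```python
import numpy as np

# Scaled-down stand-in for the (250000, 150) array from the question
arr = np.zeros((2500, 150), dtype=np.complex128)

result = np.abs(arr)   # allocates a float64 array, half the bytes of arr
result **= 2           # squares the existing buffer in place

# Size of the full-size array from the question, in bytes
full_size_bytes = 250000 * 150 * np.dtype(np.complex128).itemsize

print(full_size_bytes / 2**30)     # about 0.56 GiB
print(result.nbytes / arr.nbytes)  # 0.5
```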

If I wanted to get rid of the redundant square root followed by squaring, I'd have to do something like this:

result = arr.real**2
result += arr.imag**2

but if I'm not mistaken, this means I'll have to allocate memory for both the real and imaginary part calculation, so the peak memory usage would be 2.0 * original array size. The arr.real property also returns a non-contiguous array (but that is of lesser concern).
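One possible way to keep the temporaries of this second variant small is to process the array in blocks of rows, so that only one block-sized scratch buffer exists besides the result. This is a sketch under assumptions, not something proposed in the thread: the function name abs2_chunked and the chunk_rows parameter are hypothetical, and the chunk size would need tuning.

```python
import numpy as np

def abs2_chunked(arr, chunk_rows=1024):
    """Return real**2 + imag**2 of a complex array, chunk_rows rows at a time."""
    # Real-valued result, half the bytes of the complex input
    result = np.empty(arr.shape, dtype=arr.real.dtype)
    # Scratch buffer for the squared imaginary parts of one chunk
    tmp = np.empty((chunk_rows,) + arr.shape[1:], dtype=result.dtype)
    for start in range(0, arr.shape[0], chunk_rows):
        stop = min(start + chunk_rows, arr.shape[0])
        out = result[start:stop]      # view into the result
        scratch = tmp[:stop - start]  # the last chunk may be shorter
        np.square(arr.real[start:stop], out=out)
        np.square(arr.imag[start:stop], out=scratch)
        out += scratch                # accumulate in place
    return result
```

With this layout the peak is roughly the input, the result and one chunk-sized buffer, and arr itself is never written to.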

Is there anything I'm missing? Are there any better ways to do this?

EDIT 1: I'm sorry for not making it clear, I don't want to overwrite arr, so I can't use it as out.

Answer 1

Thanks to numba.vectorize in recent versions of numba, creating a numpy universal function for the task is very easy:

import numba

@numba.vectorize([numba.float64(numba.complex128), numba.float32(numba.complex64)])
def abs2(x):
    # squared magnitude of one element, with no intermediate arrays
    return x.real**2 + x.imag**2

On my machine, I find a threefold speedup compared to a pure-numpy version that creates intermediate arrays:

>>> x = np.random.randn(10000).view('c16')
>>> y = abs2(x)
>>> np.all(y == x.real**2 + x.imag**2)   # exactly equal, being the same operation
True
>>> %timeit np.abs(x)**2
10000 loops, best of 3: 81.4 µs per loop
>>> %timeit x.real**2 + x.imag**2
100000 loops, best of 3: 12.7 µs per loop
>>> %timeit abs2(x)
100000 loops, best of 3: 4.6 µs per loop

Answer 2

EDIT: this solution needs twice the minimum memory and is only marginally faster. The discussion in the comments is still useful for reference, however.

Here's a faster solution, with the result stored in res:

import numpy as np
res = arr.conjugate()
np.multiply(arr,res,out=res)

where we exploited a property of the absolute value of a complex number, i.e. abs(z) = sqrt(z*z.conjugate()), so that abs(z)**2 = z*z.conjugate().
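To see why this approach costs twice the minimum memory, here is a small sketch with made-up sample values: the product z*z.conjugate() is stored in a complex array as large as the input, with the squared magnitude in the real part and zeros in the imaginary part.

```python
import numpy as np

arr = np.array([3 + 4j, 1 - 2j], dtype=np.complex128)  # illustrative values

res = arr.conjugate()             # complex copy, same size as arr
np.multiply(arr, res, out=res)    # z * conj(z), written into res

print(res.dtype)                  # complex128
print(res.nbytes == arr.nbytes)   # True: the result is as large as the input
print(res.real)                   # [25.  5.]
print(res.imag)                   # [0. 0.]
```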

Answer 3

arr.real and arr.imag are only views into the complex array, so no additional memory is allocated.
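A short sketch (with an arbitrary small array) can check this with np.shares_memory. Note that only accessing arr.real is free. An expression such as arr.real**2 still produces a new array, which is the allocation the question is counting.

```python
import numpy as np

arr = np.zeros((4, 3), dtype=np.complex128)  # arbitrary small example

real_view = arr.real
print(np.shares_memory(real_view, arr))   # True: a view, no copy
print(real_view.flags.c_contiguous)       # False: it skips over the imaginary parts

squared = arr.real**2
print(np.shares_memory(squared, arr))     # False: a newly allocated array
```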

Answer 4

If your primary goal is to conserve memory, NumPy's ufuncs take an optional out parameter that lets you direct the output to an array of your choosing. It can be useful when you want to perform operations in place.

If you make this minor modification to your first method, then you can perform the operation on arr completely in place:

np.abs(arr, out=arr)
arr **= 2
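The following sketch, using made-up sample values, shows what this in-place variant leaves behind: arr keeps its complex dtype and size, the squared magnitudes end up in the real part, and the original contents are gone.

```python
import numpy as np

arr = np.array([3 + 4j, 1 - 2j], dtype=np.complex128)  # illustrative values
nbytes_before = arr.nbytes

np.abs(arr, out=arr)   # magnitudes are written back into arr as complex numbers
arr **= 2              # squared in place

print(arr.dtype)                    # complex128
print(arr.nbytes == nbytes_before)  # True: no memory is reclaimed
print(arr.real)                     # squared magnitudes, approximately [25., 5.]
```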

One convoluted way that only uses a little extra memory could be to modify arr in place, compute the new array of real values and then restore arr.

This means storing information about the signs (unless you know that your complex numbers all have positive real and imaginary parts). Only a single bit of information is needed for the sign of each real or imaginary value; np.signbit returns a boolean array that stores each flag in one byte, so the two sign arrays use 1/16 + 1/16 == 1/8 of the memory of arr (in addition to the new array of floats you create).

>>> signs_real = np.signbit(arr.real) # store information about the signs
>>> signs_imag = np.signbit(arr.imag)
>>> arr.real **= 2 # square the real and imaginary values
>>> arr.imag **= 2
>>> result = arr.real + arr.imag
>>> arr.real **= 0.5 # positive square roots of real and imaginary values
>>> arr.imag **= 0.5
>>> arr.real[signs_real] *= -1 # restore the signs of the real and imaginary values
>>> arr.imag[signs_imag] *= -1

At the expense of storing the sign bits, arr is restored (up to floating-point rounding, and provided the squares neither overflow nor underflow) and result holds the values we want.

Answer 5

If you don't want sqrt (which should be much more expensive than a multiply), then don't use abs.

If you don't want double the memory, then don't use real**2 + imag**2.

Then you might try this (using an indexing trick):

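import numpy as np

N0 = 23
np0 = (np.random.randn(N0) + 1j*np.random.randn(N0)).astype(np.complex128)
ret_ = np.abs(np0)**2  # reference result
tmp0 = np0.view(np.float64)  # interleaved (real, imag) floats, no copy
# batched (1x2) @ (2x1) products give real**2 + imag**2 per element
ret0 = np.matmul(tmp0.reshape(N0,1,2), tmp0.reshape(N0,2,1)).reshape(N0)
assert np.abs(ret_-ret0).max()<1e-7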
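The same float-view idea can also be written with np.einsum, which sums the squares over a trailing (real, imag) axis. This is an alternative sketch rather than part of the original answer, and it assumes the complex array is C-contiguous so that the view and reshape do not copy:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

# Reinterpret each complex128 as a (real, imag) pair of float64 values
pairs = arr.view(np.float64).reshape(arr.shape + (2,))

# Sum of squares over the trailing axis: real**2 + imag**2 per element
result = np.einsum('...k,...k->...', pairs, pairs)
```

The result has the shape of arr and dtype float64, and arr is only read, never modified.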

Anyway, I prefer the numba solution.

5 answers

Answer 1
Score 7, accepted, 2016-06-15 22:01:59

Answer 2
Score 3, 2015-05-25 13:32:36

Answer 3
Score 1, 2015-05-25 12:15:57

Answer 4
Score 1, 2015-05-25 12:36:11

Answer 5
Score 0, 2021-12-08 14:20:29
