Extending numpy mask
I want to mask a numpy array a with mask. The mask doesn't have exactly the same shape as a, but it is possible to mask a anyway (I guess because the additional dimension has length 1, so broadcasting applies?).
a.shape
>>> (3, 9, 31, 2, 1)
mask.shape
>>> (3, 9, 31, 2)
masked_a = ma.masked_array(a, mask)
The same logic, however, does not apply to array b, which has 5 elements in its last dimension.
ext_mask = mask[..., np.newaxis] # extending or not extending has same effect
ext_mask.shape
>>> (3, 9, 31, 2, 1)
b.shape
>>> (3, 9, 31, 2, 5)
masked_b = ma.masked_array(b, ext_mask)
>>> numpy.ma.core.MaskError: Mask and data not compatible: data size is 8370, mask size is 1674.
How can I create a (3, 9, 31, 2, 5) mask from the (3, 9, 31, 2) mask by expanding any True value in the last dimension of the (3, 9, 31, 2) mask to [True, True, True, True, True] (and any False value to [False, False, False, False, False])?
This gives the desired result:
masked_b = ma.masked_array(*np.broadcast_arrays(b, ext_mask))
I have not profiled this method, but it should be faster than allocating a new mask. According to the documentation, no data is copied:
These arrays are views on the original arrays. They are typically not contiguous. Furthermore, more than one element of a broadcasted array may refer to a single memory location. If you need to write to the arrays, make copies first.
It is possible to verify the no-copying behavior:
bb, mb = np.broadcast_arrays(b, ext_mask)
print(mb.shape) # (3, 9, 31, 2, 5) - same shape as b
print(mb.base.shape) # (3, 9, 31, 2) - the shape of the original mask
print(mb.strides) # (558, 62, 2, 1, 0) - that's how it works: 0 stride
It is pretty impressive how the numpy developers implemented broadcasting: values are repeated by using a stride of 0 along the last dimension. Wow!
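As an aside, on NumPy 1.10 and later the same zero-stride view can be requested directly with np.broadcast_to (a minimal sketch; the shapes mirror the question's):

```python
import numpy as np

# Assumed setup matching the question's shapes
mask = np.random.randn(3, 9, 31, 2) > 0
ext_mask = mask[..., np.newaxis]  # shape (3, 9, 31, 2, 1)

# broadcast_to returns a read-only, zero-copy view of the mask
big_mask = np.broadcast_to(ext_mask, (3, 9, 31, 2, 5))
print(big_mask.shape)        # (3, 9, 31, 2, 5)
print(big_mask.strides[-1])  # 0 -> values repeated via strides, not copied
```

Because the result is a view with a zero stride in the last axis, it uses no extra mask memory, but it is marked read-only; copy it first if you need to modify it.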
Edit
I compared the speed of broadcasting and allocating with this code:
import numpy as np
from numpy import ma
a = np.random.randn(30, 90, 31, 2, 1)
b = np.random.randn(30, 90, 31, 2, 5)
mask = np.random.randn(30, 90, 31, 2) > 0
ext_mask = mask[..., np.newaxis]
def broadcasting(a=a, b=b, ext_mask=ext_mask):
    mb1 = ma.masked_array(*np.broadcast_arrays(b, ext_mask))

def allocating(a=a, b=b, ext_mask=ext_mask):
    m2 = np.empty(b.shape, dtype=bool)
    m2[:] = ext_mask
    mb2 = ma.masked_array(b, m2)
Broadcasting is clearly faster than allocating here:
# array size: (30, 90, 31, 2, 5)
In [23]: %timeit broadcasting()
The slowest run took 10.39 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39.4 µs per loop
In [24]: %timeit allocating()
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 982 µs per loop
Note that I had to increase the array size for the difference in speed to become apparent. With the original array dimensions, allocating was slightly faster than broadcasting:
# array size: (3, 9, 31, 2, 5)
In [28]: %timeit broadcasting()
The slowest run took 9.36 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39 µs per loop
In [29]: %timeit allocating()
The slowest run took 9.22 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 32.6 µs per loop
The broadcasting solution's runtime seems not to depend on the array size.