Extending numpy mask

I want to mask a numpy array a with mask. The mask doesn't have exactly the same shape as a, but it is possible to mask a anyway (I guess because the additional dimension has length 1 (broadcasting?)).

a.shape
>>> (3, 9, 31, 2, 1)
mask.shape
>>> (3, 9, 31, 2)
masked_a = ma.masked_array(a, mask)

The same logic, however, does not apply to array b, which has 5 elements in its last dimension.

ext_mask = mask[..., np.newaxis] # extending or not extending has same effect
ext_mask.shape
>>> (3, 9, 31, 2, 1)

b.shape
>>> (3, 9, 31, 2, 5)
masked_b = ma.masked_array(b, ext_mask)
>>> numpy.ma.core.MaskError: Mask and data not compatible: data size is 8370, mask size is 1674.
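
A note on why the first case works: as far as I can tell, this is not broadcasting. The error message above compares sizes, which suggests the constructor accepts a mask whose total element count equals that of the data and reshapes it. The sketch below reproduces both cases with random stand-in data; the arrays and the names mask_shape_a and error_message are made up for illustration.

```python
import numpy as np
from numpy import ma

rng = np.random.default_rng(0)
mask = rng.standard_normal((3, 9, 31, 2)) > 0
a = rng.standard_normal((3, 9, 31, 2, 1))
b = rng.standard_normal((3, 9, 31, 2, 5))

# a and mask hold the same number of elements (1674 each),
# so the mask is accepted and ends up with a's shape.
masked_a = ma.masked_array(a, mask)
mask_shape_a = masked_a.mask.shape
print(mask_shape_a)

# b holds 8370 elements, the extended mask only 1674,
# so the constructor refuses the mask.
try:
    ma.masked_array(b, mask[..., np.newaxis])
    error_message = None
except ma.MaskError as exc:
    error_message = str(exc)
print(error_message)
```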

How can I create a (3, 9, 31, 2, 5) mask from a (3, 9, 31, 2) mask by expanding each True value in the last dimension of the (3, 9, 31, 2) mask to [True, True, True, True, True] (and each False value likewise)?

This gives the desired result:

masked_b = ma.masked_array(*np.broadcast_arrays(b, ext_mask))
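
For comparison, here is a minimal sketch of two other ways to get a mask with b's shape, assuming random stand-in data: np.broadcast_to gives a read-only view, and np.repeat physically copies each value along the last axis, which is the literal expansion asked for in the question. The names view_mask and full_mask are made up for this example.

```python
import numpy as np
from numpy import ma

rng = np.random.default_rng(0)
b = rng.standard_normal((3, 9, 31, 2, 5))
mask = rng.standard_normal((3, 9, 31, 2)) > 0
ext_mask = mask[..., np.newaxis]

# Read-only view with b's shape; no new mask buffer is allocated.
view_mask = np.broadcast_to(ext_mask, b.shape)

# Explicit copy: each value is stored 5 times along the last axis.
full_mask = np.repeat(ext_mask, b.shape[-1], axis=-1)

# Either array has the shape the constructor expects.
masked_from_view = ma.masked_array(b, mask=view_mask)
masked_from_copy = ma.masked_array(b, mask=full_mask)
print(masked_from_view.shape, masked_from_copy.shape)
```

The view is cheaper to create; the copy is the safer choice if the mask will be modified later.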

I have not profiled this method, but it should be faster than allocating a new mask. According to the documentation, no data is copied:

These arrays are views on the original arrays. They are typically not contiguous. Furthermore, more than one element of a broadcasted array may refer to a single memory location. If you need to write to the arrays, make copies first.
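
A small sketch of the "make copies first" advice, using a toy (3, 1) mask rather than the arrays from the question. It uses np.broadcast_to, whose result is a read-only view; the name writable is made up for this example.

```python
import numpy as np

mask = np.array([[True], [False], [True]])   # shape (3, 1)
view = np.broadcast_to(mask, (3, 4))         # read-only view, shape (3, 4)
print(view.flags.writeable)

# Copying gives an independent buffer in which every element
# has its own memory location, so single elements can be changed.
writable = view.copy()
writable[0, 1] = False
print(writable[0].tolist())
print(mask[0, 0])                            # the original mask is untouched
```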

It is possible to verify the no-copying behavior:

bb, mb = np.broadcast_arrays(b, ext_mask)
print(mb.shape)       # (3, 9, 31, 2, 5) - same shape as b
print(mb.base.shape)  # (3, 9, 31, 2) - the shape of the original mask
print(mb.strides)     # (558, 62, 2, 1, 0) - that's how it works: 0 stride

It is pretty impressive how the numpy developers implemented broadcasting. Values are repeated by using a stride of 0 along the last dimension. Wow!
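
The zero-stride trick is easy to see on a toy array. This is a sketch with made-up names (col, wide, row_after), not the arrays from the question:

```python
import numpy as np

col = np.array([[True], [False]])      # shape (2, 1)
wide = np.broadcast_to(col, (2, 5))    # shape (2, 5), no data copied

print(wide.strides)                    # the last stride is 0
print(np.shares_memory(col, wide))     # both use the same buffer

# Changing the one stored value changes all five apparent copies,
# because each of them reads the same memory location.
col[1, 0] = True
row_after = wide[1].tolist()
print(row_after)
```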

Edit

I compared the speed of broadcasting and allocating with this code:

import numpy as np
from numpy import ma

a = np.random.randn(30, 90, 31, 2, 1)
b = np.random.randn(30, 90, 31, 2, 5)

mask = np.random.randn(30, 90, 31, 2) > 0
ext_mask = mask[..., np.newaxis]

def broadcasting(a=a, b=b, ext_mask=ext_mask):
    mb1 = ma.masked_array(*np.broadcast_arrays(b, ext_mask))

def allocating(a=a, b=b, ext_mask=ext_mask):
    m2 = np.empty(b.shape, dtype=bool)
    m2[:] = ext_mask
    mb2 = ma.masked_array(b, m2)

Broadcasting is clearly faster than allocating here:

    # array size: (30, 90, 31, 2, 5)

In [23]: %timeit broadcasting()
The slowest run took 10.39 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39.4 µs per loop

In [24]: %timeit allocating()
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 982 µs per loop

Note that I had to increase the array size for the difference in speed to become apparent. With the original array dimensions, allocating was slightly faster than broadcasting:

    # array size: (3, 9, 31, 2, 5)

In [28]: %timeit broadcasting()
The slowest run took 9.36 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39 µs per loop

In [29]: %timeit allocating()
The slowest run took 9.22 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 32.6 µs per loop

The broadcasting solution's runtime seems not to depend on array size.
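
To repeat the comparison outside IPython, something like the following sketch with the standard timeit module should do. The helper time_both and its repeats parameter are made up for this example, and the absolute numbers will differ by machine and numpy version.

```python
import timeit
import numpy as np
from numpy import ma

def time_both(shape, repeats=20):
    """Return (broadcast, allocate) seconds per call for data of `shape`."""
    b = np.random.randn(*shape)
    ext_mask = np.random.randn(*shape[:-1], 1) > 0

    def broadcasting():
        return ma.masked_array(*np.broadcast_arrays(b, ext_mask))

    def allocating():
        m = np.empty(b.shape, dtype=bool)
        m[:] = ext_mask
        return ma.masked_array(b, m)

    # Take the best of 3 runs to reduce noise from other processes.
    t_b = min(timeit.repeat(broadcasting, number=repeats, repeat=3)) / repeats
    t_a = min(timeit.repeat(allocating, number=repeats, repeat=3)) / repeats
    return t_b, t_a

results = {}
for shape in [(3, 9, 31, 2, 5), (30, 90, 31, 2, 5)]:
    results[shape] = time_both(shape)
    t_b, t_a = results[shape]
    print(shape, f"broadcast {t_b * 1e6:.1f} us, allocate {t_a * 1e6:.1f} us")
```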


 