
How to debug numpy masks

This question is related to this one.

I have a function that I'm trying to vectorize. This is the original function:

def aspect_good(angle: float, planet1_good: bool, planet2_good: bool):
    """
    Decides if the angle represents a good aspect.
    NOTE: returns None if the angle doesn't represent an aspect.
    """

    if 112 <= angle <= 128 or 52 <= angle <= 68:
        return True
    elif 174 <= angle <= 186 or 84 <= angle <= 96:
        return False
    elif 0 <= angle <= 8 and planet1_good and planet2_good:
        return True
    elif 0 <= angle <= 6:
        return False
    else:
        return None

and this is what I have so far:

import numpy as np
import numpy.typing as npt


def aspects_good(
    angles: npt.ArrayLike,
    planets1_good: npt.ArrayLike,
    planets2_good: npt.ArrayLike,
) -> npt.NDArray:
    """
    Decides if the angles represent good aspects.

    Note: this function was contributed by Mad Physicist. Thank you.
    https://stackoverflow.com/q/73672739/11004423

    :returns: an array with values as follows:
        1 – the angle is a good aspect
        0 – the angle is a bad aspect
       -1 – the angle doesn't represent an aspect
    """
    result = np.full_like(angles, -1, dtype=np.int8)

    bad_mask = np.abs(angles % 90) <= 6
    result[bad_mask] = 0

    good_mask = (np.abs(angles - 120) <= 8) |\
                (np.abs(angles - 60) <= 8) |\
                ((np.abs(angles - 4) <= 4) & planets1_good & planets2_good)
    result[good_mask] = 1

    return result

It's not working as expected, however. I wrote a test with pytest:

def test_aspects_good():
    tests = np.array((
        (120, True, False, True),
        (60, True, False, True),
        (180, True, False, False),
        (90, True, False, False),

        (129, True, False, -1),
        (111, True, False, -1),
        (69, True, False, -1),
        (51, True, False, -1),
        (187, True, False, -1),
        (173, True, False, -1),
        (97, True, False, -1),
        (83, True, False, -1),

        (0, True, True, True),
        (0, True, False, False),
        (0, False, True, False),
        (0, False, False, False),

        (7, False, False, -1),
        (7, True, True, True),
        (9, True, True, -1),
    ))

    angles = tests[:, 0]
    planets1_good = tests[:, 1]
    planets2_good = tests[:, 2]
    expected = tests[:, 3]

    result = aspects_good(angles, planets1_good, planets2_good)
    assert np.array_equal(result, expected)

and it fails, saying False: the arrays are different.

Here I have the result and expected arrays combined side by side:

array([[ 1,  1],
       [ 1,  1],
       [ 0,  0],
       [ 0,  0],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [-1, -1],
       [ 0,  1],
       [ 0,  0],
       [ 0,  0],
       [ 0,  0],
       [-1, -1],
       [-1,  1],
       [-1, -1]])

Note: the first column is the result array, and the second one is expected. As you can see, they differ in two places. Now the question comes: "How to debug this?" Normally I would use a debugger and step through each if/elif/else condition. But I have no idea how to debug numpy masks.

The issue appears to be a combination of three things:

  1. Numpy uses a homogeneous type throughout an array.

    You will find that tests.dtype is dtype('int64') or dtype('int32') depending on your architecture. This means that the columns containing planet1_good and planet2_good are integers too, not booleans.

  2. Bitwise AND ( & ) is not a logical operator.

    A bitwise AND operation will return a result with the largest of the input types. Specifically, combining the result of <= , which is a boolean array, with an int array produces an int array. That means that you can do something like np.array([1, 2]) & np.array([True, True]) to get array([1, 0]) , not array([True, False]) .

  3. Numpy distinguishes between a boolean mask and a fancy index by the dtype, even if the fancy index contains only zeros and ones. If you have a 2 element array, x , then x[[True, True]] = 1 assigns 1 to both elements of x . However, x[[1, 1]] = 1 assigns 1 only to the second element of x .
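The three points above can each be checked in isolation. The snippet below is a minimal sketch using small made-up arrays (not your full test table); the exact integer width printed depends on your platform.

```python
import numpy as np

# 1. Mixing ints and bools in one array gives a single integer dtype.
tests = np.array([(120, True, False, True), (0, True, True, True)])
print(tests.dtype)        # an integer dtype (width is platform-dependent)
print(tests[:, 1].dtype)  # same dtype: the "boolean" column holds 0/1 ints

# 2. Bitwise AND of a boolean array with an integer array yields integers.
bool_part = np.array([True, True])
int_part = np.array([1, 2])
combined = bool_part & int_part
print(combined, combined.dtype)  # values [1 0], integer dtype

# 3. The dtype of the index decides between masking and fancy indexing.
x = np.zeros(2, dtype=int)
x[np.array([True, True])] = 1    # boolean mask: selects both elements
print(x)                         # [1 1]

y = np.zeros(2, dtype=int)
y[np.array([1, 1])] = 1          # fancy index: writes to position 1 twice
print(y)                         # [0 1]
```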

So that's basically what's happening here. bad_mask is a boolean mask, and works exactly as you would expect. However, good_mask is ANDed with a couple of integer arrays, so it becomes an integer array containing zeros and ones. The expression result[good_mask] = 1 therefore assigns 1 only to the first and second elements of result (indices 0 and 1), which fortuitously correspond to two of your tests. The remaining True results can not and will not be assigned 1 .
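As for how to debug this kind of thing: one approach that tends to work is to build each mask on a small input, print its dtype and values, and then list the positions where the output disagrees with the expectation. The sketch below does that on a reduced version of your test table. The five rows are picked for illustration, and bad_mask is left out to keep the focus on good_mask.

```python
import numpy as np

# Reduced test table: (angle, planet1_good, planet2_good, expected).
tests = np.array((
    (120, True, False, True),
    (60, True, False, True),
    (0, True, True, True),
    (7, True, True, True),
    (9, True, True, -1),
))
angles, p1, p2, expected = tests.T  # every column is an integer array

good_mask = ((np.abs(angles - 120) <= 8)
             | (np.abs(angles - 60) <= 8)
             | ((np.abs(angles - 4) <= 4) & p1 & p2))

# Step 1: check whether this is really a boolean mask.
print(good_mask.dtype)  # an integer dtype, not bool
print(good_mask)        # [1 1 1 1 0]

# Step 2: apply it and see which elements were actually written.
result = np.full_like(angles, -1, dtype=np.int8)
result[good_mask] = 1   # treated as indices 1, 1, 1, 1, 0
print(result)           # [ 1  1 -1 -1 -1]

# Step 3: list the rows where the output and the expectation differ.
mismatch = np.flatnonzero(result != expected)
print(mismatch)         # [2 3]
```

An integer dtype in step 1 is the giveaway: a mask meant for boolean indexing should report bool.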

There are a few ways to fix this, listed in decreasing order of preference (my favorite on top):

  1. Convert all your arrays to numpy arrays of the correct type. Right now your function does not meet the contract that it accepts any array-like. If you pass in a list for angles , you will get TypeError: unsupported operand type(s) for %: 'list' and 'int' . This is a fairly idiomatic approach:

     angles = np.asanyarray(angles)
     planets1_good = np.asanyarray(planets1_good, dtype=bool)
     planets2_good = np.asanyarray(planets2_good, dtype=bool)

     result = np.full_like(angles, -1, dtype=np.int8)

     bad_mask = np.abs(angles % 90) <= 6
     result[bad_mask] = 0

     good_mask = (np.abs(angles - 120) <= 8) |\
                 (np.abs(angles - 60) <= 8) |\
                 ((np.abs(angles - 4) <= 4) & planets1_good & planets2_good)
     result[good_mask] = 1

     return result
  2. Ensure that good_mask is actually a mask before applying it. You should still convert angles , but the other arrays will be converted automatically by the & operator:

     good_mask = ((np.abs(angles - 120) <= 8) |\
                  (np.abs(angles - 60) <= 8) |\
                  ((np.abs(angles - 4) <= 4) & planets1_good & planets2_good)).astype(bool)

    You may alternatively do something similar to what you did with bad_mask :

     good_mask = (np.abs(angles % 60) <= 8) & (angles >= -8) & (angles <= 128)
  3. Convert the mask to an index, which won't care about the original dtype:

     result[np.flatnonzero(good_mask)] = 1
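To see the first option working end to end, here is a self-contained sketch. The name aspects_good_fixed, the dtype assertion, and the six sample rows are illustrative assumptions, not part of your code. The inputs are deliberately plain lists with 0/1 integer flags, to exercise the array-like contract. The last few lines check, on a made-up 0/1 array, that options 2 and 3 select the same elements.

```python
import numpy as np


def aspects_good_fixed(angles, planets1_good, planets2_good):
    """Option 1: normalize the inputs before building any mask."""
    angles = np.asanyarray(angles)
    planets1_good = np.asanyarray(planets1_good, dtype=bool)
    planets2_good = np.asanyarray(planets2_good, dtype=bool)

    result = np.full_like(angles, -1, dtype=np.int8)

    bad_mask = np.abs(angles % 90) <= 6
    result[bad_mask] = 0

    good_mask = ((np.abs(angles - 120) <= 8)
                 | (np.abs(angles - 60) <= 8)
                 | ((np.abs(angles - 4) <= 4) & planets1_good & planets2_good))
    # Cheap guard (an addition for illustration): fail loudly if the
    # mask ever stops being boolean.
    assert good_mask.dtype == bool
    result[good_mask] = 1
    return result


# Plain lists, with the flags given as 0/1 integers on purpose.
angles = [120, 180, 0, 0, 7, 9]
p1 = [1, 1, 1, 1, 1, 1]
p2 = [0, 0, 1, 0, 1, 1]
out = aspects_good_fixed(angles, p1, p2)
print(out.tolist())  # [1, 0, 1, 0, 1, -1]

# Options 2 and 3 on an integer 0/1 "mask": both write the same elements.
int_mask = np.array([0, 1, 1, 0])
a = np.zeros(4, dtype=int)
a[int_mask.astype(bool)] = 1       # option 2: cast to a real boolean mask
b = np.zeros(4, dtype=int)
b[np.flatnonzero(int_mask)] = 1    # option 3: convert to explicit indices
print(a.tolist(), b.tolist())      # [0, 1, 1, 0] [0, 1, 1, 0]
```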
