
Easy way to test if each element in a NumPy array lies between two values?

I was wondering if there was a syntactically simple way of checking if each element in a numpy array lies between two numbers.

In other words, just as numpy.array([1,2,3,4,5]) < 5 will return array([True, True, True, True, False]), I was wondering if it was possible to do something akin to this:

1 < numpy.array([1,2,3,4,5]) < 5

... to obtain...

array([False, True, True, True, False])

I understand that I can obtain this through logical chaining of boolean tests, but I'm working through some rather complex code and I was looking for a syntactically clean solution.

Any tips?

One solution would be:

import numpy as np
a = np.array([1, 2, 3, 4, 5])
(a > 1).all() and (a < 5).all()
# False

If you want the array of truth values, use:

(a > 1) & (a < 5)
# array([False,  True,  True,  True, False])
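As a minimal sketch of why the chained form from the question does not work while the `&` form does: Python expands `1 < a < 5` into `(1 < a) and (a < 5)`, and `and` needs a single truth value for the first array. The variable names below (`chained_ok`, `mask_op`, `mask_fn`) are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# Python expands `1 < a < 5` to `(1 < a) and (a < 5)`; `and` calls bool() on
# the first array, which NumPy refuses for arrays with more than one element.
try:
    1 < a < 5
    chained_ok = True
except ValueError:
    chained_ok = False

# Elementwise alternatives: the & operator or the equivalent np.logical_and.
mask_op = (a > 1) & (a < 5)
mask_fn = np.logical_and(a > 1, a < 5)

print(chained_ok)   # False
print(mask_op)      # [False  True  True  True False]
```

The parentheses matter: `&` binds more tightly than the comparison operators, so `a > 1 & a < 5` is parsed as the chained comparison `a > (1 & a) < 5` and fails in the same way.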

Another would be to use numpy.any to test whether any element falls outside the range. Here is an example:

import numpy as np
a = np.array([1,2,3,4,5])
np.any((a < 1)|(a > 5 ))
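Note that this expression answers the opposite question (is anything outside the closed interval [1, 5]?) and returns a single boolean rather than a mask. A small sketch of how it relates to an "all inside" test, with illustrative variable names:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# True if at least one element lies outside the closed interval [1, 5].
any_outside = np.any((a < 1) | (a > 5))

# By De Morgan's laws, "no element outside" equals "every element inside".
all_inside = np.all((a >= 1) & (a <= 5))

print(any_outside)  # False
print(all_inside)   # True
```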

You can also center the array on the midpoint of the interval and compare the distance to 0 against the half-width:

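import numpy as np

upper_limit = 5
lower_limit = 1
a = np.array([1, 2, 3, 4, 5])
# |a - midpoint| < half-width  is equivalent to  lower_limit < a < upper_limit
your_mask = np.abs(a - 0.5 * (upper_limit + lower_limit)) < 0.5 * (upper_limit - lower_limit)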

One thing to keep in mind is that the comparison is symmetric on both sides, so it can do 1<x<5 or 1<=x<=5, but not 1<=x<5.
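To illustrate the symmetry limitation, here is a sketch that contrasts the centered form with two explicit comparisons. The helper `in_range` and its `closed` parameter are hypothetical names introduced for this example, not part of NumPy.

```python
import numpy as np

def in_range(arr, lower, upper, closed="neither"):
    """Hypothetical helper: elementwise range test with selectable bounds.

    closed: "neither" (lower < x < upper), "both", "left", or "right".
    """
    arr = np.asarray(arr)
    left = arr >= lower if closed in ("both", "left") else arr > lower
    right = arr <= upper if closed in ("both", "right") else arr < upper
    return left & right

a = np.array([1, 2, 3, 4, 5])
center, half_width = 0.5 * (5 + 1), 0.5 * (5 - 1)

# The centered form can express only the symmetric cases.
open_mask = np.abs(a - center) < half_width      # 1 < x < 5
closed_mask = np.abs(a - center) <= half_width   # 1 <= x <= 5

# The half-open case needs two explicit comparisons.
half_open_mask = in_range(a, 1, 5, closed="left")  # 1 <= x < 5
print(half_open_mask)  # [ True  True  True  True False]
```

With floating-point data the centered form may also differ from the direct comparisons right at the boundaries because of rounding, so the explicit form is probably the safer default.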

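In multi-dimensional arrays you can use the np.any() option suggested above or comparison operators. The elementwise & operator also works on arrays of any dimension; it is the Python keyword and that raises an error (ValueError), because the truth value of a multi-element array is ambiguous.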

Example (on multi-dim array) using comparison operators:

import numpy as np

arr = np.array([[1,5,1],
                [0,1,0],
                [0,0,0],
                [2,2,2]])

Now use == if you want to check if the array values are inside a range, i.e. A < arr < B, or != if you want to check if the array values are outside a range, i.e. arr < A or arr > B:

(arr<1) != (arr>3)
> array([[False,  True, False],
         [ True, False,  True],
         [ True,  True,  True],
         [False, False, False]])

(arr>1) == (arr<4)
> array([[False, False, False],
         [False, False, False],
         [False, False, False],
         [ True,  True,  True]])
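The == and != forms work here only because the lower bound is below the upper bound, so the two comparisons cannot both be False (inside case) or both be True (outside case). The following sketch checks that they match the more conventional & and | masks on the same array; `inside` and `outside` are illustrative names.

```python
import numpy as np

arr = np.array([[1, 5, 1],
                [0, 1, 0],
                [0, 0, 0],
                [2, 2, 2]])

# & and | work elementwise on arrays of any dimension.
inside = (arr > 1) & (arr < 4)
outside = (arr < 1) | (arr > 3)

# With lower < upper the == / != forms give the same masks, because the two
# comparisons can never both be False (inside) or both be True (outside).
print(np.array_equal(inside, (arr > 1) == (arr < 4)))   # True
print(np.array_equal(outside, (arr < 1) != (arr > 3)))  # True
```

If the bounds were accidentally swapped, the == form would report True for values that fail both comparisons, whereas & would return all False, which is one reason to prefer & and | in practice.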

It is interesting to compare the NumPy-based approach against a Numba-accelerated loop:

import numpy as np
import numba as nb


def between(arr, a, b):
    return (arr > a) & (arr < b)


@nb.njit(fastmath=True)
def between_nb(arr, a, b):
    shape = arr.shape
    arr = arr.ravel()
    n = arr.size
    result = np.empty_like(arr, dtype=np.bool_)
    for i in range(n):
        # both conditions must hold, matching between() above
        result[i] = arr[i] > a and arr[i] < b
    return result.reshape(shape)

The benchmarks, computed and plotted with:

import pandas as pd
import matplotlib.pyplot as plt


def benchmark(
    funcs,
    args=None,
    kws=None,
    ii=range(4, 24),
    m=2 ** 15,
    is_equal=np.allclose,
    seed=0,
    unit="ms",
    verbose=True
):
    labels = [func.__name__ for func in funcs]
    units = {"s": 0, "ms": 3, "µs": 6, "ns": 9}
    args = tuple(args) if args else ()
    kws = dict(kws) if kws else {}
    assert unit in units
    np.random.seed(seed)
    timings = {}
    for i in ii:
        n = 2 ** i
        k = 1 + m // n
        if verbose:
            print(f"i={i}, n={n}, m={m}, k={k}")
        arrs = np.random.random((k, n))
        base = np.array([funcs[0](arr, *args, **kws) for arr in arrs])
        timings[n] = []
        for func in funcs:
            res = np.array([func(arr, *args, **kws) for arr in arrs])
            is_good = is_equal(base, res)
            # NOTE: %timeit is an IPython magic, so this runs only in IPython/Jupyter
            timed = %timeit -n 8 -r 8 -q -o [func(arr, *args, **kws) for arr in arrs]
            timing = timed.best / k
            timings[n].append(timing if is_good else None)
            if verbose:
                print(
                    f"{func.__name__:>24}"
                    f"  {is_good!s:5}"
                    f"  {timing * (10 ** units[unit]):10.3f} {unit}"
                    f"  {timings[n][0] / timing:5.1f}x")
    return timings, labels


funcs = between, between_nb
timings, labels = benchmark(funcs, args=(0.25, 0.75), unit="µs", verbose=False)
# plot() is not defined in this post; it is assumed to be a separate plotting helper
plot(timings, labels, unit="µs")

resulting in: (benchmark plot; image not preserved)

indicate that (under my testing conditions):

  • for larger and smaller inputs, the Numba approach can be up to 20% faster
  • for inputs of medium size, the NumPy approach is typically faster
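The benchmark above relies on the IPython %timeit magic. Outside IPython, a rough equivalent for a single input size can be sketched with the standard-library timeit module. The array size, the repetition counts, and the use of np.random.default_rng are arbitrary choices for this illustration, not the settings behind the numbers reported above.

```python
import timeit
import numpy as np

def between(arr, a, b):
    # Same NumPy expression as in the answer above.
    return (arr > a) & (arr < b)

rng = np.random.default_rng(0)
arr = rng.random(2 ** 16)  # array size chosen arbitrarily for illustration

# repeat() returns one total time per repetition; the minimum is the least noisy.
runs = timeit.repeat(lambda: between(arr, 0.25, 0.75), number=100, repeat=5)
best_per_call = min(runs) / 100
print(f"between: {best_per_call * 1e6:.1f} µs per call")
```

Timings obtained this way depend heavily on the machine and library versions, so they are best used for relative comparisons between candidate functions on the same setup.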
