
Numpy first occurrence of value greater than existing value

I have a 1D array in numpy and I want to find the index of the first position where a value exceeds a given value.

E.g.

aa = range(-10,10)

Find the position in aa where the value 5 is first exceeded.

This is a little faster (and looks nicer)

np.argmax(aa>5)

Since argmax will stop at the first True ("In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.") and doesn't save another list.

In [2]: N = 10000

In [3]: aa = np.arange(-N,N)

In [4]: timeit np.argmax(aa>N/2)
100000 loops, best of 3: 52.3 us per loop

In [5]: timeit np.where(aa>N/2)[0][0]
10000 loops, best of 3: 141 us per loop

In [6]: timeit np.nonzero(aa>N/2)[0][0]
10000 loops, best of 3: 142 us per loop
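
One caveat with the argmax approach: when no element exceeds the threshold, the boolean mask is all False and argmax still returns 0, which is indistinguishable from a match at index 0. A minimal sketch of a guard, assuming you want None for "not found" (the helper name first_index_greater is made up for illustration and is not part of NumPy):

```python
import numpy as np

def first_index_greater(arr, threshold):
    """Return the index of the first element > threshold, or None if there is none."""
    mask = np.asarray(arr) > threshold
    if mask.size == 0:
        return None  # argmax raises on empty input, so handle it up front
    idx = int(np.argmax(mask))  # index of the first True, or 0 if mask is all False
    # argmax alone cannot tell "match at index 0" from "no match", so check the mask
    return idx if mask[idx] else None

aa = np.arange(-10, 10)
print(first_index_greater(aa, 5))    # 16
print(first_index_greater(aa, 20))   # None
```

The extra mask lookup is a single element access, so it should not change the timings above in any meaningful way.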

Given the sorted content of your array, there is an even faster method: searchsorted.

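import numpy as np

N = 10000
aa = np.arange(-N,N)
# searchsorted returns the first index with aa[i] >= N/2; the +1 gives the first
# strictly greater element only because N/2 occurs exactly once in aa
%timeit np.searchsorted(aa, N/2)+1
%timeit np.argmax(aa>N/2)
%timeit np.where(aa>N/2)[0][0]
%timeit np.nonzero(aa>N/2)[0][0]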

# Output
100000 loops, best of 3: 5.97 µs per loop
10000 loops, best of 3: 46.3 µs per loop
10000 loops, best of 3: 154 µs per loop
10000 loops, best of 3: 154 µs per loop
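
A note on the `+1` used with searchsorted: with the default side='left', searchsorted returns the first index whose element is greater than or equal to the value, so adding 1 only lands on the first strictly greater element when the value occurs exactly once. A small sketch of the alternative, side='right', on a made-up array with repeated values:

```python
import numpy as np

aa = np.array([1, 3, 5, 5, 5, 7, 9])  # sorted, with a repeated value

left = np.searchsorted(aa, 5)                 # first index with aa[i] >= 5
right = np.searchsorted(aa, 5, side="right")  # first index with aa[i] > 5

print(left, left + 1, right)  # 2 3 5
print(aa[right])              # 7

# When the value is not present at all, the +1 overshoots by one
print(np.searchsorted(aa, 4) + 1, np.searchsorted(aa, 4, side="right"))  # 3 2
```

If the value is larger than every element, both variants return len(aa), which is not a valid index and still has to be checked by the caller.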

I was also interested in this and I've compared all the suggested answers with perfplot. (Disclaimer: I'm the author of perfplot.)

If you know that the array you're looking through is already sorted, then

numpy.searchsorted(a, alpha)

is for you. It's an O(log(n)) operation, i.e., the speed hardly depends on the size of the array. You can't get faster than that.

If you don't know anything about your array, you're not going wrong with

numpy.argmax(a > alpha)

Already sorted:

[plot not preserved]

Unsorted:

[plot not preserved]

Code to reproduce the plot:

import numpy
import perfplot


alpha = 0.5
numpy.random.seed(0)


def argmax(data):
    return numpy.argmax(data > alpha)


def where(data):
    return numpy.where(data > alpha)[0][0]


def nonzero(data):
    return numpy.nonzero(data > alpha)[0][0]


def searchsorted(data):
    return numpy.searchsorted(data, alpha)


perfplot.save(
    "out.png",
    # setup=numpy.random.rand,
    setup=lambda n: numpy.sort(numpy.random.rand(n)),
    kernels=[argmax, where, nonzero, searchsorted],
    n_range=[2 ** k for k in range(2, 23)],
    xlabel="len(array)",
)

In [34]: a=np.arange(-10,10)

In [35]: a
Out[35]:
array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1,   0,   1,   2,
         3,   4,   5,   6,   7,   8,   9])

In [36]: np.where(a>5)
Out[36]: (array([16, 17, 18, 19]),)

In [37]: np.where(a>5)[0][0]
Out[37]: 16

Arrays that have a constant step between elements

In case of a range or any other linearly increasing array you can simply calculate the index programmatically, with no need to iterate over the array at all:

def first_index_calculate_range_like(val, arr):
    if len(arr) == 0:
        raise ValueError('no value greater than {}'.format(val))
    elif len(arr) == 1:
        if arr[0] > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    first_value = arr[0]
    step = arr[1] - first_value
    # For linearly decreasing arrays or constant arrays we only need to check
    # the first element, because if that does not satisfy the condition
    # no other element will.
    if step <= 0:
        if first_value > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    calculated_position = (val - first_value) / step

    if calculated_position < 0:
        return 0
    elif calculated_position >= len(arr) - 1:  # val >= last element: nothing is greater
        raise ValueError('no value greater than {}'.format(val))

    return int(calculated_position) + 1

One could probably improve that a bit. I have made sure it works correctly for a few sample arrays and values, but that doesn't mean there couldn't be mistakes in there, especially considering that it uses floats...

>>> import numpy as np
>>> first_index_calculate_range_like(5, np.arange(-10, 10))
16
>>> np.arange(-10, 10)[16]  # double check
6

>>> first_index_calculate_range_like(4.8, np.arange(-10, 10))
15

Given that it can calculate the position without any iteration, it will be constant time (O(1)) and can probably beat all other mentioned approaches. However, it requires a constant step in the array, otherwise it will produce wrong results.

General solution using numba

A more general approach would be using a numba function:
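
Because the calculation silently gives wrong answers on arrays without a constant step, it may be worth verifying that assumption once before relying on it. A minimal sketch, with a hypothetical helper name has_constant_step that is not part of the original answer; note that the check itself is O(n), so it only makes sense as a one-off validation and not before every lookup:

```python
import numpy as np

def has_constant_step(arr):
    """Return True if consecutive differences are all (approximately) equal."""
    arr = np.asarray(arr)
    if arr.size < 2:
        return True  # zero or one element: trivially constant
    steps = np.diff(arr)
    # allclose tolerates the small rounding errors of float ranges like linspace
    return bool(np.allclose(steps, steps[0]))

print(has_constant_step(np.arange(-10, 10)))         # True
print(has_constant_step(np.linspace(0.0, 1.0, 11)))  # True
print(has_constant_step(np.array([0, 1, 4, 9])))     # False
```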

import numba as nb

@nb.njit
def first_index_numba(val, arr):
    for idx in range(len(arr)):
        if arr[idx] > val:
            return idx
    return -1

That will work for any array but it has to iterate over the array, so in the average case it will be O(n):

>>> first_index_numba(4.8, np.arange(-10, 10))
15
>>> first_index_numba(5, np.arange(-10, 10))
16

Benchmark

Even though Nico Schlömer already provided some benchmarks, I thought it might be useful to include my new solutions and to test for different "values".

The test setup:

import numpy as np
import math
import numba as nb

def first_index_using_argmax(val, arr):
    return np.argmax(arr > val)

def first_index_using_where(val, arr):
    return np.where(arr > val)[0][0]

def first_index_using_nonzero(val, arr):
    return np.nonzero(arr > val)[0][0]

def first_index_using_searchsorted(val, arr):
    return np.searchsorted(arr, val) + 1

def first_index_using_min(val, arr):
    return np.min(np.where(arr > val))

def first_index_calculate_range_like(val, arr):
    if len(arr) == 0:
        raise ValueError('empty array')
    elif len(arr) == 1:
        if arr[0] > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    first_value = arr[0]
    step = arr[1] - first_value
    if step <= 0:
        if first_value > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    calculated_position = (val - first_value) / step

    if calculated_position < 0:
        return 0
    elif calculated_position >= len(arr) - 1:  # val >= last element: nothing is greater
        raise ValueError('no value greater than {}'.format(val))

    return int(calculated_position) + 1

@nb.njit
def first_index_numba(val, arr):
    for idx in range(len(arr)):
        if arr[idx] > val:
            return idx
    return -1

funcs = [
    first_index_using_argmax, 
    first_index_using_min, 
    first_index_using_nonzero,
    first_index_calculate_range_like, 
    first_index_numba, 
    first_index_using_searchsorted, 
    first_index_using_where
]

from simple_benchmark import benchmark, MultiArgument

and the plots were generated using:

%matplotlib notebook
b.plot()

Item is at the beginning

b = benchmark(
    funcs,
    {2**i: MultiArgument([0, np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

[benchmark plot not preserved]

The numba function performs best, followed by the calculate function and the searchsorted function. The other solutions perform much worse.

Item is at the end

b = benchmark(
    funcs,
    {2**i: MultiArgument([2**i-2, np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

[benchmark plot not preserved]

For small arrays the numba function performs amazingly fast; however, for bigger arrays it's outperformed by the calculate function and the searchsorted function.

Item is at sqrt(len)

b = benchmark(
    funcs,
    {2**i: MultiArgument([np.sqrt(2**i), np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

[benchmark plot not preserved]

This is more interesting. Again numba and the calculate function perform great; however, this actually triggers the worst case of searchsorted, which really doesn't work well in this case.

Comparison of the functions when no value satisfies the condition

Another interesting point is how these functions behave if there is no value whose index should be returned:

arr = np.ones(100)
value = 2

for func in funcs:
    print(func.__name__)
    try:
        print('-->', func(value, arr))
    except Exception as e:
        print('-->', e)

With this result:

first_index_using_argmax
--> 0
first_index_using_min
--> zero-size array to reduction operation minimum which has no identity
first_index_using_nonzero
--> index 0 is out of bounds for axis 0 with size 0
first_index_calculate_range_like
--> no value greater than 2
first_index_numba
--> -1
first_index_using_searchsorted
--> 101
first_index_using_where
--> index 0 is out of bounds for axis 0 with size 0

Searchsorted, argmax, and numba simply return a wrong value. However, searchsorted and numba return an index that is not a valid index for the array.

The functions where, min, nonzero and calculate throw an exception. However, only the exception for calculate actually says anything helpful.

That means one actually has to wrap these calls in an appropriate wrapper function that catches exceptions or invalid return values and handles them appropriately, at least if you aren't sure whether the value could be in the array.
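
As an illustration of such a wrapper, here is a minimal sketch built on np.nonzero. The name first_index_or_default and its default parameter are assumptions made for this example, not something from the benchmark above:

```python
import numpy as np

def first_index_or_default(val, arr, default=-1):
    """Index of the first element > val, or `default` when no element qualifies."""
    hits = np.nonzero(np.asarray(arr) > val)[0]
    # Check for an empty result instead of letting hits[0] raise an IndexError
    return int(hits[0]) if hits.size else default

print(first_index_or_default(5, np.arange(-10, 10)))          # 16
print(first_index_or_default(2, np.ones(100)))                # -1
print(first_index_or_default(2, np.ones(100), default=None))  # None
```

Returning None (or raising a ValueError with a clear message) is arguably safer than -1, because -1 silently indexes the last element if it is used without a check.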


Note: The calculate and searchsorted options only work under special conditions. The "calculate" function requires a constant step and searchsorted requires the array to be sorted. So these could be useful in the right circumstances but aren't general solutions for this problem. In case you're dealing with sorted Python lists you might want to take a look at the bisect module instead of using NumPy's searchsorted.

I'd like to propose
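
For the sorted-list case, a short sketch with the standard-library bisect module; bisect_right returns the insertion point to the right of any equal entries, which is the first index holding a strictly greater value:

```python
from bisect import bisect_right

values = list(range(-10, 10))  # a sorted Python list

idx = bisect_right(values, 5)  # first index with values[idx] > 5
print(idx)                     # 16
print(values[idx])             # 6

# If nothing is greater, the result equals len(values) and must not be used as an index
print(bisect_right(values, 100) == len(values))  # True
```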

np.min(np.append(np.where(aa>5)[0],np.inf))

This will return the smallest index where the condition is met, while returning infinity if the condition is never met (and where returns an empty array).

I would go with
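
One thing to keep in mind with this approach: appending np.inf promotes the index array to floating point, so the result is a float even when a match exists and has to be converted before it can be used as an index. A small sketch of what that looks like in practice:

```python
import numpy as np

aa = np.arange(-10, 10)

hit = np.min(np.append(np.where(aa > 5)[0], np.inf))
miss = np.min(np.append(np.where(aa > 20)[0], np.inf))

print(hit)   # 16.0 (a float, not an int)
print(miss)  # inf

# Convert back to int before indexing, and only if a match was found
if np.isfinite(hit):
    print(aa[int(hit)])  # 6
```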

i = np.min(np.where(V >= x))

where V is a vector (1D array), x is the value and i is the resulting index.

You should use np.where instead of np.argmax. The latter will return position 0 even if no value is found, which is not the index you expect.

>>> aa = np.array(range(-10,10))
>>> aa
array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1,   0,   1,   2,
         3,   4,   5,   6,   7,   8,   9])

If the condition is met, it returns an array of the indexes.

>>> idx = np.where(aa > 5)[0]
>>> idx
array([16, 17, 18, 19], dtype=int64)

Otherwise, if not met, it returns an empty array.

>>> not_found = np.where(aa > 20)[0]
>>> not_found
array([], dtype=int64)

The point against argmax for this case is: the simpler the better, IF the solution is not ambiguous. So, to check if something fell into the condition, just do if len(np.where(aa > value_to_search)[0]) > 0.

