Find the indices of the k smallest values of a numpy array

Question

To find the index of the minimum value, I can use argmin:

import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print(A.argmin())     # 4 because A[4] = 0.1

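But how can I find the indices of the k smallest values?

I'm looking for something like:

print(A.argmin(numberofvalues=3))  # hypothetical call; desired result [4, 0, 7] because A[4] <= A[0] <= A[7] <= all other A[i]

Note: in my use case, A has roughly 10 000 to 100 000 values, and I'm only interested in the indices of the k=10 smallest values. k will never be > 10.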

Answer 1

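Use np.argpartition. It does not sort the entire array. It only guarantees that the k-th element is in its sorted position and that all smaller elements are moved before it. The first k elements are therefore the k smallest elements.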

import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3

idx = np.argpartition(A, k)
print(idx)
# [4 0 7 3 1 2 6 5]

This returns the k smallest values. Note that they may not be in sorted order.

print(A[idx[:k]])
# [ 0.1  1.   1.5]
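
If the indices are needed in ascending order of value (as in the question's desired [4, 0, 7]), one possible approach is to sort only the k selected entries afterwards. This is a minimal sketch, not part of the original answer; the helper name k_smallest_sorted is made up for illustration and it assumes a 1-D array with 0 < k.

```python
import numpy as np

def k_smallest_sorted(a, k):
    """Indices of the k smallest values of a 1-D array, ordered by value.

    Hypothetical helper for illustration; assumes k > 0.
    """
    a = np.asarray(a)
    if k >= a.size:
        # nothing to gain from partitioning; sort everything
        return np.argsort(a)
    # kth = k - 1 places the k smallest values in the first k slots (unordered)
    part = np.argpartition(a, k - 1)[:k]
    # sort just those k candidates by their values
    return part[np.argsort(a[part])]

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print(k_smallest_sorted(A, 3).tolist())  # [4, 0, 7]
```

Only k elements are fully sorted, so the extra cost over a plain argpartition stays small when k is much smaller than the array.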

To get the k largest values, use

idx = np.argpartition(A, -k)
# [4 0 7 3 1 2 6 5]

A[idx[-k:]]
# [  9.  17.  17.]

WARNING: Do not (re)use idx = np.argpartition(A, k); A[idx[-k:]] to get the k largest values. That won't always work. For example, these are not the 3 largest values of x:

x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
idx = np.argpartition(x, 3)
x[idx[-3:]]
array([ 70,  80, 100])
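
For comparison, partitioning the same x with a negative kth does place the three largest values in the last three slots. A short sketch; the order inside those slots is not guaranteed, so the result is sorted before printing:

```python
import numpy as np

x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
k = 3

# kth = -k puts the element that belongs at position -k in place,
# with everything larger or equal after it
idx = np.argpartition(x, -k)
largest = x[idx[-k:]]

print(np.sort(largest).tolist())  # [80, 90, 100]
```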

Here is a comparison with np.argsort, which also works but sorts the entire array to get the result.

In [2]: x = np.random.randn(100000)

In [3]: %timeit idx0 = np.argsort(x)[:100]
100 loops, best of 3: 8.26 ms per loop

In [4]: %timeit idx1 = np.argpartition(x, 100)[:100]
1000 loops, best of 3: 721 µs per loop

In [5]: np.all(np.sort(np.argsort(x)[:100]) == np.sort(np.argpartition(x, 100)[:100]))
Out[5]: True

Answer 2

You can use numpy.argsort together with slicing:

>>> import numpy as np
>>> A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
>>> np.argsort(A)[:3]
array([4, 0, 7], dtype=int32)

Answer 3

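This function works well for n-dimensional arrays. The indices are returned as a tuple of arrays that can be used directly to index the original array. If you want a list of indices instead, transpose the array before converting it to a list.

To retrieve the k largest values, simply pass in -k.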

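import numpy as np

def get_indices_of_k_smallest(arr, k):
    # partition the flattened array; a negative k selects the k largest instead
    idx = np.argpartition(arr.ravel(), k)
    # convert flat indices to n-d coordinates and keep the first k (or last |k|) columns
    return tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
    # if you want it in a list of indices . . .
    # return np.array(np.unravel_index(idx, arr.shape))[:, range(k)].transpose().tolist()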

Example:

r = np.random.RandomState(1234)
arr = r.randint(1, 1000, 2 * 4 * 6).reshape(2, 4, 6)

indices = get_indices_of_k_smallest(arr, 4)
indices
# (array([1, 0, 0, 1], dtype=int64),
#  array([3, 2, 0, 1], dtype=int64),
#  array([3, 0, 3, 3], dtype=int64))

arr[indices]
# array([ 4, 31, 54, 77])

%%timeit
get_indices_of_k_smallest(arr, 4)
# 17.1 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
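
The list-of-indices variant mentioned in the comment above can be sketched as follows. This is an illustrative variation rather than the answer's own code: the helper name k_smallest_coords is hypothetical, it assumes 0 < k <= arr.size, and it additionally orders the coordinates by value.

```python
import numpy as np

def k_smallest_coords(arr, k):
    """Coordinates of the k smallest entries of an n-d array, ordered by value.

    Hypothetical helper for illustration; assumes 0 < k <= arr.size.
    """
    flat = arr.ravel()
    # flat indices of the k smallest entries (unordered)
    part = np.argpartition(flat, k - 1)[:k]
    # order those k flat indices by their values
    part = part[np.argsort(flat[part])]
    # map flat indices back to n-d coordinates, one tuple per entry
    coords = np.unravel_index(part, arr.shape)
    return [tuple(int(i) for i in c) for c in zip(*coords)]

arr = np.array([[9, 4, 7],
                [1, 8, 3]])
print(k_smallest_coords(arr, 3))  # [(1, 0), (1, 2), (0, 1)]
```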

Answer 4

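numpy.partition(your_array, k) is another option. Note that it returns the rearranged values rather than their indices, and only the k-th element is guaranteed to be in its sorted position: the k elements before it are the k smallest, but in no particular order, so you still take the first k entries of the result.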
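
A brief sketch of how np.partition behaves on the question's array. It returns the rearranged values, not indices, and the first k entries still have to be sliced out and sorted if an ordered result is wanted:

```python
import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3

part = np.partition(A, k)   # a partitioned copy of the values, not indices
smallest = part[:k]         # the k smallest values, in no guaranteed order

print(np.sort(smallest).tolist())  # [0.1, 1.0, 1.5]
```

When the positions of the values matter, as in the question, np.argpartition is the variant to reach for.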

4 answers

Answer 1
137, accepted, 2015-12-11 15:20:07

Answer 2
20, 2015-12-11 15:05:45

Answer 3
2, 2018-07-25 15:58:28

Answer 4
0, 2017-12-03 01:25:36
