How to get the N largest values of each row in a numpy ndarray?

Question

We know how to do it when N = 1:

import numpy as np

m = np.arange(15).reshape(3, 5)
m[np.arange(len(m)), m.argmax(axis=1)]    # array([ 4,  9, 14])

What is the best way to get the top N of each row when N > 1 (say, 5)?

Answer 1

Doing a partial sort using np.partition can be much cheaper than a full sort:

gen = np.random.RandomState(0)
x = gen.permutation(100)

# full sort
print(np.sort(x)[-10:])
# [90 91 92 93 94 95 96 97 98 99]

# partial sort such that the largest 10 items are in the last 10 indices
print(np.partition(x, -10)[-10:])
# [90 91 93 92 94 96 98 95 97 99]

If you need the largest N items to be sorted, you can call np.sort on the last N elements of the partially sorted array:

print(np.sort(np.partition(x, -10)[-10:]))
# [90 91 92 93 94 95 96 97 98 99]

This can still be much faster than a full sort of the whole array, provided your array is sufficiently large.

To sort across each row of a two-dimensional array, you can use the axis= argument of np.partition and/or np.sort:

y = np.repeat(np.arange(100)[None, :], 5, 0)
gen.shuffle(y.T)

# partial sort, followed by a full sort of the last 10 elements in each row
print(np.sort(np.partition(y, -10, axis=1)[:, -10:], axis=1))
# [[90 91 92 93 94 95 96 97 98 99]
#  [90 91 92 93 94 95 96 97 98 99]
#  [90 91 92 93 94 95 96 97 98 99]
#  [90 91 92 93 94 95 96 97 98 99]
#  [90 91 92 93 94 95 96 97 98 99]]
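If you also need to know where the top N values sit in each row (the N > 1 analogue of argmax in the question), something along these lines should work. This is only a sketch: top_n_per_row is a made-up helper name, and it assumes a 2-D array and NumPy 1.15 or later for np.take_along_axis.

```python
import numpy as np

def top_n_per_row(a, n):
    """Return (values, indices) of the n largest entries of each row, ascending."""
    # argpartition leaves the column indices of the n largest entries of each
    # row in the last n positions, in no particular order
    idx = np.argpartition(a, -n, axis=1)[:, -n:]
    vals = np.take_along_axis(a, idx, axis=1)
    # fully sort only those n candidates per row
    order = np.argsort(vals, axis=1)
    return (np.take_along_axis(vals, order, axis=1),
            np.take_along_axis(idx, order, axis=1))

m = np.arange(15).reshape(3, 5)
vals, idx = top_n_per_row(m, 2)
print(vals)
# [[ 3  4]
#  [ 8  9]
#  [13 14]]
print(idx)
# [[3 4]
#  [3 4]
#  [3 4]]
```

The index array can be reused to pull the matching entries out of any other array of the same shape.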

Benchmarks:

In [1]: %%timeit x = np.random.permutation(10000000)
   ...: np.sort(x)[-10:]
   ...: 
1 loop, best of 3: 958 ms per loop

In [2]: %%timeit x = np.random.permutation(10000000)
   ...: np.partition(x, -10)[-10:]
   ...: 
10 loops, best of 3: 41.3 ms per loop

In [3]: %%timeit x = np.random.permutation(10000000)
   ...: np.sort(np.partition(x, -10)[-10:])
   ...: 
10 loops, best of 3: 78.8 ms per loop
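If you want to rerun this comparison outside IPython, a rough sketch with the standard timeit module could look like the following. The function names full_sort and partition_then_sort are mine, and the array is smaller than the one above to keep the run short; actual timings will depend on your machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.RandomState(0)
x = rng.permutation(1_000_000)  # smaller than the 10,000,000 used above

def full_sort(a, n=10):
    # sort everything, then keep the last n
    return np.sort(a)[-n:]

def partition_then_sort(a, n=10):
    # partial sort first, then fully sort only the n largest
    return np.sort(np.partition(a, -n)[-n:])

# both approaches must agree before their timings are worth comparing
assert np.array_equal(full_sort(x), partition_then_sort(x))

for fn in (full_sort, partition_then_sort):
    # best of 3 repeats, 3 calls per repeat
    best = min(timeit.repeat(lambda: fn(x), number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per call")
```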

Answer 2

Why not just do this:

np.sort(m)[:,-N:]
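Spelled out with the m from the question, and assuming N = 2 just for illustration:

```python
import numpy as np

m = np.arange(15).reshape(3, 5)
N = 2

# np.sort sorts along the last axis by default, so each row is sorted on its own
top_n = np.sort(m)[:, -N:]
print(top_n)
# [[ 3  4]
#  [ 8  9]
#  [13 14]]
```

This does a full sort of every row, which is fine for short rows; for long rows the partition approach from Answer 1 does less work.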

Answer 3

partition, sort, argsort, etc. take an axis parameter.

Let's shuffle some values:

In [161]: A=np.arange(24)

In [162]: np.random.shuffle(A)

In [163]: A=A.reshape(4,6)

In [164]: A
Out[164]: 
array([[ 1,  2,  4, 19, 12, 11],
       [20,  5, 13, 21, 22,  3],
       [10,  6, 16, 18, 17,  8],
       [23,  9,  7,  0, 14, 15]])

Partition:

In [165]: A.partition(4,axis=1)

In [166]: A
Out[166]: 
array([[ 2,  1,  4, 11, 12, 19],
       [ 5,  3, 13, 20, 21, 22],
       [ 6,  8, 10, 16, 17, 18],
       [14,  7,  9,  0, 15, 23]])

The 4 smallest values of each row come first and the 2 largest last; slice to get an array of the 2 largest:

In [167]: A[:,-2:]
Out[167]: 
array([[12, 19],
       [21, 22],
       [17, 18],
       [15, 23]])
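Note that A.partition works in place. If you would rather keep A untouched, the function form np.partition returns a partitioned copy, and a negative kth counts from the end, so kth=-N puts the N largest last. A small sketch, with the shuffled array from Out[164] typed in by hand:

```python
import numpy as np

A = np.array([[ 1,  2,  4, 19, 12, 11],
              [20,  5, 13, 21, 22,  3],
              [10,  6, 16, 18, 17,  8],
              [23,  9,  7,  0, 14, 15]])
original = A.copy()

N = 2
# np.partition returns a copy; kth=-N is the same as kth=A.shape[1]-N
top2 = np.partition(A, -N, axis=1)[:, -N:]

# the order inside each row of top2 is not guaranteed, so sort before printing
print(np.sort(top2, axis=1))
# [[12 19]
#  [21 22]
#  [17 18]
#  [15 23]]
print(np.array_equal(A, original))
# True
```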

Sort is probably slower, but on a small array like this it probably doesn't matter much. Plus it lets you pick any N.

In [169]: A.sort(axis=1)

In [170]: A
Out[170]: 
array([[ 1,  2,  4, 11, 12, 19],
       [ 3,  5, 13, 20, 21, 22],
       [ 6,  8, 10, 16, 17, 18],
       [ 0,  7,  9, 14, 15, 23]])
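Once the rows are sorted, any N is just a slice of the last N columns. For example (largest_desc is only an illustrative helper name, and the reversal to largest-first order is my addition):

```python
import numpy as np

# rows already sorted, copied from Out[170]
A = np.array([[ 1,  2,  4, 11, 12, 19],
              [ 3,  5, 13, 20, 21, 22],
              [ 6,  8, 10, 16, 17, 18],
              [ 0,  7,  9, 14, 15, 23]])

def largest_desc(sorted_rows, n):
    # last n columns of each row, reversed so the largest value comes first
    return sorted_rows[:, -n:][:, ::-1]

print(largest_desc(A, 1).ravel())
# [19 22 18 23]
print(largest_desc(A, 3))
# [[19 12 11]
#  [22 21 20]
#  [18 17 16]
#  [23 15 14]]
```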

Answer 1  3  2016-05-09 22:55:28

Answer 2  2  2016-05-09 21:15:57

Answer 3  2  2016-05-09 23:15:57
