[英]How to get the N maximum values per row in a numpy ndarray?
We know how to do it when N = 1 当N = 1时,我们知道如何做到这一点
import numpy as np
m = np.arange(15).reshape(3, 5)
m[xrange(len(m)), m.argmax(axis=1)] # array([ 4, 9, 14])
What is the best way to get the top N, when N > 1? 当N> 1时,获得前N的最佳方法是什么? (say, 5)
(比方说,5)
Doing a partial sort using np.partition
can be much cheaper than a full sort: 使用
np.partition
进行部分排序可能比完整排序便宜得多:
gen = np.random.RandomState(0)
x = gen.permutation(100)
# full sort
print(np.sort(x)[-10:])
# [90 91 92 93 94 95 96 97 98 99]
# partial sort such that the largest 10 items are in the last 10 indices
print(np.partition(x, -10)[-10:])
# [90 91 93 92 94 96 98 95 97 99]
If you need the largest N items to be sorted, you can call np.sort
on the last N elements in your partially sorted array: 如果需要排序最大的N个项目,可以在部分排序的数组中的最后N个元素上调用
np.sort
:
print(np.sort(np.partition(x, -10)[-10:]))
# [90 91 92 93 94 95 96 97 98 99]
This can still be much faster than a full sort on the whole array, provided your array is sufficiently large. 如果您的阵列足够大,这仍然比整个阵列上的完整排序快得多。
To sort across each row of a two-dimensional array you can use the axis=
arguments to np.partition
and/or np.sort
: 要对二维数组的每一行进行排序,可以使用
axis=
arguments to np.partition
和/或np.sort
:
y = np.repeat(np.arange(100)[None, :], 5, 0)
gen.shuffle(y.T)
# partial sort, followed by a full sort of the last 10 elements in each row
print(np.sort(np.partition(y, -10, axis=1)[:, -10:], axis=1))
# [[90 91 92 93 94 95 96 97 98 99]
# [90 91 92 93 94 95 96 97 98 99]
# [90 91 92 93 94 95 96 97 98 99]
# [90 91 92 93 94 95 96 97 98 99]
# [90 91 92 93 94 95 96 97 98 99]]
Benchmarks: 基准:
In [1]: %%timeit x = np.random.permutation(10000000)
...: np.sort(x)[-10:]
...:
1 loop, best of 3: 958 ms per loop
In [2]: %%timeit x = np.random.permutation(10000000)
np.partition(x, -10)[-10:]
....:
10 loops, best of 3: 41.3 ms per loop
In [3]: %%timeit x = np.random.permutation(10000000)
np.sort(np.partition(x, -10)[-10:])
....:
10 loops, best of 3: 78.8 ms per loop
为什么不这样做:
np.sort(m)[:,-N:]
partition
, sort
, argsort
etc take an axis parameter partition
, sort
, argsort
等采用轴参数
Let's shuffle some values 让我们改变一些价值观
In [161]: A=np.arange(24)
In [162]: np.random.shuffle(A)
In [163]: A=A.reshape(4,6)
In [164]: A
Out[164]:
array([[ 1, 2, 4, 19, 12, 11],
[20, 5, 13, 21, 22, 3],
[10, 6, 16, 18, 17, 8],
[23, 9, 7, 0, 14, 15]])
Partition: 划分:
In [165]: A.partition(4,axis=1)
In [166]: A
Out[166]:
array([[ 2, 1, 4, 11, 12, 19],
[ 5, 3, 13, 20, 21, 22],
[ 6, 8, 10, 16, 17, 18],
[14, 7, 9, 0, 15, 23]])
the 4 smallest values of each row are first, the 2 largest last; 每行的4个最小值是第一个,最后2个; slice to get an array of the 2 largest:
切片获取2个最大的数组:
In [167]: A[:,-2:]
Out[167]:
array([[12, 19],
[21, 22],
[17, 18],
[15, 23]])
Sort is probably slower, but on a small array like this probably doesn't matter much. 排序可能更慢,但在这样的小阵列上可能并不重要。 Plus it lets you pick any N.
另外它可以让你挑选任何N.
In [169]: A.sort(axis=1)
In [170]: A
Out[170]:
array([[ 1, 2, 4, 11, 12, 19],
[ 3, 5, 13, 20, 21, 22],
[ 6, 8, 10, 16, 17, 18],
[ 0, 7, 9, 14, 15, 23]])
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.