Find the n smallest items in a numpy array of arrays

There are plenty of questions here about finding the nth smallest element in a numpy array. However, what if you have an array of arrays, like so:

>>> print matrix
[[ 1.          0.28958002  0.09972488 ...,  0.46999924  0.64723113
   0.60217694]
 [ 0.28958002  1.          0.58005657 ...,  0.37668355  0.48852272
   0.3860152 ]
 [ 0.09972488  0.58005657  1.         ...,  0.13151364  0.29539992
   0.03686381]
 ..., 
 [ 0.46999924  0.37668355  0.13151364 ...,  1.          0.50250212
   0.73128971]
 [ 0.64723113  0.48852272  0.29539992 ...,  0.50250212  1.          0.71249226]
 [ 0.60217694  0.3860152   0.03686381 ...,  0.73128971  0.71249226  1.        ]]

How can I get the n smallest items out of this array of arrays?

>>> print type(matrix)
<type 'numpy.ndarray'>

This is how I have been finding the coordinates of the smallest item:

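import numpy

min_cordinates = []
global_min = numpy.amin(matrix)  # compute the overall minimum once, not once per row
for i in matrix:
    cols = numpy.where(i == global_min)[0]  # column indices of the minimum in this row
    if cols.size:  # numpy.any(cols) would be False when the minimum sits in column 0
        min_cordinates.append(int(cols[0]) + 1)  # store a 1-based column index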
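A vectorized alternative to the loop, offered as a sketch: np.argmin returns the index of the smallest entry in the flattened array, and np.unravel_index converts it back into a (row, column) pair. Unlike the loop, this returns 0-based indices for both row and column. The helper name argmin_2d and the 3x3 matrix m are hypothetical, for illustration only. In a symmetric matrix the minimum appears twice; np.argmin reports the first occurrence in row-major order.

```python
import numpy as np

def argmin_2d(matrix):
    """Return the 0-based (row, col) of the smallest entry (hypothetical helper)."""
    flat_index = np.argmin(matrix)  # index into the flattened (row-major) array
    return np.unravel_index(flat_index, matrix.shape)

# Illustrative symmetric matrix (hypothetical data)
m = np.array([[1.0, 0.29, 0.10],
              [0.29, 1.0, 0.58],
              [0.10, 0.58, 1.0]])
print(argmin_2d(m))
```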

Now I would like to find, for example, the 10 smallest items.

If your array is not large, the accepted answer is fine. For large arrays, np.partition does this much more efficiently. Here's an example where the array has 10000 elements and you want the 10 smallest values:

In [56]: np.random.seed(123)

In [57]: a = 10*np.random.rand(100, 100)

Use np.partition to get the 10 smallest values:

In [58]: np.partition(a, 10, axis=None)[:10]
Out[58]: 
array([ 0.00067838,  0.00081888,  0.00124711,  0.00120101,  0.00135942,
        0.00271129,  0.00297489,  0.00489126,  0.00556923,  0.00594738])

Note that the values are not in increasing order: np.partition does not guarantee that the first 10 values are sorted. If you need them in increasing order, sort the selected values afterwards. This is still faster than sorting the entire array, because the partition step runs in linear time on average and only the 10 selected values then need sorting.
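The partition-then-sort approach can be sketched as follows; it also recovers each value's position, which the question asks for. np.argpartition selects the flat indices of the n smallest entries (unordered), np.argsort orders only those n indices by value, and np.unravel_index turns them into (row, col) coordinates. The helper name n_smallest is hypothetical, and the sketch assumes 1 <= n <= a.size. It uses kth = n - 1 rather than n so that n may equal a.size.

```python
import numpy as np

def n_smallest(a, n):
    """Return the n smallest values of `a` (any shape) in increasing order,
    together with their index tuples. Hypothetical helper; assumes 1 <= n <= a.size."""
    flat = a.ravel()
    # Selection step: the first n slots hold the indices of the n smallest values, unordered
    idx = np.argpartition(flat, n - 1)[:n]
    # Sort only the n selected indices by their values
    idx = idx[np.argsort(flat[idx])]
    return flat[idx], np.unravel_index(idx, a.shape)

np.random.seed(123)
a = 10 * np.random.rand(100, 100)
values, (rows, cols) = n_smallest(a, 10)
print(values)
print(list(zip(rows.tolist(), cols.tolist())))
```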

Here's the result using np.sort:

In [59]: np.sort(a, axis=None)[:10]
Out[59]: 
array([ 0.00067838,  0.00081888,  0.00120101,  0.00124711,  0.00135942,
        0.00271129,  0.00297489,  0.00489126,  0.00556923,  0.00594738])

Now compare the timing:

In [60]: %timeit np.partition(a, 10, axis=None)[:10]
10000 loops, best of 3: 75.1 µs per loop

In [61]: %timeit np.sort(a, axis=None)[:10]
1000 loops, best of 3: 465 µs per loop

In this case, np.partition is more than six times faster (75.1 µs vs. 465 µs per loop).
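For readers outside IPython, a rough equivalent of the %timeit comparison using the standard timeit module is sketched below. The absolute numbers, and therefore the speedup, will depend on your hardware and NumPy version; the run count n_runs is an arbitrary choice.

```python
import timeit
import numpy as np

np.random.seed(123)
a = 10 * np.random.rand(100, 100)

# Sanity check: once the partitioned slice is sorted, both approaches agree
assert np.array_equal(np.sort(np.partition(a, 10, axis=None)[:10]),
                      np.sort(a, axis=None)[:10])

n_runs = 1000  # arbitrary number of repetitions
t_partition = timeit.timeit(lambda: np.partition(a, 10, axis=None)[:10], number=n_runs)
t_sort = timeit.timeit(lambda: np.sort(a, axis=None)[:10], number=n_runs)
print(f"partition: {t_partition / n_runs * 1e6:.1f} µs per call")
print(f"sort:      {t_sort / n_runs * 1e6:.1f} µs per call")
print(f"speedup:   {t_sort / t_partition:.1f}x")
```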

Flatten the matrix, sort it, and then take the first 10 items:

import numpy
print(numpy.sort(matrix.flatten())[:10])
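If you also want to know where the smallest items sit, the same sort-based idea works with np.argsort. With axis=None it operates on the flattened array and returns flat indices ordered by value, which np.unravel_index converts to (row, col). This is a sketch on a hypothetical 3x3 matrix; like the answer above it sorts all N entries, so it costs O(N log N).

```python
import numpy as np

# Illustrative 3x3 symmetric matrix (hypothetical data)
matrix = np.array([[1.0, 0.29, 0.10],
                   [0.29, 1.0, 0.58],
                   [0.10, 0.58, 1.0]])

n = 4
# argsort with axis=None sorts the flattened array and returns flat indices,
# ordered from the smallest value to the largest
order = np.argsort(matrix, axis=None)[:n]
values = matrix.ravel()[order]
# Convert flat indices back to (row, col) coordinates
rows, cols = np.unravel_index(order, matrix.shape)
for r, c, v in zip(rows, cols, values):
    print(f"matrix[{r}, {c}] = {v}")
```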

You can use the heapq.nsmallest function to return a list of the 10 smallest elements:

In [84]: import heapq

In [85]: heapq.nsmallest(10, matrix.flatten())
Out[85]: 
[-1.7009047695355393,
 -1.4737632239971061,
 -1.1246243781838825,
 -0.7862983016935523,
 -0.5080863016259798,
 -0.43802651199959347,
 -0.22125698200832566,
 0.034938408281615596,
 0.13610084041121048,
 0.15876389111565958]
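heapq.nsmallest also accepts a key function, so pairing it with np.ndenumerate, which yields ((row, col), value) tuples, keeps each element's position alongside its value. A minimal sketch on a hypothetical 3x3 matrix follows. Because every element passes through Python-level iteration, this is likely to be considerably slower than np.partition on large arrays.

```python
import heapq
import numpy as np

# Illustrative 3x3 symmetric matrix (hypothetical data)
matrix = np.array([[1.0, 0.29, 0.10],
                   [0.29, 1.0, 0.58],
                   [0.10, 0.58, 1.0]])

# np.ndenumerate yields ((row, col), value); the key compares by value only.
# Ties keep row-major iteration order, as with sorted(..., key=...)[:n].
pairs = heapq.nsmallest(3, np.ndenumerate(matrix), key=lambda item: item[1])
for (row, col), value in pairs:
    print(row, col, value)
```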

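Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.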

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM