
Find the N smallest values in a pair-wise comparison NxN numpy array?

I have an NxN numpy pairwise array (matrix) of double values. Each element (i, j) is a measurement between item i and item j. The diagonal, where i == j, is 1, since it is a pairwise measurement of an item with itself. This also means the matrix is symmetric: one half mirrors the other across the diagonal, so a single triangle holds all the information.

A truncated representation:

[[1.         0.11428571 0.04615385 ... 0.13888889 0.07954545 0.05494505]
 [0.11428571 1.         0.09836066 ... 0.06578947 0.09302326 0.07954545]
 [0.04615385 0.09836066 1.         ... 0.07843137 0.09821429 0.11711712]
 ...
 [0.13888889 0.06578947 0.07843137 ... 1.         0.34313725 0.31428571]
 [0.07954545 0.09302326 0.09821429 ... 0.34313725 1.         0.64130435]
 [0.05494505 0.07954545 0.11711712 ... 0.31428571 0.64130435 1.        ]]

I want to extract the N smallest values without counting any pair twice, which would otherwise happen because of the pairwise duplication, e.g. (5,6) == (6,5). I also do not want to include any of the diagonal values of 1 where i == j.

I understand that numpy has a partition method, and I've seen plenty of examples for flat arrays, but I'm struggling to find anything straightforward for a pairwise comparison matrix.

EDIT #1 Based on the first response below, I implemented:

seventyPercentInt: int = round((populationSizeInt/100)*70)

upperTriangleArray = dataArray[np.triu_indices(len(dataArray),1)]
seventyPercentArray = upperTriangleArray[np.argpartition(upperTriangleArray,seventyPercentInt)][0:seventyPercentInt]

print(len(np.unique(seventyPercentArray)))

The upperTriangleArray numpy array has 1133265 elements from which to pick the lowest k. In this case k is seventyPercentInt, which is around 1054. However, when I apply np.argpartition, only the value 0 is returned.

The flat array upperTriangleArray has shape (1133265,).

SOLUTION

As per the first reply below (the accepted answer), this is the code that worked:

upperTriangleArray = dataArray[np.triu_indices(len(dataArray),1)]

seventyPercentInt: int = round((len(upperTriangleArray)/100)*70)

seventyPercentArray = upperTriangleArray[np.argpartition(upperTriangleArray,seventyPercentInt)][0:seventyPercentInt]

I ran into some slight trouble (of my own making) with seventyPercentInt. Rather than taking 70% of the pairwise elements, I had taken 70% of the number of items being compared. Those are two very different values.
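To make the difference concrete, here is a small sketch of the two calculations. The item count of 1506 is not stated in the question. It is inferred here as the N for which N(N-1)/2 equals the 1133265 pairs reported above, and the variable names are illustrative only.

```python
# Assumption: n_items = 1506 is inferred from the 1133265 pairs mentioned above.
n_items = 1506

# Number of distinct (i, j) pairs with i < j, i.e. the strict upper triangle.
n_pairs = n_items * (n_items - 1) // 2

# 70% of the items (the mistaken k) versus 70% of the pairs (the intended k).
k_from_items = round((n_items / 100) * 70)
k_from_pairs = round((n_pairs / 100) * 70)

print(n_pairs)       # 1133265
print(k_from_items)  # 1054
print(k_from_pairs)  # several hundred times larger than k_from_items
```

With the mistaken k, only about 0.1% of the pairwise values are selected instead of 70%.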

You can use np.triu_indices with an offset of 1 to keep only the values strictly above the diagonal. This drops both the diagonal and the duplicated lower half.

Then you can use np.argpartition as in the example below. Note that it places the k smallest values first but does not sort them.

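import numpy as np

# Symmetric pairwise matrix with 1.0 on the diagonal
A = np.array([[1.0, 0.1, 0.2, 0.3],
              [0.1, 1.0, 0.4, 0.5],
              [0.2, 0.4, 1.0, 0.6],
              [0.3, 0.5, 0.6, 1.0]])

# Values strictly above the diagonal (offset 1), flattened row by row
A_upper_triangle = A[np.triu_indices(len(A), 1)]

print(A_upper_triangle)
# prints [0.1 0.2 0.3 0.4 0.5 0.6]

k = 2

# argpartition moves the indices of the k smallest values to the front
print(A_upper_triangle[np.argpartition(A_upper_triangle, k)][0:k])
# prints the 2 smallest values, 0.1 and 0.2 (order within the first k is not guaranteed)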
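If you also need to know which (i, j) pairs the smallest values belong to, one possible extension is to keep the row and column indices returned by np.triu_indices. This is a minimal sketch, not part of the accepted code above: the helper name k_smallest_pairs is hypothetical, and it sorts the selected values so the output order is deterministic.

```python
import numpy as np

def k_smallest_pairs(matrix, k):
    """Return the k smallest strict-upper-triangle values with their (i, j) indices.

    Hypothetical helper for illustration; assumes `matrix` is square and symmetric.
    """
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    values = matrix[rows, cols]

    if k >= values.size:
        # argpartition needs kth < size, so fall back to taking everything
        pick = np.arange(values.size)
    else:
        # indices of the k smallest values, in no particular order
        pick = np.argpartition(values, k)[:k]

    # sort only the selected entries so the result is in ascending order
    pick = pick[np.argsort(values[pick])]
    return values[pick], rows[pick], cols[pick]

A = np.array([[1.0, 0.1, 0.2, 0.3],
              [0.1, 1.0, 0.4, 0.5],
              [0.2, 0.4, 1.0, 0.6],
              [0.3, 0.5, 0.6, 1.0]])

vals, rows, cols = k_smallest_pairs(A, 2)
for v, i, j in zip(vals, rows, cols):
    print(f"({i}, {j}) -> {v}")
# prints:
# (0, 1) -> 0.1
# (0, 2) -> 0.2
```

Sorting only the k selected entries keeps the cost low compared with sorting the whole upper triangle, which matters when the triangle has over a million elements.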
