
Ranking 2D NumPy array

I have a NumPy array with 1000 rows and 2 columns, such as:

[[ 0.76        1.28947368]
 [ 0.7         0.97142857]
 [ 0.7         1.48571429]
 [ 0.68        1.11764706]
 [ 0.68        1.23529412]
 [ 0.68        1.41176471]
 [ 0.68        1.41176471]
 [ 0.68        1.44117647]
 [ 0.66        0.78787879]
 [ 0.66        1.03030303]
 [ 0.66        1.09090909]
 [ 0.66        1.15151515]
 [ 0.66        1.15151515]
 [ 0.66        1.21212121]
 [ 0.66        1.24242424]]

As is evident, this array is sorted in descending order by column 0 and in ascending order by column 1. I want to assign a rank to each row of this array such that duplicate rows (two or more rows whose values are equal in both columns) get the same rank, and insert the rank as column 2.
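To pin down the ranking rule, here is a slow plain-Python reference sketch. The function name rank_rows_loop is hypothetical and not from the original post; it assumes the rows are already sorted so that duplicate rows are adjacent, and starts a new rank whenever a row differs from the one before it.

```python
import numpy as np

def rank_rows_loop(a):
    """Hypothetical reference ranking: adjacent equal rows share a rank."""
    ranks = []
    current = 0
    prev = None
    for row in a:
        key = tuple(row)
        # Bump the rank only when this row differs from the previous one
        if key != prev:
            current += 1
            prev = key
        ranks.append(current)
    return np.array(ranks)

sample = np.array([[0.68, 1.41176471],
                   [0.68, 1.41176471],
                   [0.68, 1.44117647]])
print(rank_rows_loop(sample))  # [1 1 2]
```

A loop like this is easy to check by eye but runs at Python speed, which is why a vectorized version is preferable for larger arrays.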

Expected output:

     [[0.76        1.28947368  1]
     [ 0.7         0.97142857  2]
     [ 0.7         1.48571429  3]
     [ 0.68        1.11764706  4]
     [ 0.68        1.23529412  5]
     [ 0.68        1.41176471  6]
     [ 0.68        1.41176471  6]  # as this row is a duplicate of the row above it
     [ 0.68        1.44117647  7]
     [ 0.66        0.78787879  8]
     [ 0.66        1.03030303  9]
     [ 0.66        1.09090909  10]
     [ 0.66        1.15151515  11]
     [ 0.66        1.15151515  11] # as this row is a duplicate of the row above it
     [ 0.66        1.21212121  12]
     [ 0.66        1.24242424  13]]

What is the most efficient way to achieve this?

For a sorted array, like the given sample, it's easy:

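import numpy as np
# A new rank starts at the first row and wherever a row differs from its predecessor
rank = np.r_[True, (a[1:] != a[:-1]).any(1)].cumsum()
out = np.column_stack((a, rank))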
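As a sanity check, the same two lines can be run on the 15-row sample from the question. The literal values below are copied from the question's printout; this is only an illustration of the approach, not part of the original answer.

```python
import numpy as np

# The 15-row sample from the question
a = np.array([
    [0.76, 1.28947368], [0.70, 0.97142857], [0.70, 1.48571429],
    [0.68, 1.11764706], [0.68, 1.23529412], [0.68, 1.41176471],
    [0.68, 1.41176471], [0.68, 1.44117647], [0.66, 0.78787879],
    [0.66, 1.03030303], [0.66, 1.09090909], [0.66, 1.15151515],
    [0.66, 1.15151515], [0.66, 1.21212121], [0.66, 1.24242424],
])

# True marks the first row of each new group of identical rows
starts = np.r_[True, (a[1:] != a[:-1]).any(1)]
rank = starts.cumsum()
out = np.column_stack((a, rank))

print(rank.tolist())
# [1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 11, 12, 13]
```

The ranks match the expected output in the question. Note that np.column_stack upcasts the integer ranks to float, because all columns of a regular NumPy array share one dtype.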

As an alternative to (a[1:] != a[:-1]).any(1), we could use the following, which compares the two columns separately and may be faster:

(a[1:,0] != a[:-1,0]) | (a[1:,1] != a[:-1,1])
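Here is a sketch that wraps the column-wise comparison into a small function and checks that it agrees with the .any(1) version. The name rank_sorted_2col is hypothetical, and the function assumes a sorted array with exactly two columns.

```python
import numpy as np

def rank_sorted_2col(a):
    """Rank rows of a sorted (n, 2) array; equal adjacent rows share a rank."""
    # Compare each column separately and OR the results, instead of building
    # an (n-1, 2) boolean array and reducing it with .any(1)
    changed = (a[1:, 0] != a[:-1, 0]) | (a[1:, 1] != a[:-1, 1])
    return np.r_[True, changed].cumsum()

a = np.array([[0.76, 1.28947368],
              [0.68, 1.41176471],
              [0.68, 1.41176471],
              [0.68, 1.44117647]])

# Both formulations should give the same ranks
assert (rank_sorted_2col(a) == np.r_[True, (a[1:] != a[:-1]).any(1)].cumsum()).all()
print(rank_sorted_2col(a))  # [1 2 2 3]
```

Whether the column-wise form is actually faster depends on array size and NumPy version, so it is worth timing both on real data.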

Sample step-by-step run

1) Input array:

In [70]: a
Out[70]: 
array([[ 0.76      ,  1.28947368],
       [ 0.68      ,  1.41176471],
       [ 0.68      ,  1.41176471],
       [ 0.68      ,  1.44117647],
       [ 0.66      ,  1.09090909],
       [ 0.66      ,  1.15151515],
       [ 0.66      ,  1.15151515],
       [ 0.66      ,  1.24242424]])
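To reproduce the session below outside IPython, the input array can be built directly. The values are copied from the printout above; this construction is not shown in the original answer.

```python
import numpy as np

# Input array from the step-by-step run
a = np.array([[0.76, 1.28947368],
              [0.68, 1.41176471],
              [0.68, 1.41176471],
              [0.68, 1.44117647],
              [0.66, 1.09090909],
              [0.66, 1.15151515],
              [0.66, 1.15151515],
              [0.66, 1.24242424]])
print(a.shape)  # (8, 2)
```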

2) Get a mask of inequality between consecutive rows. Since the array is sorted, duplicate rows sit next to each other and have identical elements in both columns. Comparing each row with the one before it and reducing with .any(1) gives a 1D mask that is True wherever a row differs from its predecessor in either column. This mask has one element fewer than the number of rows in the original array, because each slice leaves one row off:

In [71]: a[1:] != a[:-1]
Out[71]: 
array([[ True,  True],
       [False, False],
       [False,  True],
       [ True,  True],
       [False,  True],
       [False, False],
       [False,  True]], dtype=bool)

In [72]: (a[1:] != a[:-1]).any(1)
Out[72]: array([ True, False,  True,  True,  True, False,  True], dtype=bool)
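The same mask can also be obtained from np.diff, which computes differences between consecutive rows. This is an alternative formulation, not the one used in the answer; it relies on the fact that for finite floating-point values, x - y == 0 exactly when x == y (NaN or infinite entries would break this).

```python
import numpy as np

a = np.array([[0.76, 1.28947368],
              [0.68, 1.41176471],
              [0.68, 1.41176471],
              [0.68, 1.44117647]])

# Slicing-based mask: one entry per pair of consecutive rows
mask = (a[1:] != a[:-1]).any(1)

# Equivalent mask via np.diff (valid for finite values only)
mask_diff = (np.diff(a, axis=0) != 0).any(axis=1)

print(mask.shape)     # (3,), one fewer than the 4 rows
print(mask.tolist())  # [True, False, True]
assert (mask == mask_diff).all()
```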

Now, to compensate for the missing element, and since we need to start the ranking from 1 and intend to use cumulative summation for this incremental ranking, let's prepend a True (which counts as 1) and then use cumsum to get the expected ranks:

In [75]: np.r_[True, (a[1:] != a[:-1]).any(1)]
Out[75]: array([ True,  True, False,  True,  True,  True, False,  True], dtype=bool)

In [76]: np.r_[True, (a[1:] != a[:-1]).any(1)].cumsum()
Out[76]: array([1, 2, 2, 3, 4, 5, 5, 6])
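If np.r_ feels too magical, np.concatenate does the same prepending explicitly. The sketch below uses the mask from Out[72] and shows that cumsum over booleans counts the True values, so each True starts a new rank.

```python
import numpy as np

mask = np.array([True, False, True, True, True, False, True])

# np.r_ and np.concatenate both prepend a True for the first row
starts_r = np.r_[True, mask]
starts_c = np.concatenate(([True], mask))

# Summing booleans counts True values, so each True starts a new rank
ranks = starts_c.cumsum()
print(ranks.tolist())  # [1, 2, 2, 3, 4, 5, 5, 6]
assert (starts_r == starts_c).all()
```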

To visually verify, here's the stacked output:

In [77]: np.column_stack(( a, _ ))
Out[77]: 
array([[ 0.76      ,  1.28947368,  1.        ],
       [ 0.68      ,  1.41176471,  2.        ],
       [ 0.68      ,  1.41176471,  2.        ],
       [ 0.68      ,  1.44117647,  3.        ],
       [ 0.66      ,  1.09090909,  4.        ],
       [ 0.66      ,  1.15151515,  5.        ],
       [ 0.66      ,  1.15151515,  5.        ],
       [ 0.66      ,  1.24242424,  6.        ]])
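The approach above relies on the input already being sorted. If it were not, one possible sketch (an extension, not part of the original answer) is to sort with np.lexsort in the question's order (column 0 descending, then column 1 ascending), rank the sorted rows with the same trick, and scatter the ranks back to the original positions. The function name rank_rows_any_order is hypothetical.

```python
import numpy as np

def rank_rows_any_order(a):
    """Rank rows of an (n, 2) array in the question's order:
    column 0 descending, then column 1 ascending. Equal rows share a rank."""
    # np.lexsort uses the LAST key as the primary key, so -a[:, 0] sorts
    # column 0 descending and a[:, 1] breaks ties ascending
    order = np.lexsort((a[:, 1], -a[:, 0]))
    s = a[order]
    # Same trick as for a sorted array: a new rank starts at each changed row
    sorted_ranks = np.r_[True, (s[1:] != s[:-1]).any(1)].cumsum()
    # Scatter the ranks back to the original row positions
    ranks = np.empty_like(sorted_ranks)
    ranks[order] = sorted_ranks
    return ranks

a = np.array([[0.66, 1.15151515],
              [0.76, 1.28947368],
              [0.66, 1.09090909],
              [0.66, 1.15151515]])
print(rank_rows_any_order(a))  # [3 1 2 3]
```

The extra sort makes this O(n log n) rather than the O(n) of the sorted-input version, so it is only worth using when the input order is not guaranteed.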

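Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.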
