简体   繁体   中英

Ranking 2D Numpy array

I have a numpy array with 1000 rows and 2 columns as:

[[ 0.76        1.28947368]
 [ 0.7         0.97142857]
 [ 0.7         1.48571429]
 [ 0.68        1.11764706]
 [ 0.68        1.23529412]
 [ 0.68        1.41176471]
 [ 0.68        1.41176471]
 [ 0.68        1.44117647]
 [ 0.66        0.78787879]
 [ 0.66        1.03030303]
 [ 0.66        1.09090909]
 [ 0.66        1.15151515]
 [ 0.66        1.15151515]
 [ 0.66        1.21212121]
 [ 0.66        1.24242424]]

As evident, this array is sorted in descending order by column 0 and in ascending order by column 1. I want to assign rank to each row of this array such that duplicate rows (values in both column of two or more rows are equal) have same rank and insert rank as column 2.

Expected Output:

     [[0.76        1.28947368  1]
     [ 0.7         0.97142857  2]
     [ 0.7         1.48571429  3]
     [ 0.68        1.11764706  4]
     [ 0.68        1.23529412  5]
     [ 0.68        1.41176471  6]
     [ 0.68        1.41176471  6]  # as this row is duplicate of row above it
     [ 0.68        1.44117647  7]
     [ 0.66        0.78787879  8]
     [ 0.66        1.03030303  9]
     [ 0.66        1.09090909  10]
     [ 0.66        1.15151515  11]
     [ 0.66        1.15151515  11] # as this row is duplicate of row above it
     [ 0.66        1.21212121  12]
     [ 0.66        1.24242424  13]]

What is the most efficient way to achieve this??

For sorted array, like in the given sample, its easy -

rank = np.r_[True, (a[1:] != a[:-1]).any(1)].cumsum()
out = np.column_stack(( a, rank ))

As alternative to (a[1:] != a[:-1]).any(1) , we could use the following for performance :

(a[1:,0] != a[:-1,0]) | (a[1:,1] != a[:-1,1])

Sample step-by-step run

1) Input array :

In [70]: a
Out[70]: 
array([[ 0.76      ,  1.28947368],
       [ 0.68      ,  1.41176471],
       [ 0.68      ,  1.41176471],
       [ 0.68      ,  1.44117647],
       [ 0.66      ,  1.09090909],
       [ 0.66      ,  1.15151515],
       [ 0.66      ,  1.15151515],
       [ 0.66      ,  1.24242424]])

2) Get a mask of inequality between consecutive rows. The idea here is that since the array is sorted, so the duplicate rows would have identical elements across both columns. So, with the inequality across both columns, we would have a 1D mask, but one element less than the total number of rows in original array, as we used slicing with one element left off :

In [71]: a[1:] != a[:-1]
Out[71]: 
array([[ True,  True],
       [False, False],
       [False,  True],
       [ True,  True],
       [False,  True],
       [False, False],
       [False,  True]], dtype=bool)

In [72]: (a[1:] != a[:-1]).any(1)
Out[72]: array([ True, False,  True,  True,  True, False,  True], dtype=bool)

Now, to compensate for the one-less element and since we need to start the ranking from 1 and we intend to use cumumlative summation for this incremental ranking, let's append a 1 at the start and then use cumsum to give us the expected ranks :

In [75]: np.r_[True, (a[1:] != a[:-1]).any(1)]
Out[75]: array([ True,  True, False,  True,  True,  True, False,  True], dtype=bool)

In [76]: np.r_[True, (a[1:] != a[:-1]).any(1)].cumsum()
Out[76]: array([1, 2, 2, 3, 4, 5, 5, 6])

To visually verify, here's the stacked output :

In [77]: np.column_stack(( a, _ ))
Out[77]: 
array([[ 0.76      ,  1.28947368,  1.        ],
       [ 0.68      ,  1.41176471,  2.        ],
       [ 0.68      ,  1.41176471,  2.        ],
       [ 0.68      ,  1.44117647,  3.        ],
       [ 0.66      ,  1.09090909,  4.        ],
       [ 0.66      ,  1.15151515,  5.        ],
       [ 0.66      ,  1.15151515,  5.        ],
       [ 0.66      ,  1.24242424,  6.        ]])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM