Sequentially ranking values in a 2D array with a tie-breaker in Python

Question

I am trying to sequentially rank the values of a multidimensional NumPy array with a tie-breaking rule, producing an array of the same shape that holds the sequential rank of each original value. Every occurrence of the same value must receive the same rank. In the example below, the lowest value gets the first rank: -1 is the lowest value and occurs twice, so both occurrences get rank 0. The next lowest value is 0.00, which occurs 4 times, so all four get rank 1. In general, duplicate values share a rank, and the ranks run consecutively from 0 up to the maximum, from the lowest value to the highest.

So if my input array is this:

import numpy as np

a = np.array([[-1.0,  0.17, 0.89,  0.00],
              [ 0.12, 0.57, 0.42,  0.00],
              [ 0.38, 0.57, 0.00,  0.031],
              [ 0.036, 0.00, 0.021, -1.0]])

I want my output array to be this:

array([[ 0,  6, 10,  1],
       [ 5,  9,  8,  1],
       [ 7,  9,  1,  3],
       [ 4,  1,  2,  0]])
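The ranking rule described above is usually called a dense rank. As a minimal pure-Python sketch for checking the definition on small inputs only (the helper name dense_rank_reference is hypothetical, and it assumes the array contains no NaN, since NaN never compares equal to itself), each distinct value is mapped to its position among the sorted distinct values:

```python
import numpy as np

def dense_rank_reference(arr):
    """Hypothetical helper: dense 0-based ranks via a dict (slow, small inputs only)."""
    # Sorted distinct values; a value's position in this list is its rank.
    distinct = sorted(set(arr.ravel().tolist()))
    rank_of = {value: rank for rank, value in enumerate(distinct)}
    # Look up every element, then restore the original shape.
    return np.array([rank_of[v] for v in arr.ravel().tolist()]).reshape(arr.shape)

a = np.array([[-1.0,  0.17, 0.89,  0.00],
              [ 0.12, 0.57, 0.42,  0.00],
              [ 0.38, 0.57, 0.00,  0.031],
              [ 0.036, 0.00, 0.021, -1.0]])
print(dense_rank_reference(a))
```

This reproduces the desired output above, but the Python-level loop makes it far too slow for large arrays.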

I've tried argsort and scipy.stats.rankdata; each gets me part of the way.

Argsort option: the ranks are sequential, but tied values receive different ranks, and I have not found an option to make them share one:

a.ravel().argsort().argsort().reshape(a.shape) 

array([[ 0, 10, 15,  2],
       [ 9, 13, 12,  3],
       [11, 14,  4,  7],
       [ 8,  5,  6,  1]])
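Why the double argsort yields ordinal ranks: the first argsort lists element indices in sorted order, and argsorting that permutation inverts it, so each element receives its position in the sorted order. Equal values still occupy different positions, which is why ties get distinct ranks. A minimal 1-D illustration (kind='stable' is an addition here, so that ties are broken by original position):

```python
import numpy as np

x = np.array([3, 1, 3, 2])
order = x.argsort(kind='stable')   # indices of x in ascending order: [1 3 0 2]
ranks = order.argsort()            # inverse permutation: sorted position of each element
print(ranks)                       # [2 0 3 1]; the two 3s get ranks 2 and 3

# The same inverse permutation via one scatter instead of a second sort.
inv = np.empty_like(order)
inv[order] = np.arange(order.size)
print(inv)                         # [2 0 3 1]
```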

rankdata option: tied values now share a rank, but the ranks are no longer consecutive (method='min' leaves a gap after each group of ties):

from scipy.stats import rankdata
np.reshape(rankdata(a, method='min') - 1, a.shape)

array([[ 0, 10, 15,  2],
       [ 9, 13, 12,  2],
       [11, 13,  2,  7],
       [ 8,  2,  6,  0]])
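The difference between the rankdata tie methods is easiest to see on a tiny 1-D input with one tie. A minimal sketch; rankdata returns 1-based ranks, hence the - 1 used in the question, and the astype(int) call is added here only so every method prints integers:

```python
from scipy.stats import rankdata

x = [10, 20, 20, 30]
for method in ('ordinal', 'min', 'max', 'dense'):
    # Subtract 1 to get 0-based ranks, as in the question.
    print(method, (rankdata(x, method=method) - 1).astype(int).tolist())

# Expected 0-based ranks:
#   ordinal [0, 1, 2, 3]   ties broken by position
#   min     [0, 1, 1, 3]   ties share the lowest rank, leaving a gap after them
#   max     [0, 2, 2, 3]   ties share the highest rank
#   dense   [0, 1, 1, 2]   ties share a rank and no gaps are left
```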

Am I missing something obvious? Does anyone have a solution? The arrays I need to run this on are 1500 x 3600, much larger than the example above.

Answer 1

Well, I found an answer using scipy.stats.rankdata: set the method to 'dense' instead of 'min':

from scipy.stats import rankdata
np.reshape(rankdata(a, method='dense') - 1, a.shape)

array([[ 0,  6, 10,  1],
       [ 5,  9,  8,  1],
       [ 7,  9,  1,  3],
       [ 4,  1,  2,  0]])

I would still like to know whether there is a NumPy-only solution based on argsort, without having to go through SciPy.
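One NumPy-only way to get the same dense ranks, sketched below under the assumption that the array contains no NaN values: sort the flattened data once with argsort, mark each position in the sorted data where a new distinct value starts, take a cumulative sum of those marks to number the distinct values 0, 1, 2, ..., and scatter those numbers back to the original positions through the argsort permutation. np.unique with return_inverse=True produces the same mapping and is included as a shorter alternative. Both function names are hypothetical.

```python
import numpy as np

def dense_rank_argsort(a):
    """Dense 0-based ranks of all elements of `a`, same shape as `a` (assumes no NaN)."""
    flat = a.ravel()
    # Any sort kind works here: tied elements end up with the same rank anyway.
    order = flat.argsort()
    sorted_vals = flat[order]
    # True at the first element of each run of equal values in the sorted data.
    is_new = np.ones(flat.size, dtype=bool)
    is_new[1:] = sorted_vals[1:] != sorted_vals[:-1]
    # Running count of distinct values seen so far: 0, 1, 2, ... per run.
    dense_sorted = np.cumsum(is_new) - 1
    # Scatter the ranks back to the original (unsorted) positions.
    ranks = np.empty(flat.size, dtype=np.intp)
    ranks[order] = dense_sorted
    return ranks.reshape(a.shape)

def dense_rank_unique(a):
    """Same result via np.unique: the inverse index of each element is its dense rank."""
    # Ravel first so the inverse is 1-D regardless of NumPy version, then reshape.
    _, inverse = np.unique(a.ravel(), return_inverse=True)
    return inverse.reshape(a.shape)

a = np.array([[-1.0,  0.17, 0.89,  0.00],
              [ 0.12, 0.57, 0.42,  0.00],
              [ 0.38, 0.57, 0.00,  0.031],
              [ 0.036, 0.00, 0.021, -1.0]])
print(dense_rank_argsort(a))
print(dense_rank_unique(a))
```

Both variants are dominated by a single O(n log n) sort of the flattened array (np.unique sorts internally), so neither needs SciPy.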



solution1 0 ACCEPTED 2020-01-19 17:32:36

solution2 0 2020-01-19 18:02:04
