
Sequentially ranking values in a 2D array with tie-breaking in Python

I am trying to rank the values of a multidimensional NumPy array sequentially, with ties sharing a rank, so that the result is an array of the same shape holding the rank of each original value. Every occurrence of a given value must get the same rank. In the example below, the lowest value gets the first rank: -1 is the lowest value and occurs twice, so both occurrences get rank 0. The next lowest value is 0.00, which occurs 4 times, so each occurrence gets rank 1. In general, duplicates share a rank, and ranks increase consecutively from 0 for the lowest value to the maximum for the highest, with no gaps.

So if my input array is this:

import numpy as np

a = np.array([[-1.0,   0.17, 0.89,   0.00],
              [ 0.12,  0.57, 0.42,   0.00],
              [ 0.38,  0.57, 0.00,   0.031],
              [ 0.036, 0.00, 0.021, -1.0]])

I want my output array to be this:

array([[ 0,  6, 10,  1],
       [ 5,  9,  8,  1],
       [ 7,  9,  1,  3],
       [ 4,  1,  2,  0]])

I've tried argsort and scipy.stats.rankdata, and each gets me part of what I need.

argsort option: ranks sequentially, but has no option for handling ties (at least none that I have found), so equal values get different ranks:

a.ravel().argsort().argsort().reshape(a.shape)

array([[ 0, 10, 15,  2],
       [ 9, 13, 12,  3],
       [11, 14,  4,  7],
       [ 8,  5,  6,  1]])

rankdata option: handles the ties, but the ranks are no longer consecutive (tied values leave gaps after them):

from scipy.stats import rankdata

np.reshape((rankdata(a, method='min') - 1), a.shape)

array([[ 0, 10, 15,  2],
       [ 9, 13, 12,  2],
       [11, 13,  2,  7],
       [ 8,  2,  6,  0]])

Am I missing something obvious? Does anyone have a solution? The arrays I need to run this on are 1500 x 3600, so much larger than the example above.

Well, I found an answer using scipy.stats.rankdata: set method to 'dense' instead of 'min', as below.

np.reshape((rankdata(a, method='dense') - 1), a.shape)

array([[ 0,  6, 10,  1],
       [ 5,  9,  8,  1],
       [ 7,  9,  1,  3],
       [ 4,  1,  2,  0]])

I would still like to know whether there is a solution using argsort that does not go through scipy.
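For a NumPy-only route, here is a minimal sketch rather than a definitive answer. It assumes the array contains no NaN values. The first function leans on np.unique with return_inverse=True: the inverse indices point into the sorted unique values, so they already are 0-based dense ranks. The second is built on argsort, as asked, and counts how many times the value changes along the sorted data. The function names dense_rank_unique and dense_rank_argsort are made up for this example.

```python
import numpy as np


def dense_rank_unique(a):
    """Dense 0-based ranks via np.unique (hypothetical helper, assumes no NaN)."""
    a = np.asarray(a)
    # inverse[i] is the index of a.ravel()[i] in the sorted unique values,
    # which is exactly its dense rank.
    _, inverse = np.unique(a, return_inverse=True)
    # The shape of `inverse` differs between NumPy versions, so reshape explicitly.
    return inverse.reshape(a.shape)


def dense_rank_argsort(a):
    """Dense 0-based ranks via argsort (hypothetical helper, assumes no NaN)."""
    a = np.asarray(a)
    flat = a.ravel()
    if flat.size == 0:
        return np.empty(a.shape, dtype=np.intp)
    order = np.argsort(flat, kind="stable")
    sorted_vals = flat[order]
    # True wherever the sorted value differs from its predecessor.
    changed = sorted_vals[1:] != sorted_vals[:-1]
    # Running count of changes = dense rank of each sorted element.
    dense_sorted = np.concatenate(([0], np.cumsum(changed)))
    # Scatter the ranks back to the original positions.
    ranks = np.empty(flat.size, dtype=np.intp)
    ranks[order] = dense_sorted
    return ranks.reshape(a.shape)


a = np.array([[-1.0,   0.17, 0.89,   0.00],
              [ 0.12,  0.57, 0.42,   0.00],
              [ 0.38,  0.57, 0.00,   0.031],
              [ 0.036, 0.00, 0.021, -1.0]])

print(dense_rank_unique(a))
print(dense_rank_argsort(a))
# Both print:
# [[ 0  6 10  1]
#  [ 5  9  8  1]
#  [ 7  9  1  3]
#  [ 4  1  2  0]]
```

Both versions do a single sort over the flattened array, so they should be workable on a 1500 x 3600 input, though I have not timed them against rankdata.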



 