
Replacing entries in a numpy array with their quantile index with python

I have a one-dimensional numpy array with numbers, and I want each number replaced with the index of the quantile it belongs to.

This is my code for quintile indices:

import numpy as np

def get_quintile_indices( a ):

    result = np.ones( a.shape[ 0 ] ) * 4

    quintiles = [
        np.percentile( a, 20 ),
        np.percentile( a, 40 ),
        np.percentile( a, 60 ),
        np.percentile( a, 80 )
    ]

    for q in quintiles:
        result -= np.less_equal( a, q ) * 1

    return result

a = np.array( [ 58, 54, 98, 76, 35, 13, 62, 18, 62, 97, 44, 43 ] )
print( get_quintile_indices( a ) )

Output:

[ 2.  2.  4.  4.  0.  0.  3.  0.  3.  4.  1.  1.]

As you can see, I start with an array initialized to the highest possible index and, for every quintile cutpoint, subtract 1 from each entry that is less than or equal to that cutpoint. Is there a better way to do this? Is there a built-in function that maps numbers against a list of cutpoints?

First off, we can generate those quintiles in one go -

quintiles = np.percentile( a, [20,40,60,80] )    

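For the final step, mapping each value to the index of its bin, np.searchsorted may be the built-in you were looking for. Its default side='left' places a value equal to a cutpoint in the lower bin, which matches your less_equal comparison. Note that it returns integer indices rather than the floats your function produces -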

out = np.searchsorted(quintiles, a)
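
Putting the two steps together on the sample array from the question gives the minimal sketch below. The function name quantile_indices and its n_bins parameter are illustrative assumptions and do not appear in the original code; they only show how the same idea might extend beyond quintiles.

```python
import numpy as np

def quantile_indices(a, n_bins=5):
    # Hypothetical helper: interior cutpoints only,
    # e.g. 20, 40, 60, 80 when n_bins is 5.
    percents = np.linspace(0, 100, n_bins + 1)[1:-1]
    cutpoints = np.percentile(a, percents)
    # For each value, count how many cutpoints lie strictly below it.
    return np.searchsorted(cutpoints, a)

a = np.array([58, 54, 98, 76, 35, 13, 62, 18, 62, 97, 44, 43])
indices = quantile_indices(a)
print(indices)
```

For this input the indices agree with the output shown in the question (2, 2, 4, 4, 0, 0, 3, 0, 3, 4, 1, 1).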

Alternatively, a direct translation of your loop into a vectorized version would use broadcasting, like so -

# Use broadcasting to perform those comparisons in one go.
# Then, simply sum along the first axis and subtract from 4. 
out = 4 - (quintiles[:,None] >=  a).sum(0)

If quintiles is a list, we need to convert it to an array first and then use broadcasting, like so -

out = 4 - (np.asarray(quintiles)[:,None] >=  a).sum(0)
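
As a quick sanity check, the sketch below runs the broadcasting version on the sample array from the question and compares it with np.searchsorted. The variable names mask, by_broadcast, by_search and same are chosen here for illustration only.

```python
import numpy as np

a = np.array([58, 54, 98, 76, 35, 13, 62, 18, 62, 97, 44, 43])
quintiles = np.percentile(a, [20, 40, 60, 80])

# quintiles[:, None] has shape (4, 1) and a has shape (12,), so the
# comparison broadcasts to a (4, 12) boolean array: one row per cutpoint.
mask = quintiles[:, None] >= a

# Each column counts the cutpoints at or above that value;
# subtracting from 4 leaves the number of cutpoints strictly below it.
by_broadcast = 4 - mask.sum(axis=0)
by_search = np.searchsorted(quintiles, a)

same = np.array_equal(by_broadcast, by_search)
print(same)
```

The broadcasting version builds a temporary array with one row per cutpoint, so for large inputs or many cutpoints np.searchsorted is likely the lighter option.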


 