Pandas quantile function not returning the correct number of given quantiles

Question

I have a dataframe with over 2,000 records that has multiple columns with various balances. Based on the balance amount I want to assign it to a bucket.

Trying to split each balance column into a quantile and have the following buckets 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9 Concretely, translating the balances into the following buckets: top 10%, top 20%, top 30%, etc...

If I'm understanding correctly, so long as there are more than 10 records it should bucket each record in a percentile based on linear interpolation.

So I run the following:

score_quantiles =  df.quantile(q=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9])
score_quantiles.to_dict()



# Arguments (x = value, p = field (i.e bal 1, bal2, bal3) , d = score_quantiles)

def dlpScore(x,p,d):
    if pd.isnull(x) == True:
        return 0 
    elif int(x) == 0:
        return 0
    elif x <= d[p][0.1]:
        return 1
    elif x <= d[p][0.2]:
        return 2
    elif x <= d[p][0.3]: 
        return 3
    elif x <= d[p][0.4]: 
        return 4
    elif x <= d[p][0.5]: 
        return 5
    elif x <= d[p][0.6]: 
        return 6
    elif x <= d[p][0.7]: 
        return 7
    elif x <= d[p][0.8]: 
        return 8
    elif x <= d[p][0.9]: 
        return 9
    else:
        return 10



df['SCORE_BAL1'] = df['bal1'].apply(dlpScore, args=('bal1',score_quantiles,))

Problem is, on some columns it gives me all the buckets, on others it only gives me a few:

Is there a way to ensure it creates all the buckets? I'm probably missing something here.

Answer 1

If you want to make sure that you are getting similar distributions among 'buckets' you might want to try the pandas qcut function. The full documentation is here .

To use it in your code and get deciles for instance you could just do

n_buckets=10
df['quantile'] = pd.qcut(df['target_column'], q=n_buckets)

And if you want to apply a specific label you can just do something like this

n_buckets=10
df['quantile'] = pd.qcut(df['target_column'], q=n_buckets, labels=range(1,n_buckets+1))

PS: Just beware that for this latter case if qcut is not able to generate the desired number of quantiles (eg, because there are not enough values to create a quantile), you will get an exception for passing more labels than there are quantiles.

Pandas quantile function not returning the correct number of given quantiles

Question

1 answers

solution1
2 ACCPTED 2021-03-30 19:24:43

Pandas quantile function not returning the correct number of given quantiles

Question

1 answers

solution1 2 ACCPTED 2021-03-30 19:24:43

solution1
2 ACCPTED 2021-03-30 19:24:43