
Add quantile number as a new column in pandas

I have a dataframe with three columns

| A | B | C |

I calculated the quantiles:

df.quantile(.25)
df.quantile(.75)
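Both calls return a Series indexed by column name, holding one quartile per column. Comparing the frame against such a Series lines up on the column labels, so each column is tested against its own quartile. A small sketch with made-up data (the values are illustrative only, not from the question):

```python
import pandas as pd

# Illustrative data only; any numeric frame with columns A, B, C works
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50],
                   'C': [5, 4, 3, 2, 1]})

q1 = df.quantile(.25)   # Series: one first-quartile value per column
q3 = df.quantile(.75)   # Series: one third-quartile value per column

# The comparison aligns q1/q3 on the column labels, so every cell in
# column 'B' is compared with q1['B'] (or q3['B']), and so on
is_small = df < q1
is_large = df > q3
```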

I want to add a new column Q that classifies values as 'small', 'medium' or 'large' according to a simple rule: if a value is smaller than the first quartile it is small; if it is bigger than the third quartile it is large; everything in between is medium.

I have tried using qcut, but it only accepts 1-D input.

Thanks

pd.qcut is your friend.

pd.qcut(s, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'])
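Here `q` is a list of quantile edges rather than a number of bins. As far as I can tell, this behaves like `pd.cut` on the edges computed by `Series.quantile`, with the lowest edge included. The bins are closed on the right, so a value exactly equal to the first quartile ends up in 'small'. A sketch checking this on the MWE data below (the `pd.cut` route is shown only for comparison and is not part of the original answer):

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 4, 2, 4, 6, 4, 6, 5, 4, 6, 7, 3, 2, 1, 1, 2])
labels = ['small', 'medium', 'large']

by_qcut = pd.qcut(s, q=[0, .25, .75, 1], labels=labels)

# The same edges computed explicitly: min, Q1, Q3, max
edges = s.quantile([0, .25, .75, 1]).to_numpy()
# include_lowest=True so the minimum falls into the first bin, as in qcut
by_cut = pd.cut(s, bins=edges, labels=labels, include_lowest=True)

# Right-closed bins: s[2] == 2 equals Q1 and is labelled 'small'
tie_label = by_qcut[2]
```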

MWE

print(s)
0     1
1     1
2     2
3     3
4     4
5     2
6     4
7     6
8     4
9     6
10    5
11    4
12    6
13    7
14    3
15    2
16    1
17    1
18    2
dtype: int64

print(pd.qcut(s, q=[0, .25, .75, 1], labels=['small', 'medium', 'large']))
0      small
1      small
2      small
3     medium
4     medium
5      small
6     medium
7      large
8     medium
9      large
10     large
11    medium
12     large
13     large
14    medium
15     small
16     small
17     small
18     small
dtype: category
Categories (3, object): [small < medium < large]

For DataFrames, repeat this for each column with apply:

df.apply(pd.qcut, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'], axis=0)
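`DataFrame.apply` passes the extra keyword arguments (`q`, `labels`) through to `pd.qcut`, which is then called once per column; `axis=0` is the default and only makes that explicit. A self-contained sketch on a small made-up frame, which also joins the labels back next to the original values as new columns, since the question asks for a new column (the `_Q` suffix is a hypothetical naming choice):

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8],
                   'B': [8, 7, 6, 5, 4, 3, 2, 1]})

# q and labels are forwarded by apply to pd.qcut, one column at a time
quartile_labels = df.apply(pd.qcut, q=[0, .25, .75, 1],
                           labels=['small', 'medium', 'large'], axis=0)

# Keep the original values and add the labels as new columns A_Q, B_Q
# (the "_Q" suffix is just a naming choice for this sketch)
result = df.join(quartile_labels.add_suffix('_Q'))
```

Each label column keeps the ordered categorical dtype produced by `pd.qcut`.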

Setup

import numpy as np
import pandas as pd

# Reproducible 10x3 frame of integers 0-9
np.random.seed([3, 1415])
df = pd.DataFrame(
    np.random.randint(10, size=(10, 3)),
    columns=list('ABC')
)

pandas.DataFrame.mask

Pure pandas and intuitive

is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)
is_medium = ~(is_small | is_large)

df.mask(is_small, 'small').mask(is_large, 'large').mask(is_medium, 'medium')

        A       B       C
0   small   small  medium
1  medium   large  medium
2   small   large   large
3  medium   small   small
4   small  medium   large
5   large  medium   small
6  medium  medium  medium
7  medium   large  medium
8  medium  medium  medium
9   large  medium   large
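This approach and the `pd.qcut` answer do not treat ties the same way. The comparisons here are strict, so a value exactly equal to a quartile becomes 'medium', which matches the rule in the question, whereas `pd.qcut` uses right-closed bins and labels a value equal to the first quartile 'small'. A sketch showing the difference on the MWE data from the `pd.qcut` answer (wrapped in a one-column frame named `A` for this example):

```python
import pandas as pd

# The Series from the pd.qcut MWE, as a one-column frame
df = pd.DataFrame({'A': [1, 1, 2, 3, 4, 2, 4, 6, 4, 6, 5, 4, 6, 7, 3, 2, 1, 1, 2]})

# Strict comparisons, as in the DataFrame.mask answer (Q1 = 2.0, Q3 = 4.5)
is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)
is_medium = ~(is_small | is_large)
by_mask = df.mask(is_small, 'small').mask(is_large, 'large').mask(is_medium, 'medium')

# Right-closed bins, as in the pd.qcut answer
by_qcut = df.apply(pd.qcut, q=[0, .25, .75, 1], labels=['small', 'medium', 'large'])

# Rows where the two disagree: exactly the values equal to Q1 (== 2)
diff = by_mask['A'] != by_qcut['A'].astype(str)
```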

Nested numpy.where

is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)

pd.DataFrame(
    np.where(is_small, 'small', np.where(is_large, 'large', 'medium')),
    df.index, df.columns
)

        A       B       C
0   small   small  medium
1  medium   large  medium
2   small   large   large
3  medium   small   small
4   small  medium   large
5   large  medium   small
6  medium  medium  medium
7  medium   large  medium
8  medium  medium  medium
9   large  medium   large
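With more than three labels, nested `np.where` calls get hard to read. `numpy.select` is a swapped-in alternative (not part of the original answer): it takes a list of conditions and a matching list of labels, picks the first condition that holds for each cell, and falls back to `default` otherwise. A sketch that reproduces the nested `np.where` result on the same setup:

```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(np.random.randint(10, size=(10, 3)), columns=list('ABC'))

is_small = df < df.quantile(.25)
is_large = df > df.quantile(.75)

# Conditions are checked in order; the first True one picks the label,
# and cells matching none of them get the default
labels = np.select([is_small.to_numpy(), is_large.to_numpy()],
                   ['small', 'large'], default='medium')
out = pd.DataFrame(labels, index=df.index, columns=df.columns)

# The nested np.where version, for comparison
nested = np.where(is_small, 'small', np.where(is_large, 'large', 'medium'))
```

Adding a category is then a matter of appending one condition and one label to the two lists.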
