
Create a boolean column in a pandas DataFrame based on percentile values of another column

I have a DataFrame with multiple columns. I want to create a boolean column that flags whether each value is at or above the 90th percentile.

My DataFrame also contains multiple zeros.

Example:

Name  Value
Val1  1000
Val2  910
Val3  800
Val4  700
Val5  600
Val6  500
Val7  400
Val8  300
Val9  200
Val10 100
Val11 0

Expected output:

Name  Value 90thper
Val1  1000    1
Val2  910     1
Val3  800     0
Val4  700     0
Val5  600     0
Val6  500     0
Val7  400     0
Val8  300     0
Val9  200     0
Val10 100     0
Val11 0       0

You could use pd.Series.quantile to find the 90th-percentile value and flag every value at or above it.

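import pandas as pd

df = pd.DataFrame({'Name': [f'Val{i}' for i in range(1, 12)],
                   'Value': [1000, 910, 800, 700, 600, 500, 400, 300, 200, 100, 0]})

# "lower" returns an actual data value instead of an interpolated one
val = df['Value'].quantile(.9, interpolation="lower")  # val -> 910
df['90thper'] = df['Value'].ge(val).astype(int)

#      Name  Value  90thper
# 0    Val1   1000        1
# 1    Val2    910        1
# 2    Val3    800        0
# 3    Val4    700        0
# 4    Val5    600        0
# 5    Val6    500        0
# 6    Val7    400        0
# 7    Val8    300        0
# 8    Val9    200        0
# 9   Val10    100        0
# 10  Val11      0        0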
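Why the cutoff lands on 910 here can be checked by hand. This is a minimal sketch, assuming pandas' default linear convention that the q-quantile sits at 0-based position q * (n - 1) in the ascending-sorted data. With 11 values that position is exactly 9, so the default linear method and interpolation="lower" agree on 910. The names values, pos and manual are illustrative only.

```python
import pandas as pd

values = pd.Series([1000, 910, 800, 700, 600, 500, 400, 300, 200, 100, 0])

# Linear interpolation places the q-quantile at 0-based position q * (n - 1)
# in the ascending-sorted data.
n = len(values)
pos = 0.9 * (n - 1)  # 0.9 * 10 = 9.0, a whole-number index for this data
sorted_vals = values.sort_values().to_numpy()
manual = sorted_vals[int(pos)]  # int(pos) is only safe because pos is whole here

# Because the position is a whole number, "linear" and "lower"
# return the same threshold for this data.
print(manual, values.quantile(0.9), values.quantile(0.9, interpolation="lower"))
```

When the position falls between two data points, "lower" picks the smaller neighbour while "linear" interpolates between them, which is why the two methods can disagree on other data.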

Try Series.quantile with its default (linear) interpolation:

df['new'] = df.Value.ge(df.Value.quantile(0.9)).astype(int)
df
     Name  Value  new
0    Val1   1000    1
1    Val2    910    1
2    Val3    800    0
3    Val4    700    0
4    Val5    600    0
5    Val6    500    0
6    Val7    400    0
7    Val8    300    0
8    Val9    200    0
9   Val10    100    0
10  Val11      0    0
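The question title asks for a boolean column, while the answers convert to 1/0 with astype(int) to match the expected output. If a true boolean column is acceptable, a minimal sketch is simply to keep the comparison result; the DataFrame below only rebuilds the question's sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": [f"Val{i}" for i in range(1, 12)],
    "Value": [1000, 910, 800, 700, 600, 500, 400, 300, 200, 100, 0],
})

threshold = df["Value"].quantile(0.9)  # 910.0 for this data
# Keep the comparison result as-is to get a real boolean column;
# .astype(int) is only needed if 1/0 output is required.
df["90thper"] = df["Value"].ge(threshold)

print(df["90thper"].dtype)  # bool
print(df.loc[df["90thper"], "Name"].tolist())  # ['Val1', 'Val2']
```

A boolean column can also be used directly as a mask, as in the df.loc lookup above.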

Assuming your DataFrame is called df, you can do:

df['90thper'] = (df['Value'] >= df['Value'].quantile(0.90)).astype(int)
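All three answers compute the percentile over the whole column, so the zeros mentioned in the question count toward the cutoff. If those zeros should be ignored instead (an assumption; the question does not say), one option is to compute the threshold from the non-zero values only. The column names 90th_linear and 90th_lower are hypothetical, chosen to compare two interpolation methods side by side.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": [f"Val{i}" for i in range(1, 12)],
    "Value": [1000, 910, 800, 700, 600, 500, 400, 300, 200, 100, 0],
})

# Assumption: zeros are placeholders that should not shape the cutoff.
nonzero = df.loc[df["Value"].ne(0), "Value"]

# With 10 non-zero values the 0.9 position is 0.9 * 9 = 8.1, between two data
# points, so the interpolation method now changes the threshold.
linear_cut = nonzero.quantile(0.9)                        # about 919
lower_cut = nonzero.quantile(0.9, interpolation="lower")  # 910

# Zero rows fall below either cutoff, so they are flagged 0 automatically.
df["90th_linear"] = df["Value"].ge(linear_cut).astype(int)
df["90th_lower"] = df["Value"].ge(lower_cut).astype(int)
print(df)
```

On this data the linear cutoff flags only Val1, while "lower" still flags Val1 and Val2, so the interpolation choice matters once the zeros are excluded.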
