
Create a score column in pandas whose value depends on the percentile of another column

I have the following dataframe:

User_ID  Game_ID  votes
1        11       1040
1        11       nan
1        22       1101
1        11       540
1        33       nan
2        33       nan
2        33       290
2        33       nan
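To make the example reproducible, here is a minimal sketch that rebuilds the frame above. It assumes votes is numeric with missing values stored as NaN. The question only shows `nan`, so the real column could also hold strings, which is presumably why the first answer below calls `.astype(float)`.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; missing votes become NaN, so "votes" is float64.
df = pd.DataFrame({
    "User_ID": [1, 1, 1, 1, 1, 2, 2, 2],
    "Game_ID": [11, 11, 22, 11, 33, 33, 33, 33],
    "votes": [1040, np.nan, 1101, 540, np.nan, np.nan, 290, np.nan],
})
print(df)
```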

Based on where each value in the votes column falls relative to that column's percentiles, a new score column needs to be created according to the following rules:

If the “votes” value is >= the 75th percentile, assign a score of 2.

If it is >= the 25th percentile (and below the 75th), assign a score of 1.

If it is < the 25th percentile, assign a score of 0.
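The three rules map directly onto ordered conditions. This is a minimal sketch using `Series.quantile` and `numpy.select`, not one of the answers below. It assumes votes is numeric and that rows with missing votes should keep NaN as their score, since the rules do not say what to do with them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "User_ID": [1, 1, 1, 1, 1, 2, 2, 2],
    "Game_ID": [11, 11, 22, 11, 33, 33, 33, 33],
    "votes": [1040, np.nan, 1101, 540, np.nan, np.nan, 290, np.nan],
})

votes = df["votes"].astype(float)
# Series.quantile skips NaN and interpolates linearly between data points by default.
q25, q75 = votes.quantile([0.25, 0.75])

# np.select picks the first condition that is True, so ">= 75th" is tested before ">= 25th".
# NaN fails every comparison and therefore falls through to the NaN default.
conditions = [votes >= q75, votes >= q25, votes < q25]
df["score"] = np.select(conditions, [2, 1, 0], default=np.nan)
print(df)
```

Because the comparisons use `>=` and `<` exactly as written in the rules, a vote equal to a percentile lands in the higher band.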

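Use pd.qcut, which bins the values at the given quantiles. .cat.codes then turns the three bins into the scores 0, 1 and 2: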

import pandas as pd
df['score'] = pd.qcut(df['votes'].astype(float), [0, 0.25, 0.75, 1.0]).cat.codes
print(df['score'])

Output (rows whose votes are NaN get the code -1):

0    1
1   -1
2    2
3    1
4   -1
5   -1
6    0
7   -1
Name: score, dtype: int8
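To see which edges pd.qcut actually used, you can pass `retbins=True`. With `labels=False` you get plain bin numbers, and missing votes stay NaN instead of becoming -1, so the result is a float Series. This is a small sketch on the example data, assuming numeric votes. Note that qcut's bins are closed on the right. A vote exactly equal to the 75th-percentile edge would get 1 rather than the 2 the rules ask for, and one exactly at the 25th-percentile edge would get 0 rather than 1.

```python
import numpy as np
import pandas as pd

votes = pd.Series([1040, np.nan, 1101, 540, np.nan, np.nan, 290, np.nan])

# labels=False -> integer bin numbers (float here, because NaN is kept as NaN);
# retbins=True -> also return the quantile edges that were computed (NaN ignored).
codes, edges = pd.qcut(votes, [0, 0.25, 0.75, 1.0], labels=False, retbins=True)
print(edges)
print(codes.tolist())
```

For this data the interior edges work out to 477.5 (25th percentile) and 1055.25 (75th percentile).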

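Alternatively, get the percentiles from describe() and build the column with a list comprehension. NaN votes need an explicit check, because NaN fails both comparisons and would otherwise fall through to a score of 1:

import pandas as pd
percentiles = df['votes'].describe()  # includes the '25%' and '75%' rows, computed ignoring NaN
df['score'] = [float('nan') if pd.isna(x) else 2 if x >= percentiles['75%'] else (0 if x < percentiles['25%'] else 1) for x in df['votes']]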
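The describe-based approach can also be vectorized with pd.cut instead of a Python-level loop. This is a sketch, assuming numeric votes. With `right=False` each bin is closed on the left ([-inf, q25), [q25, q75), [q75, inf)), which matches the ">=" wording of the rules. Missing votes come out as NaN.

```python
import numpy as np
import pandas as pd

votes = pd.Series([1040, np.nan, 1101, 540, np.nan, np.nan, 290, np.nan])
stats = votes.describe()  # '25%' and '75%' are computed ignoring NaN

# Left-closed bins so that a value equal to a percentile moves up a band.
bins = [-np.inf, stats["25%"], stats["75%"], np.inf]
# labels gives the scores directly; astype(float) turns the categorical into numbers (NaN stays NaN).
score = pd.cut(votes, bins=bins, right=False, labels=[0, 1, 2]).astype(float)
print(score.tolist())
```

One caveat: pd.cut raises a ValueError if the 25th and 75th percentiles coincide, because the bin edges would then not be unique.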
