In a Python pandas DataFrame "df", I have the following three columns:
song_id | user_id | play_count
I have a rating table I invented based on play_count (how many times a user listened to a song):
play_count | rating
1-33 | 1
34-66 | 2
67-99 | 3
100-199 | 4
>=200 | 5
I am trying to add a column "rating" to this table based on play count. For example, if play_count=2, the rating will be "1".
So the result would look like this:
song_id | user_id | play_count | rating
X232 | u8347 | 2 | 1
X987 | u3701 | 50 | 2
X271 | u9327 | 10 | 1
X523 | u1398 | 175 | 4
In Excel I would do this with MATCH/INDEX, but I don't know how to do it in Python/pandas.
Would it be a combination of an if/else loop and isin?
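You need the endpoints of those ranges, just as you would in Excel:
import numpy as np
import pandas as pd
bins = [1, 33, 66, 99, 199, np.inf]
Then you can use pd.cut to find the corresponding rating:
df['rating'] = pd.cut(df['play_count'], bins=bins, include_lowest=True, labels=[1, 2, 3, 4, 5]).astype(int)
By default each bin is closed on the right, so the bins are [1, 33], (33, 66], (66, 99], (99, 199] and (199, inf]; include_lowest=True is what makes the first bin include 1. I added astype(int) at the end because pd.cut returns a categorical series, so you cannot do arithmetic calculations on it. Note that astype(int) raises an error if any play_count falls outside the bins (for example 0), because pd.cut returns NaN for those rows.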
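Here is a minimal, self-contained sketch of the pd.cut approach, using the four sample rows from the question. It assumes play_count is always at least 1, so every row lands in a bin and astype(int) does not fail.

```python
import numpy as np
import pandas as pd

# Sample rows taken from the question.
df = pd.DataFrame({
    "song_id": ["X232", "X987", "X271", "X523"],
    "user_id": ["u8347", "u3701", "u9327", "u1398"],
    "play_count": [2, 50, 10, 175],
})

# Bin edges; with the default right=True each bin is (left, right].
bins = [1, 33, 66, 99, 199, np.inf]
labels = [1, 2, 3, 4, 5]

# include_lowest=True makes the first bin include play_count == 1.
# astype(int) turns the categorical result into plain integers.
df["rating"] = pd.cut(
    df["play_count"], bins=bins, labels=labels, include_lowest=True
).astype(int)

print(df)
```

If your data can contain play counts below 1, drop the astype(int) call (or fill the missing values first), since those rows get NaN.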
I think it is easier if you change your rating table to use min/max values and store it as a DataFrame (called play_count in the code below), like this:
min | max | rating
1 | 33 | 1
34 | 66 | 2
67 | 99 | 3
100 | 199 | 4
200 | np.inf | 5
Of course, you need to import numpy as np for np.inf.
Then you can do something like this:
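# for each row of df, look up the rating whose [min, max] range contains its play_count
df['rating'] = df['play_count'].apply(lambda n: play_count.loc[(play_count['min'] <= n) & (n <= play_count['max']), 'rating'].iloc[0])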
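The min/max lookup can also be written without a per-row Python call. The following is only a sketch of one way to do it, using NumPy broadcasting to compare every play count against every range at once. The name rating_table is chosen here for illustration (the answer calls the table play_count), and the code assumes the ranges do not overlap.

```python
import numpy as np
import pandas as pd

# Sample rows taken from the question.
df = pd.DataFrame({
    "song_id": ["X232", "X987", "X271", "X523"],
    "user_id": ["u8347", "u3701", "u9327", "u1398"],
    "play_count": [2, 50, 10, 175],
})

# Lookup table with inclusive min/max bounds (rating_table is a name chosen here).
rating_table = pd.DataFrame({
    "min": [1, 34, 67, 100, 200],
    "max": [33, 66, 99, 199, np.inf],
    "rating": [1, 2, 3, 4, 5],
})

# Reshape play_count to a column vector so the comparison broadcasts
# to a (rows of df) x (rows of rating_table) boolean matrix.
counts = df["play_count"].to_numpy()[:, None]
in_range = (counts >= rating_table["min"].to_numpy()) & (
    counts <= rating_table["max"].to_numpy()
)

# Fail loudly if a play_count matches no range (argmax would silently pick row 0).
if not in_range.any(axis=1).all():
    raise ValueError("some play_count values fall outside every range")

# argmax gives the position of the first (here: only) matching range per row.
df["rating"] = rating_table["rating"].to_numpy()[in_range.argmax(axis=1)]

print(df)
```

The boolean matrix has one cell per (row, range) pair, so this suits a small rating table like the one here; for a large table, pd.cut is likely the better fit.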