
python pandas dataframe index match

In a python pandas dataframe "df", I have the following three columns:

song_id | user_id | play_count

I have a rating table I invented based on play_count (how many times a user listened to a song):

play_count | rating
1-33       | 1
34-66      | 2
67-99      | 3
100-199    | 4
200+       | 5

I am trying to add a "rating" column to df based on play_count. For example, if play_count = 2, the rating will be 1.

The result should look like this:

song_id | user_id | play_count | rating
X232    | u8347   | 2          | 1
X987    | u3701   | 50         | 2
X271    | u9327   | 10         | 1
X523    | u1398   | 175        | 4

In Excel I would do this with INDEX/MATCH, but I don't know how to do it in Python/pandas.

Would it be a combination of an if/else loop and isin?

As in Excel, you first need the endpoints of those ranges:

import numpy as np
import pandas as pd
bins = [1, 33, 66, 99, 199, np.inf]

Then you can use pd.cut to find the corresponding rating and store it in a new column:

df['rating'] = pd.cut(df['play_count'], bins=bins, include_lowest=True, labels=[1, 2, 3, 4, 5]).astype(int)

I added astype(int) at the end because pd.cut returns a categorical Series, which does not support arithmetic.
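
For reference, here is a minimal end-to-end sketch of the pd.cut approach, using the four sample rows from the question. The `labels` variable name is only for illustration. It assumes every play count is at least 1 and not missing; a value outside the bins would come back as NaN, and I would expect astype(int) to fail on it.

```python
import numpy as np
import pandas as pd

# Sample rows taken from the question
df = pd.DataFrame({
    "song_id": ["X232", "X987", "X271", "X523"],
    "user_id": ["u8347", "u3701", "u9327", "u1398"],
    "play_count": [2, 50, 10, 175],
})

# The lowest allowed play count, followed by the upper edge of each range
bins = [1, 33, 66, 99, 199, np.inf]
labels = [1, 2, 3, 4, 5]

# Bins are closed on the right: (33, 66], (66, 99], (99, 199], (199, inf].
# include_lowest=True makes the first bin include its left edge, so 1 is rated too.
df["rating"] = pd.cut(
    df["play_count"], bins=bins, labels=labels, include_lowest=True
).astype(int)

print(df)
```

With these bins a play count of exactly 200 falls in the last interval, so it gets rating 5.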

I think it is easier if you change your rating table to use explicit min/max values, like this:

play_count:

min | max    | rating
1   | 33     | 1
34  | 66     | 2
67  | 99     | 3
100 | 199    | 4
200 | np.inf | 5

Of course, you need to import numpy as np for np.inf.

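Then you can look up the rating for each row. The comparison has to be done one play count at a time, because df and the rating table have different lengths and cannot be compared directly:

# for each play count n, take the rating of the row whose [min, max] range contains n (IndexError if no range matches)
df['rating'] = df['play_count'].apply(lambda n: play_count.loc[(play_count['min'] <= n) & (n <= play_count['max']), 'rating'].iloc[0])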
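
Here is a minimal, self-contained sketch of the min/max lookup, written with NumPy broadcasting so that every play count is compared against every range at once. The name `rating_table` is hypothetical, chosen here so the lookup table does not clash with the play_count column. The sample values are the ones from the question plus 200 to exercise the open-ended last range.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"play_count": [2, 50, 10, 175, 200]})

# Rating table with inclusive lower and upper bounds (hypothetical name)
rating_table = pd.DataFrame({
    "min": [1, 34, 67, 100, 200],
    "max": [33, 66, 99, 199, np.inf],
    "rating": [1, 2, 3, 4, 5],
})

counts = df["play_count"].to_numpy()[:, None]   # shape (n_rows, 1)
lo = rating_table["min"].to_numpy()             # shape (n_ranges,)
hi = rating_table["max"].to_numpy()

# inside[i, j] is True when play count i lies within range j
inside = (counts >= lo) & (counts <= hi)

matched = inside.any(axis=1)    # False for counts that fit no range
first = inside.argmax(axis=1)   # position of the matching range in each row

ratings = rating_table["rating"].to_numpy()[first]
# Keep the rating only where a range matched; unmatched rows become NaN
df["rating"] = pd.Series(ratings, index=df.index).where(matched)

print(df)
```

This builds an n_rows by n_ranges boolean array, which is fine for a handful of ranges but can use a lot of memory on a very large DataFrame.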
