How do I place NaN when computing the average rating for each movie in a DataFrame?
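I'm working with the MovieLens dataset, which basically has two files: a .csv file with the movies, and another .csv file with the ratings that n users gave to specific movies.
I did the following to get the average rating of each movie in the DataFrame: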
ratings_data.groupby('movieId').rating.mean()
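With that code, however, I get 9724 movies, whereas the main DataFrame has 9742 movies.
I think some movies have no ratings at all, but since I want to add the ratings to the main movies dataset, how can I put NaN in the fields that have no rating?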
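One way to check that suspicion is to compare the ids in the two files. The following is only a minimal sketch on made-up toy frames (the names `movies`, `ratings` and all values are assumptions standing in for movies.csv and ratings.csv):

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (made-up values)
movies = pd.DataFrame({"movieId": [1, 2, 3, 4], "title": ["A", "B", "C", "D"]})
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [1, 2, 1, 4],
    "rating": [4.0, 3.0, 5.0, 2.5],
})

# groupby only creates a group for ids that actually occur in ratings
mean_by_movie = ratings.groupby("movieId")["rating"].mean()

# ids present in movies but never rated
unrated = movies.loc[~movies["movieId"].isin(ratings["movieId"]), "movieId"].tolist()

print(len(movies), len(mean_by_movie), unrated)
```

Here movie 3 has no row in `ratings`, so the grouped result has one entry fewer than `movies`, which is the same kind of gap as 9724 versus 9742.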
Use Series.reindex with the unique movieId values taken from the other DataFrame (movies_data); to get them in the same sorted order, add Series.sort_values:
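import pandas as pd

movies_data = pd.read_csv('ml-latest-small/movies.csv')
ratings_data = pd.read_csv('ml-latest-small/ratings.csv')
# every movieId from the movies file, sorted and de-duplicated
mov = movies_data['movieId'].sort_values().drop_duplicates()
# reindexing on the full id list leaves NaN for movies that have no ratings
df = ratings_data.groupby('movieId').rating.mean().reindex(mov).reset_index()
print(df)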
      movieId    rating
0           1  3.920930
1           2  3.431818
2           3  3.259615
3           4  2.357143
4           5  3.071429
          ...       ...
9737   193581  4.000000
9738   193583  3.500000
9739   193585  3.500000
9740   193587  3.500000
9741   193609  4.000000

[9742 rows x 2 columns]
# keep only the movies whose average rating is missing
df1 = df[df['rating'].isna()]
print(df1)
      movieId  rating
816      1076     NaN
2211     2939     NaN
2499     3338     NaN
2587     3456     NaN
3118     4194     NaN
4037     5721     NaN
4506     6668     NaN
4598     6849     NaN
4704     7020     NaN
5020     7792     NaN
5293     8765     NaN
5421    25855     NaN
5452    26085     NaN
5749    30892     NaN
5824    32160     NaN
5837    32371     NaN
5957    34482     NaN
7565    85565     NaN
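The reindex step can also be tried without the MovieLens files. This is a small sketch on made-up data (the names `movie_ids`, `ratings`, `full` and all values are hypothetical), showing that an id absent from the grouped result comes back as NaN:

```python
import pandas as pd

# Hypothetical full list of movie ids; id 3 has no ratings below
movie_ids = pd.Index([1, 2, 3, 4], name="movieId")
ratings = pd.DataFrame({
    "movieId": [1, 1, 2, 4],
    "rating": [4.0, 5.0, 3.0, 2.5],
})

mean_by_movie = ratings.groupby("movieId")["rating"].mean()

# reindex aligns the means to the full id list; ids without a mean become NaN
full = mean_by_movie.reindex(movie_ids)
n_missing = int(full.isna().sum())

print(full)
print(n_missing)
```

Calling `.reset_index()` on `full` then turns the ids back into an ordinary column, as in the answer's code.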
EDIT:
If you need to add a new column to the movies_data DataFrame, use DataFrame.merge with a left join:
import pandas as pd

movies_data = pd.read_csv('ml-latest-small/movies.csv')
ratings_data = pd.read_csv('ml-latest-small/ratings.csv')
# as_index=False keeps movieId as a regular column, ready for merging
df = ratings_data.groupby('movieId', as_index=False).rating.mean()
print(df)
      movieId    rating
0           1  3.920930
1           2  3.431818
2           3  3.259615
3           4  2.357143
4           5  3.071429
          ...       ...
9719   193581  4.000000
9720   193583  3.500000
9721   193585  3.500000
9722   193587  3.500000
9723   193609  4.000000

[9724 rows x 2 columns]
# the left join keeps every movie; movies without ratings get NaN in rating
df = movies_data.merge(df, on='movieId', how='left')
print(df)
      movieId                                      title  \
0           1                           Toy Story (1995)
1           2                             Jumanji (1995)
2           3                    Grumpier Old Men (1995)
3           4                   Waiting to Exhale (1995)
4           5         Father of the Bride Part II (1995)
          ...                                        ...
9737   193581  Black Butler: Book of the Atlantic (2017)
9738   193583               No Game No Life: Zero (2017)
9739   193585                               Flint (2017)
9740   193587        Bungo Stray Dogs: Dead Apple (2018)
9741   193609        Andrew Dice Clay: Dice Rules (1991)

                                           genres    rating
0     Adventure|Animation|Children|Comedy|Fantasy  3.920930
1                      Adventure|Children|Fantasy  3.431818
2                                  Comedy|Romance  3.259615
3                            Comedy|Drama|Romance  2.357143
4                                          Comedy  3.071429
                                              ...       ...
9737              Action|Animation|Comedy|Fantasy  4.000000
9738                     Animation|Comedy|Fantasy  3.500000
9739                                        Drama  3.500000
9740                             Action|Animation  3.500000
9741                                       Comedy  4.000000

[9742 rows x 4 columns]
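When only one new column is needed, Series.map with the grouped means as the lookup is a possible alternative to the merge. This is not part of the answer above, only a sketch on made-up frames (the names `movies`, `ratings`, `mapped` and all values are assumptions):

```python
import pandas as pd

# Toy stand-ins for the two files; movie 2 has no ratings
movies = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})
ratings = pd.DataFrame({"movieId": [1, 1, 3], "rating": [4.0, 2.0, 5.0]})

mean_by_movie = ratings.groupby("movieId")["rating"].mean()

# Left join, as in the answer: every movie is kept
merged = movies.merge(mean_by_movie.reset_index(), on="movieId", how="left")

# Alternative: look up each movieId in the Series of means;
# ids that are not in its index map to NaN
mapped = movies.assign(rating=movies["movieId"].map(mean_by_movie))

print(mapped)
```

Both approaches keep all rows of `movies` and leave NaN for the unrated movie; the merge is the more general choice when several columns have to be brought over at once.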