Create a new df in pandas from an existing one?
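I am working on a project about movie data.
A sample of the dataset looks like this:
The genres column has 21 unique values.
I want to create a new table/DataFrame that contains each user's average rating for each genre, for example
I obtained the list of genres with the following code: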
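def split(sent):
    # Split a space-separated genre string into a list of genre names.
    return sent.split()

# genres is assumed to be the genres column of the dataset, e.g. df['genres'].
new_genres = set()
for i in range(len(genres)):
    a = split(genres[i])
    for g in a:
        new_genres.add(g)
new_genres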
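For comparison, the same set can be collected without an explicit loop. This is only a minimal sketch: it assumes the genres are stored as space-separated strings in a column named genres, and the small frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample frame; only the 'genres' column matters here.
df = pd.DataFrame({'genres': ['Drama Mystery Romance', 'Drama', 'Comedy Drama Romance']})

# Split each string on whitespace, put every genre on its own row,
# then keep the distinct values.
new_genres = set(df['genres'].str.split().explode().unique())
print(sorted(new_genres))
```

Sorting before printing just makes the result easier to read, since a set has no defined order.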
Setup:
In [905]: df = pd.DataFrame({'userID':[1,2,3,3,2], 'id':[110, 147, 858, 1246, 1968], 'rating':[1.0, 4.5, 5.0, 5.0, 4.0], 'genres':['Drama Mystery Romance', 'Drama', 'Comedy Drama Romance', 'Drama', 'Drama Comedy Romance']}
...: )
In [906]: df
Out[906]:
   userID    id  rating                 genres
0       1   110     1.0  Drama Mystery Romance
1       2   147     4.5                  Drama
2       3   858     5.0   Comedy Drama Romance
3       3  1246     5.0                  Drama
4       2  1968     4.0   Drama Comedy Romance
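We can first use the assign method together with explode to put each genre on its own row, as follows. (The output shown comes from a larger sample than the setup frame above, with the user column named userId.)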
>>> df = df.assign(genre=df['genres'].str.split(' ')).explode('genre')
>>> df
   userId     id  rating                 genres     genre
0       1    110     1.0  Drama Mystery Romance     Drama
0       1    110     1.0  Drama Mystery Romance   Mystery
0       1    110     1.0  Drama Mystery Romance   Romance
1       1    147     4.5                  Drama     Drama
2       1    858     5.0   Comedy Drama Romance    Comedy
2       1    858     5.0   Comedy Drama Romance     Drama
2       1    858     5.0   Comedy Drama Romance   Romance
3       1   1246     5.0                  Drama     Drama
4       1   1968     4.0   Drama Comedy Romance     Drama
4       1   1968     4.0   Drama Comedy Romance    Comedy
4       1   1968     4.0   Drama Comedy Romance   Romance
5  270896  48780     5.0                 Forein    Forein
6  270896  49530     4.0  Action Thriller Scifi    Action
6  270896  49530     4.0  Action Thriller Scifi  Thriller
6  270896  49530     4.0  Action Thriller Scifi     Scifi
7  270896  54001     4.0                  Drama     Drama
8  270896  54503     4.0          Action Forein    Action
8  270896  54503     4.0          Action Forein    Forein
9  270896  58559     5.0                  Drama     Drama
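With one genre per row, the table asked for in the question can be built by averaging rating per user and genre. The following is a sketch, not necessarily how the original answer continued. It reuses the frame from the setup section, and the names exploded and avg are illustrative.

```python
import pandas as pd

# Same frame as in the setup section.
df = pd.DataFrame({
    'userID': [1, 2, 3, 3, 2],
    'id': [110, 147, 858, 1246, 1968],
    'rating': [1.0, 4.5, 5.0, 5.0, 4.0],
    'genres': ['Drama Mystery Romance', 'Drama', 'Comedy Drama Romance',
               'Drama', 'Drama Comedy Romance'],
})

# One row per (film, genre) pair.
exploded = df.assign(genre=df['genres'].str.split(' ')).explode('genre')

# Rows are users, columns are genres, cells are mean ratings.
avg = exploded.pivot_table(index='userID', columns='genre',
                           values='rating', aggfunc='mean')
print(avg)
```

A cell is NaN where a user has not rated any film of that genre. For example, user 2 has no Mystery rating, while their Drama average is the mean of 4.5 and 4.0.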