Create a new df in pandas from existing one?
I am doing a project on movies data.
The sample dataset looks like:
The genres column has 21 unique values.
I got the list of genres using the code below:
def split(sent):
    # Split a space-separated genre string into a list of genres.
    return sent.split()

new_genres = set()
for i in range(len(genres)):
    a = split(genres[i])
    for g in a:
        new_genres.add(g)
new_genres
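The loop above can also be written with pandas string methods. This is a minimal sketch, assuming genres is a pandas Series of space-separated strings; the sample values are copied from the Setup frame further down, not from the real dataset.

```python
import pandas as pd

# Sample values copied from the Setup frame; the real column has 21 genres.
genres = pd.Series([
    'Drama Mystery Romance',
    'Drama',
    'Comedy Drama Romance',
    'Drama',
    'Drama Comedy Romance',
])

# Split each string on whitespace, put every genre on its own row,
# then collect the distinct values into a set.
new_genres = set(genres.str.split().explode())
print(sorted(new_genres))  # ['Comedy', 'Drama', 'Mystery', 'Romance']
```

Series.explode needs pandas 0.25 or newer.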
Setup:
In [905]: df = pd.DataFrame({'userID':[1,2,3,3,2], 'id':[110, 147, 858, 1246, 1968], 'rating':[1.0, 4.5, 5.0, 5.0, 4.0], 'genres':['Drama Mystery Romance', 'Drama', 'Comedy Drama Romance', 'Drama', 'Drama Comedy Romance']}
...: )
In [906]: df
Out[906]:
userID id rating genres
0 1 110 1.0 Drama Mystery Romance
1 2 147 4.5 Drama
2 3 858 5.0 Comedy Drama Romance
3 3 1246 5.0 Drama
4 2 1968 4.0 Drama Comedy Romance
We can start by using the assign method to add a genre column holding the split list, then explode that column so that each genre gets its own row, like so:
>>> df = df.assign(genre=df['genres'].str.split(' ')).explode('genre')
>>> df
userId id rating genres genre
0 1 110 1.0 Drama Mystery Romance Drama
0 1 110 1.0 Drama Mystery Romance Mystery
0 1 110 1.0 Drama Mystery Romance Romance
1 1 147 4.5 Drama Drama
2 1 858 5.0 Comedy Drama Romance Comedy
2 1 858 5.0 Comedy Drama Romance Drama
2 1 858 5.0 Comedy Drama Romance Romance
3 1 1246 5.0 Drama Drama
4 1 1968 4.0 Drama Comedy Romance Drama
4 1 1968 4.0 Drama Comedy Romance Comedy
4 1 1968 4.0 Drama Comedy Romance Romance
5 270896 48780 5.0 Forein Forein
6 270896 49530 4.0 Action Thriller Scifi Action
6 270896 49530 4.0 Action Thriller Scifi Thriller
6 270896 49530 4.0 Action Thriller Scifi Scifi
7 270896 54001 4.0 Drama Drama
8 270896 54503 4.0 Action Forein Action
8 270896 54503 4.0 Action Forein Forein
9 270896 58559 5.0 Drama Drama
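For a result that can be reproduced from the Setup frame alone, here is a sketch of the same assign and explode step, followed by two optional cleanup calls. The name long_df and the choice to drop the combined genres column are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'userID': [1, 2, 3, 3, 2],
    'id': [110, 147, 858, 1246, 1968],
    'rating': [1.0, 4.5, 5.0, 5.0, 4.0],
    'genres': ['Drama Mystery Romance', 'Drama', 'Comedy Drama Romance',
               'Drama', 'Drama Comedy Romance'],
})

# One row per (movie, genre). explode repeats the original index labels,
# so reset the index and drop the now redundant combined column.
long_df = (
    df.assign(genre=df['genres'].str.split())
      .explode('genre')
      .drop(columns='genres')
      .reset_index(drop=True)
)
print(long_df.shape)  # (11, 4)
```

The original df is left untouched, so long_df is a new frame derived from the existing one.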