How to split multiple values in columns and groupby said values in pandas?
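I am trying to create a new DataFrame by splitting a column that holds multiple values, so that each row contains only one value.
I have tried a few groupby operations, but I can't seem to separate the values or organize them by user.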
   item  title                               feature
0  1     Toy Story (1995)                    Adventure|Animation|Children|Comedy|Fantasy
1  2     Jumanji (1995)                      Adventure|Children|Fantasy
2  3     Grumpier Old Men (1995)             Comedy|Romance
3  4     Waiting to Exhale (1995)            Comedy|Drama|Romance
4  5     Father of the Bride Part II (1995)  Comedy
item feature
0 1 Adventure
1 1 Animation
2 1 Children
3 1 Comedy
4 1 Fantasy
You need str.split, followed by stack:
r = df.set_index('item').feature.str.split('|', expand=True).stack()
r.index = r.index.get_level_values(0)
r.reset_index(name='feature')
item feature
0 1 Adventure
1 1 Animation
2 1 Children
3 1 Comedy
4 1 Fantasy
5 2 Adventure
6 2 Children
7 2 Fantasy
8 3 Comedy
9 3 Romance
10 4 Comedy
11 4 Drama
12 4 Romance
13 5 Comedy
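For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same str.split + stack approach. The DataFrame is rebuilt by hand from the five rows shown in the question, and the explicit dropna() is my own addition (an assumption, not part of the answer above) to guard against pandas versions where stack keeps the missing values that expand=True pads shorter rows with.

```python
import pandas as pd

# Sample data rebuilt by hand from the five rows in the question.
df = pd.DataFrame({
    "item": [1, 2, 3, 4, 5],
    "title": [
        "Toy Story (1995)",
        "Jumanji (1995)",
        "Grumpier Old Men (1995)",
        "Waiting to Exhale (1995)",
        "Father of the Bride Part II (1995)",
    ],
    "feature": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
        "Comedy|Drama|Romance",
        "Comedy",
    ],
})

# Split each string into separate columns, then stack the columns into rows.
# dropna() is an added safeguard: it removes the padding left for shorter rows
# in case stack() does not drop it on the installed pandas version.
r = df.set_index("item")["feature"].str.split("|", expand=True).stack().dropna()

# Keep only the outer index level (item); the inner level is just the
# position of the value within the split string.
r.index = r.index.get_level_values(0)

out = r.reset_index(name="feature")
print(out)
```

The result should have one row per (item, feature) pair, in the same order as the output shown above.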
Another option is to use np.repeat:
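import numpy as np
import pandas as pd
u = df.set_index('item').feature.str.split('|')
pd.DataFrame({
    # repeat each item once per value in its list
    'item': np.repeat(u.index, u.str.len()),
    # flatten the lists into one column
    'feature': [y for x in u for y in x]
})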
item feature
0 1 Adventure
1 1 Animation
2 1 Children
3 1 Comedy
4 1 Fantasy
5 2 Adventure
6 2 Children
7 2 Fantasy
8 3 Comedy
9 3 Romance
10 4 Comedy
11 4 Drama
12 4 Romance
13 5 Comedy
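Since the title also asks about grouping by the split values, here is a hedged sketch of one way to do that. It swaps in a different technique from the two above: DataFrame.explode, which exists in pandas 0.25 and later. The sample data and the names long and items_per_feature are my own illustrative choices, not from the original post.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (titles omitted for brevity).
df = pd.DataFrame({
    "item": [1, 2, 3, 4, 5],
    "feature": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
        "Comedy|Drama|Romance",
        "Comedy",
    ],
})

# Turn each string into a list, then give every list element its own row.
long = (
    df.assign(feature=df["feature"].str.split("|"))
      .explode("feature")
      .reset_index(drop=True)
)

# With one value per row, groupby on the value works as usual.
# Here: collect the items that carry each feature.
items_per_feature = long.groupby("feature")["item"].apply(list)
print(items_per_feature)
```

Any other aggregation (size, nunique, and so on) can replace the apply(list) step once the data is in this long format.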