pandas get index for a row value
I am working with the following DataFrame:
like max_interest min_interest
basketball 4 2
football 2 0
soccer 4 2
softball 4 2
volleyball 4 2
swimming 2 0
cheerleading 4 2
baseball 4 2
I want to group it by max_interest / min_interest:
group max_interest min_interest
4 basketball,soccer,softball,volleyball,cheerleading,baseball N/A
2 football,swimming basketball,soccer,softball,volleyball,cheerleading,baseball
0 N/A football,swimming
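I tried to get this working with groupby(max_interest), but could not figure out how to merge the like column.
Essentially, this joins the like row values into a single string under the max_interest header, and likewise for min_interest.
It could be done with hand-coded logic that iterates and keeps appending the likes, but I would like to know whether it can be written with the pandas / np libraries.
Any help is appreciated.
Here is one option: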
In [39]: def groupby(key):
   ....:     result = data.groupby(key).agg({'like': lambda v: ','.join(v)})
   ....:     result.index.name = 'group'
   ....:     result.columns = [key]
   ....:     return result
   ....:
In [40]: pd.concat((groupby(key) for key in ['max_interest', 'min_interest']), axis=1)
Out[40]:
max_interest min_interest
group
0 NaN football,swimming
2 football,swimming basketball,soccer,softball,volleyball,cheerlea...
4 basketball,soccer,softball,volleyball,cheerlea... NaN
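For anyone who wants to run the option above outside an IPython session, here is a minimal self-contained sketch. It assumes the question's table is held in a DataFrame named data; the helper name join_likes_by and the final sort_index() call are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
data = pd.DataFrame({
    "like": ["basketball", "football", "soccer", "softball",
             "volleyball", "swimming", "cheerleading", "baseball"],
    "max_interest": [4, 2, 4, 4, 4, 2, 4, 4],
    "min_interest": [2, 0, 2, 2, 2, 0, 2, 2],
})


def join_likes_by(key):
    # Hypothetical helper: group on one interest column and join the
    # 'like' values of each group into a comma-separated string.
    joined = data.groupby(key)["like"].agg(",".join)
    return joined.rename(key)


# Put the two grouped Series side by side; interest levels that are
# missing from one column end up as NaN.
result = pd.concat(
    [join_likes_by(key) for key in ["max_interest", "min_interest"]],
    axis=1,
).sort_index()
result.index.name = "group"

print(result)
```

Selecting the like column before aggregating keeps each grouped result a Series, so no column renaming is needed before the concat.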
First, split the DataFrame by interest level and join the like values within each group:
u = ({k: ','.join(n['like'])} for k, n in df.groupby('max_interest'))
v = ({k: ','.join(n['like'])} for k, n in df.groupby('min_interest'))
Then create a new DataFrame:
df1 = pd.DataFrame(list(u)+list(v), index=['max_interest', 'max_interest', 'min_interest', 'min_interest'])
To put the frame into the desired form, use groupby().last():
adjustframe = df1.groupby(level=0).last().transpose()
Output:
max_interest min_interest
0 NaN foot,swim
2 foot,swim basket,soccer,soft,volley,cheer,base
4 basket,soccer,soft,volley,cheer,base NaN
Set the index name:
adjustframe.index.name = 'group'
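The steps of this second approach can be put together as one runnable sketch. It assumes the question's table is held in a DataFrame named df with the full sport names (the output shown above uses shortened names). Building the index labels from the list lengths and the closing sort_index() are small illustrative changes, not part of the original answer.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "like": ["basketball", "football", "soccer", "softball",
             "volleyball", "swimming", "cheerleading", "baseball"],
    "max_interest": [4, 2, 4, 4, 4, 2, 4, 4],
    "min_interest": [2, 0, 2, 2, 2, 0, 2, 2],
})

# One single-entry dict per group: {interest level: joined likes}.
u = [{k: ",".join(g["like"])} for k, g in df.groupby("max_interest")]
v = [{k: ",".join(g["like"])} for k, g in df.groupby("min_interest")]

# Each dict becomes a row; the index records which column it came from.
df1 = pd.DataFrame(
    u + v,
    index=["max_interest"] * len(u) + ["min_interest"] * len(v),
)

# last() keeps the last non-null value per index label and column,
# collapsing the rows to one per source column; transpose then puts
# the interest levels on the index.
adjustframe = df1.groupby(level=0).last().transpose().sort_index()
adjustframe.index.name = "group"

print(adjustframe)
```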