Python Pandas groupby multiple columns and append
I'm close to achieving what I want thanks to Python Pandas Groupby/Append columns, but I'm still not quite there.
DF:
City | Plan | Problem 1 | Problem 2 | Problem 3 |
---|---|---|---|---|
Genoa | Service 1 | aaa | bbb | ccc |
Genoa | Service 2 | ddd | zzz | yyy |
Genoa | Service 3 | ggg | ppp | jjj |
Venice | Service 2 | lll | vvv | |
Venice | Service 3 | eee | fff | mmm |
Expected Output:
City | Problem 1 | Problem 2 | Problem 3 | Problem 4 | Problem 5 | Problem 6 | Problem 7 | Problem 8 | Problem 9 |
---|---|---|---|---|---|---|---|---|---|
Genoa | aaa | bbb | ccc | ddd | zzz | yyy | ggg | ppp | jjj |
Venice | | | | lll | vvv | | eee | fff | mmm |
Basically I want to:
After playing for a while with unstack and cumcount from the linked solution, I'm still missing something: the output should respect the order of the Plan column and leave empty cells where a service is missing.
This is the code I'm using:
import pandas as pd
df = pd.read_csv('input.csv')
df1 = df.set_index('City').stack().reset_index(name='vals')
df1['g'] = 'Param' + df1.groupby('City').cumcount().add(1).astype(str)
df1 = df1.pivot(index='City', columns='g', values='vals')
df1.to_csv('output.csv')
In my tests I removed the Plan column from the input. The problem is that after ordering the parameters in the output, if, for example, a city has only Service 3, its values are still aligned under Service 1.
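The misalignment described above can be reproduced with a minimal sketch. It assumes the sample table from the question is built inline (instead of being read from input.csv) with the Plan column already dropped. Because cumcount restarts for every city and knows nothing about Plan, Venice's first value ends up in the first output column.

```python
import pandas as pd

# Sample data from the question, with the Plan column already removed
# (built inline here as an assumption, instead of reading input.csv).
df = pd.DataFrame({
    "City": ["Genoa", "Genoa", "Genoa", "Venice", "Venice"],
    "Problem 1": ["aaa", "ddd", "ggg", "lll", "eee"],
    "Problem 2": ["bbb", "zzz", "ppp", "vvv", "fff"],
    "Problem 3": ["ccc", "yyy", "jjj", None, "mmm"],
})

# One row per (City, original column) pair.
long = df.set_index("City").stack().reset_index(name="vals")

# cumcount numbers the rows of each city starting from 0, independently of
# which service a row originally belonged to.
long["g"] = "Param" + long.groupby("City").cumcount().add(1).astype(str)

wide = long.pivot(index="City", columns="g", values="vals")
print(wide)
```

Venice has no Service 1 row, yet its first value (lll, which belongs to Service 2) is placed under Param1, the slot that holds Service 1 data for Genoa.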
This is a pivot problem, but you can also do this by stacking and unstacking:
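# Move the Problem columns into rows, then spread Plan and the problem label back out as columns
s = df.set_index(['City', 'Plan']).stack().unstack([1, 2])
# Renumber the resulting columns as Problem 1..N
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)
print(s)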
       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City
Genoa        aaa       bbb       ccc       ddd       zzz       yyy       ggg       ppp       jjj
Venice       NaN       NaN       NaN       lll       vvv       NaN       eee       fff       mmm
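To try the stack/unstack approach without an input.csv file, here is a self-contained sketch. The inline DataFrame is an assumption that mirrors the sample table from the question, and the intermediate names (long, wide) are only illustrative.

```python
import pandas as pd

# Sample data from the question, built inline (assumed to match input.csv).
df = pd.DataFrame({
    "City": ["Genoa", "Genoa", "Genoa", "Venice", "Venice"],
    "Plan": ["Service 1", "Service 2", "Service 3", "Service 2", "Service 3"],
    "Problem 1": ["aaa", "ddd", "ggg", "lll", "eee"],
    "Problem 2": ["bbb", "zzz", "ppp", "vvv", "fff"],
    "Problem 3": ["ccc", "yyy", "jjj", None, "mmm"],
})

# Step 1: long format, indexed by (City, Plan, original Problem column).
long = df.set_index(["City", "Plan"]).stack()

# Step 2: move Plan (level 1) and the Problem label (level 2) into the
# columns, so every (Plan, Problem) pair gets its own slot per city.
wide = long.unstack([1, 2])

# Step 3: renumber the slots as Problem 1..N.
wide.columns = [f"Problem {i}" for i in range(1, wide.shape[1] + 1)]
print(wide)
```

Since every (Plan, Problem) pair that occurs anywhere in the data becomes a column, a city that lacks a service gets NaN in that service's slots rather than having its values shifted left.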
Another way, using melt:
s = df.melt(['City', 'Plan']).pivot(index='City', columns=['Plan', 'variable'], values='value')
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)
print(s)
       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City
Genoa        aaa       ddd       ggg       bbb       zzz       ppp       ccc       yyy       jjj
Venice       NaN       lll       eee       NaN       vvv       fff       NaN       NaN       mmm
The column ordering is a bit different (values are grouped by original Problem column rather than by Plan), but the relative ordering between Services is preserved.
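If the melt variant should produce the Plan-first order from the expected output, one option is to sort the (Plan, variable) column pairs before renumbering them. The sketch below is not part of the original answer. It assumes the sample data is built inline, and that sorting the Plan and Problem labels as plain strings gives the wanted order, which holds for "Service 1" to "Service 3" but would not for, say, "Service 10".

```python
import pandas as pd

# Sample data from the question, built inline (assumed to match input.csv).
df = pd.DataFrame({
    "City": ["Genoa", "Genoa", "Genoa", "Venice", "Venice"],
    "Plan": ["Service 1", "Service 2", "Service 3", "Service 2", "Service 3"],
    "Problem 1": ["aaa", "ddd", "ggg", "lll", "eee"],
    "Problem 2": ["bbb", "zzz", "ppp", "vvv", "fff"],
    "Problem 3": ["ccc", "yyy", "jjj", None, "mmm"],
})

# Long format: one row per (City, Plan, original Problem column).
long = df.melt(id_vars=["City", "Plan"])

# One column per (Plan, variable) pair.
wide = long.pivot(index="City", columns=["Plan", "variable"], values="value")

# Sort the column pairs so all Service 1 slots come first, then Service 2, etc.
# Assumption: the labels sort correctly as strings.
wide = wide.sort_index(axis=1)

# Renumber the slots as Problem 1..N.
wide.columns = [f"Problem {i}" for i in range(1, wide.shape[1] + 1)]
print(wide)
```

With the columns sorted, the result lines up with the stack/unstack output: Genoa reads aaa through jjj in Plan order, and Venice has NaN in the three Service 1 slots.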