Python Pandas: group by multiple columns and append

Question

I'm close to achieving what I want thanks to "Python Pandas Groupby/Append columns", but I'm not quite there yet.

DF:

City	Plan	Problem 1	Problem 2	Problem 3
Genoa	Service 1	aaa	bbb	ccc
Genoa	Service 2	ddd	zzz	yyy
Genoa	Service 3	ggg	ppp	jjj
Venice	Service 2	lll	vvv
Venice	Service 3	eee	fff	mmm

Expected Output:

City	Problem 1	Problem 2	Problem 3	Problem 4	Problem 5	Problem 6	Problem 7	Problem 8	Problem 9
Genoa	aaa	bbb	ccc	ddd	zzz	yyy	ggg	ppp	jjj
Venice				lll	vvv		eee	fff	mmm

Basically I want to:

Group by City
Discard the Plan column (if possible)
Append all the other parameters (they must always stay in order, so if a service is missing, its cells should be empty)

After playing for a while with unstack and cumcount from the linked solution, I'm still missing something that respects the order of the Plan column and fills in empty cells when a service is missing.

This is the code I'm using:

import pandas as pd

df = pd.read_csv('input.csv')

df1 = df.set_index('City').stack().reset_index(name='vals')
df1['g'] = 'Param' + df1.groupby('City').cumcount().add(1).astype(str)
df1 = df1.pivot(index='City', columns='g', values='vals')

df1.to_csv('output.csv')

In my tests I removed the Plan column from the input, but the problem is that after ordering the parameters in the output, if a city has only Service 3, for example, its values are still aligned under Service 1.
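The misalignment can be reproduced with a small sketch. The frame below is a hypothetical, cut-down sample (two problem columns, built inline instead of being read from input.csv). It suggests the cause: cumcount numbers values by their position within each city, so it carries no information about which Plan a value came from.

```python
import pandas as pd

# Hypothetical cut-down sample: Venice has no "Service 1" row.
df = pd.DataFrame({
    "City": ["Genoa", "Genoa", "Venice"],
    "Plan": ["Service 1", "Service 2", "Service 2"],
    "Problem 1": ["aaa", "ddd", "lll"],
    "Problem 2": ["bbb", "zzz", "vvv"],
})

# Same steps as the snippet above, with Plan dropped first.
long = (
    df.drop(columns="Plan")
      .set_index("City")
      .stack()
      .reset_index(name="vals")
)

# cumcount restarts at 0 for every city, so Venice's first value
# becomes Param1 even though it belongs to Service 2.
long["g"] = "Param" + long.groupby("City").cumcount().add(1).astype(str)
wide = long.pivot(index="City", columns="g", values="vals")

print(wide)
```

In this sample Venice's values land in Param1 and Param2 rather than Param3 and Param4, which is the shift described above.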

Answer 1

This is a pivot problem, but you can also do this by stacking and unstacking:

s = df.set_index(['City', 'Plan']).stack().unstack([1, 2])
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)

print (s)

       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City                                                                                            
Genoa        aaa       bbb       ccc       ddd       zzz       yyy       ggg       ppp       jjj
Venice       NaN       NaN       NaN       lll       vvv       NaN       eee       fff       mmm
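For anyone who wants to run this without input.csv, here is a self-contained sketch. The inline DataFrame is an assumption reconstructed from the table in the question, with Venice's missing Problem 3 entered as None. Printing the intermediate columns shows why the alignment holds: each column is keyed by a (Plan, Problem) pair before being renamed, so a city without Service 1 gets NaN in those positions.

```python
import pandas as pd

# Assumed reconstruction of the question's data (not read from input.csv).
df = pd.DataFrame({
    "City": ["Genoa", "Genoa", "Genoa", "Venice", "Venice"],
    "Plan": ["Service 1", "Service 2", "Service 3", "Service 2", "Service 3"],
    "Problem 1": ["aaa", "ddd", "ggg", "lll", "eee"],
    "Problem 2": ["bbb", "zzz", "ppp", "vvv", "fff"],
    "Problem 3": ["ccc", "yyy", "jjj", None, "mmm"],
})

# Move Plan and the original column names into the column index.
s = df.set_index(["City", "Plan"]).stack().unstack([1, 2])

# Columns are (Plan, original column) pairs at this point.
print(s.columns.tolist())

# Flatten to sequential labels, as in the answer.
s.columns = "Problem " + pd.RangeIndex(1, s.shape[1] + 1).astype(str)
print(s)
```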

Another way, using melt:

s = df.melt(['City', 'Plan']).pivot(index='City', columns=['Plan', 'variable'], values='value')
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)

print (s)
       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City                                                                                            
Genoa        aaa       ddd       ggg       bbb       zzz       ppp       ccc       yyy       jjj
Venice       NaN       lll       eee       NaN       vvv       fff       NaN       NaN       mmm

The ordering is a bit different, but the relative ordering between Services is preserved.
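If the Service-by-Service order from the expected output is needed with the melt route, one option is to reindex the pivoted columns against an explicit list of (Plan, Problem) pairs. This is a sketch, not part of the original answer. It assumes the inline DataFrame below matches the question's data, and that sorting the Plan names alphabetically gives the desired plan order.

```python
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({
    "City": ["Genoa", "Genoa", "Genoa", "Venice", "Venice"],
    "Plan": ["Service 1", "Service 2", "Service 3", "Service 2", "Service 3"],
    "Problem 1": ["aaa", "ddd", "ggg", "lll", "eee"],
    "Problem 2": ["bbb", "zzz", "ppp", "vvv", "fff"],
    "Problem 3": ["ccc", "yyy", "jjj", None, "mmm"],
})

value_cols = [c for c in df.columns if c not in ("City", "Plan")]
# Assumption: alphabetical order of Plan names is the wanted order.
plans = sorted(df["Plan"].unique())

wide = df.melt(id_vars=["City", "Plan"]).pivot(
    index="City", columns=["Plan", "variable"], values="value"
)

# Force the column order: every Plan in turn, each with its problems in order.
wanted = pd.MultiIndex.from_product([plans, value_cols])
wide = wide.reindex(columns=wanted)

wide.columns = [f"Problem {i}" for i in range(1, wide.shape[1] + 1)]
print(wide)
```

Because the column order comes from the explicit product, it does not depend on how pivot happens to order its output columns.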

Answer 1: 4, accepted, 2020-12-18 10:12:41
