Python Pandas groupby multiple columns and append

Question

I'm close to achieve what I want thanks to Python Pandas Groupby/Append columns but still not quite there.

DF:

City	Plan	Problem 1	Problem 2	Problem 3
Genoa	Service 1	aaa	bbb	ccc
Genoa	Service 2	ddd	zzz	yyy
Genoa	Service 3	ggg	ppp	jjj
Venice	Service 2	lll	vvv
Venice	Service 3	eee	fff	mmm

Expected Output:

City	Problem 1	Problem 2	Problem 3	Problem 4	Problem 5	Problem 6	Problem 7	Problem 8	Problem 9
Genoa	aaa	bbb	ccc	ddd	zzz	yyy	ggg	ppp	jjj
Venice				lll	vvv		eee	fff	mmm

Basically I want to:

Group by City
Discard Plan Column (if possible)
Append all the other parameters (They still need to be always in order, so if a service is missing the cells would be empty.

After playing a while with unstack and cumcount from the linked solution, I'm still missing something to respect the order of the Plan column and fill with empty cells if a service is missing.

This is the code I'm using:

import pandas as pd

df = pd.read_csv('input.csv')

df1 = df.set_index('City').stack().reset_index(name='vals')
df1['g'] = 'Param' + df1.groupby('City').cumcount().add(1).astype(str)
df1 = df1.pivot(index='City', columns='g', values='vals')

df1.to_csv('output.csv')

In my tests I've removed the Plan column from the input, but the problem is that after ordering the parameters in the output, if, for example, a city has only Service 3 , they are still aligned under Service 1 .

Answer 1

This is a pivot problem, but you can also do this by stacking and unstacking:

s = df.set_index(['City', 'Plan']).stack().unstack([1, 2])
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)

print (s)

       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City                                                                                            
Genoa        aaa       bbb       ccc       ddd       zzz       yyy       ggg       ppp       jjj
Venice       NaN       NaN       NaN       lll       vvv       NaN       eee       fff       mmm

Another way using melt :

s = df.melt(['City', 'Plan']).pivot('City', ['Plan', 'variable'], 'value')
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)

print (s)
       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City                                                                                            
Genoa        aaa       ddd       ggg       bbb       zzz       ppp       ccc       yyy       jjj
Venice       NaN       lll       eee       NaN       vvv       fff       NaN       NaN       mmm

The ordering is a bit different, but the relative ordering between Services is preserved.

Python Pandas groupby multiple columns and append

Question

1 answers

solution1
4 ACCPTED 2020-12-18 10:12:41

Python Pandas groupby multiple columns and append

Question

1 answers

solution1 4 ACCPTED 2020-12-18 10:12:41

solution1
4 ACCPTED 2020-12-18 10:12:41