简体   繁体   中英

Python Pandas groupby multiple columns and append

I'm close to achieve what I want thanks to Python Pandas Groupby/Append columns but still not quite there.

DF:

City Plan Problem 1 Problem 2 Problem 3
Genoa Service 1 aaa bbb ccc
Genoa Service 2 ddd zzz yyy
Genoa Service 3 ggg ppp jjj
Venice Service 2 lll vvv
Venice Service 3 eee fff mmm

Expected Output:

City Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
Genoa aaa bbb ccc ddd zzz yyy ggg ppp jjj
Venice lll vvv eee fff mmm

Basically I want to:

  1. Group by City
  2. Discard Plan Column (if possible)
  3. Append all the other parameters (They still need to be always in order, so if a service is missing the cells would be empty.

After playing a while with unstack and cumcount from the linked solution, I'm still missing something to respect the order of the Plan column and fill with empty cells if a service is missing.

This is the code I'm using:

import pandas as pd

df = pd.read_csv('input.csv')

df1 = df.set_index('City').stack().reset_index(name='vals')
df1['g'] = 'Param' + df1.groupby('City').cumcount().add(1).astype(str)
df1 = df1.pivot(index='City', columns='g', values='vals')

df1.to_csv('output.csv')

In my tests I've removed the Plan column from the input, but the problem is that after ordering the parameters in the output, if, for example, a city has only Service 3 , they are still aligned under Service 1 .

This is a pivot problem, but you can also do this by stacking and unstacking:

s = df.set_index(['City', 'Plan']).stack().unstack([1, 2])
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)

print (s)

       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City                                                                                            
Genoa        aaa       bbb       ccc       ddd       zzz       yyy       ggg       ppp       jjj
Venice       NaN       NaN       NaN       lll       vvv       NaN       eee       fff       mmm

Another way using melt :

s = df.melt(['City', 'Plan']).pivot('City', ['Plan', 'variable'], 'value')
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)

print (s)
       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City                                                                                            
Genoa        aaa       ddd       ggg       bbb       zzz       ppp       ccc       yyy       jjj
Venice       NaN       lll       eee       NaN       vvv       fff       NaN       NaN       mmm

The ordering is a bit different, but the relative ordering between Services is preserved.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM