简体   繁体   中英

Group data in pivot Pandas

hello I made a table with the help of a pandas, then I created a pivot with two indexes., I would like to group this data like that, but one index was the header of another. Below I will show what is happening in the tables and what I would like the result Table:

Name Lang Skill Corp
Michael java 2 Google
Piter C++ 3 Facebook
Cristiano python 5 Google
Michael java 1 Facebook
Piter C++ 2 Google
Cristiano python 3 Facebook
Michael java 4 Google
Piter C++ 5 Facebook
Cristiano python 1 Google
Michael python 2 Facebook

I used:

pivot = pd.pivot_table(df, values="Skill", index=["Corp", "Name"], columns = "Lang", aggfunc="sum")

and I have pivot:

Corp Name C++ java python
Facebook Cristiano nan nan 3
Facebook Michael nan 1 2
Facebook Piter 8 nan nan
Google Cristiano nan nan 6
Google Michael nan 6 nan
Google Piter 2 nan nan

the result I would like:

Name C++ java python
Facebook sum_fb sum_fb sum_fb
Cristiano nan nan 3
Michael nan 1 2
Piter 8 nan nan
Google sum_google sum_google sum_google
Cristiano nan nan 6
Michael nan 6 nan
Piter 2 nan nan

Thank You in advance

You can aggregate sum by Corp level, which is first by GroupBy.sum , append index to MultiIndex with same values:

df1 = (pivot.groupby(level=0).sum()
            .assign(Name = lambda x: x.index)
            .set_index('Name', append=True))

Or use level=[0,0] for MultiIndex, only necessary set names by DataFrame.rename_axis :

df1 = pivot.groupby(level=[0,0]).sum().rename_axis(['Corp','Name'])
print (df1)         
Lang               C++  java  python
Corp     Name                       
Facebook Facebook  8.0   1.0     5.0
Google   Google    2.0   6.0     6.0

Then is appended pivot DataFrame by concat , but is necessary sorting for correct ordering by first level of MultiIndex with DataFrame.sort_index , remove first level Corp by DataFrame.droplevel and last convert Name to column with remove columns name Lang by DataFrame.rename_axis :

df = (pd.concat([df1, pivot])
        .sort_index(level=0, sort_remaining=False)
        .droplevel(0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
        Name  C++  java  python
0   Facebook  8.0   1.0     5.0
1  Cristiano  NaN   NaN     3.0
2    Michael  NaN   1.0     2.0
3      Piter  8.0   NaN     NaN
4     Google  2.0   6.0     6.0
5  Cristiano  NaN   NaN     6.0
6    Michael  NaN   6.0     NaN
7      Piter  2.0   NaN     NaN

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM