pandas combine pivot table with DataFrame

I want to group my data set and enrich it with a formatted representation of the aggregated information.

This is my data set:

import pandas as pd

h = ['A', 'B', 'C']
d = [["a", "x", 1], ["a", "y", 2], ["b", "y", 4]]
rows = pd.DataFrame(d, columns=h)

   A  B  C
0  a  x  1
1  a  y  2
2  b  y  4

I create a pivot table to generate 0 for missing values:

pivot = pd.pivot_table(rows,index=["A"], values=["C"], columns=["B"],fill_value=0)

   C   
B  x  y
A      
a  1  2
b  0  4

I group by A to remove dimension B:

wanted = rows.groupby("A").sum()

   C
A   
a  3
b  4

I try to add a column with the string representation of the aggregate details:

wanted["D"] = pivot["C"].applymap(lambda vs: reduce(lambda a,b: str(a)+"+"+str(b), vs.values))

AttributeError: ("'int' object has no attribute 'values'", u'occurred at index x')

It seems that I don't understand applymap.

What I want to achieve is:

   C  D
A   
a  3  1+2
b  4  0+4

First, pass plain strings instead of lists to pivot_table, so that the columns are a plain Index rather than a MultiIndex:

pivot = pd.pivot_table(rows,index="A", values="C", columns="B",fill_value=0)

Then sum the values across the columns of each row:

pivot['C'] = pivot.sum(axis=1)
print (pivot)
B  x  y  C
A         
a  1  2  3
b  0  4  4

Cast columns x and y to str with astype and concatenate them into D:

pivot['D'] = pivot['x'].astype(str) + '+' + pivot['y'].astype(str)
print (pivot)
B  x  y  C    D
A              
a  1  2  3  1+2
b  0  4  4  0+4

Finally, remove the columns name with rename_axis (new in pandas 0.18.0) and drop the columns that are no longer needed:

pivot = pivot.rename_axis(None, axis=1).drop(['x', 'y'], axis=1)
print (pivot)
   C    D
A        
a  3  1+2
b  4  0+4
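
The steps above can also be combined into one runnable sketch. Note that pivot_table aggregates with the mean by default, which matches the sum here only because every (A, B) pair occurs once. Passing aggfunc="sum" and the astype(int) guard are assumptions added for illustration, not part of the original answer:

```python
import pandas as pd

rows = pd.DataFrame(
    [["a", "x", 1], ["a", "y", 2], ["b", "y", 4]],
    columns=["A", "B", "C"],
)

# Scalar arguments keep the columns a plain Index (x, y) instead of a MultiIndex.
# aggfunc="sum" is an assumption: the default aggregation is the mean.
pivot = pd.pivot_table(
    rows, index="A", values="C", columns="B", aggfunc="sum", fill_value=0
)
pivot = pivot.astype(int)  # guard in case a pandas version returns floats

# Build the result directly instead of adding and then dropping columns.
result = pd.DataFrame({
    "C": pivot.sum(axis=1),
    "D": pivot["x"].astype(str) + "+" + pivot["y"].astype(str),
})
print(result)
```

Building result from a dict of Series avoids the rename_axis and drop calls, because the new frame never carries the columns name B.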

If you want to keep the MultiIndex in the columns:

pivot = pd.pivot_table(rows,index=["A"], values=["C"], columns=["B"],fill_value=0)

pivot['E'] = pivot["C"].sum(1)
print (pivot)
   C     E
B  x  y   
A         
a  1  2  3
b  0  4  4

pivot["D"] = pivot[('C','x')].astype(str) + '+' + pivot[('C','y')].astype(str)
print (pivot)
   C     E    D
B  x  y        
A              
a  1  2  3  1+2
b  0  4  4  0+4

pivot = pivot.rename_axis((None,None), axis=1).drop('C', axis=1).rename(columns={'E':'C'})
pivot.columns = pivot.columns.droplevel(-1)
print (pivot)
   C    D
A        
a  3  1+2
b  4  0+4
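
A possibly shorter route for the MultiIndex case, sketched under the assumption that only the C block of the pivot is needed: selecting the top level with pivot["C"] returns a frame with single-level columns x and y, so no droplevel or rename is required afterwards. The names details and result are illustrative, and aggfunc="sum" is again an assumption:

```python
import pandas as pd

rows = pd.DataFrame(
    [["a", "x", 1], ["a", "y", 2], ["b", "y", 4]],
    columns=["A", "B", "C"],
)

# List arguments produce MultiIndex columns: ("C", "x") and ("C", "y").
pivot = pd.pivot_table(
    rows, index=["A"], values=["C"], columns=["B"], aggfunc="sum", fill_value=0
)

# Selecting the top level "C" yields a frame with plain columns x and y.
details = pivot["C"].astype(int)

result = pd.DataFrame({
    "C": details.sum(axis=1),
    "D": details["x"].astype(str) + "+" + details["y"].astype(str),
})
print(result)
```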

EDIT:

Another solution with groupby and MultiIndex.droplevel:

pivot = pd.pivot_table(rows,index=["A"], values=["C"], columns=["B"],fill_value=0)

# remove the top level of the MultiIndex in the columns
pivot.columns = pivot.columns.droplevel(0)
print (pivot)
B  x  y
A      
a  1  2
b  0  4

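# select C explicitly so that the string column B is not aggregated on newer pandas versions
wanted = rows.groupby("A")[["C"]].sum()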
wanted['D'] = pivot['x'].astype(str) + '+' + pivot['y'].astype(str)
print (wanted)
   C    D
A        
a  3  1+2
b  4  0+4
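
As for the original error: applymap (renamed DataFrame.map in pandas 2.1) calls the function once per cell, so vs is a single int and has no values attribute. To work on a whole row, apply with axis=1 can be used instead. The sketch below generalizes the string concatenation to any number of B values, so x and y are not hard-coded. It uses groupby with unstack in place of pivot_table, which is a substitution for illustration, and the name detail is hypothetical:

```python
import pandas as pd

rows = pd.DataFrame(
    [["a", "x", 1], ["a", "y", 2], ["b", "y", 4]],
    columns=["A", "B", "C"],
)

# One column per value of B; absent (A, B) pairs are filled with 0.
detail = rows.groupby(["A", "B"])["C"].sum().unstack(fill_value=0)

wanted = rows.groupby("A")[["C"]].sum()

# apply(axis=1) passes each row as a Series, so all of its values can be joined.
wanted["D"] = detail.astype(str).apply(lambda row: "+".join(row), axis=1)
print(wanted)
```

Because the join runs over whatever columns detail has, a third value of B would simply add a third term to each string.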
