I want to group my data set and enrich it with a formatted representation of the aggregated information.
This is my data set:
h = ['A', 'B', 'C']
d = [["a", "x", 1], ["a", "y", 2], ["b", "y", 4]]
rows = pd.DataFrame(d, columns=h)
A B C
0 a x 1
1 a y 2
2 b y 4
I create a pivot table to generate 0
for missing values:
pivot = pd.pivot_table(rows,index=["A"], values=["C"], columns=["B"],fill_value=0)
C
B x y
A
a 1 2
b 0 4
I groupy by A
to remove dimension B
:
wanted = rows.groupby("A").sum()
C
A
a 3
b 4
I try to add a column with the string representation of the aggregate details:
wanted["D"] = pivot["C"].applymap(lambda vs: reduce(lambda a,b: str(a)+"+"+str(b), vs.values))
AttributeError: ("'int' object has no attribute 'values'", u'occurred at index x')
It seems that I don't understand applymap.
What I want to achieve is:
C D
A
a 3 1+2
b 4 0+4
You can first remove []
from parameters in pivot_table
, so you remove Multiindex
from columns:
pivot = pd.pivot_table(rows,index="A", values="C", columns="B",fill_value=0)
Then sum
values by columns:
pivot['C'] = pivot.sum(axis=1)
print (pivot)
B x y C
A
a 1 2 3
b 0 4 4
Cast by astype
int
columns x
and y
to str
and output to D
:
pivot['D'] = pivot['x'].astype(str) + '+' + pivot['y'].astype(str)
print (pivot)
B x y C D
A
a 1 2 3 1+2
b 0 4 4 0+4
Last remove column name by rename_axis
(new in pandas
0.18.0
) and drop
unnecessary columns:
pivot = pivot.rename_axis(None, axis=1).drop(['x', 'y'], axis=1)
print (pivot)
C D
A
a 3 1+2
b 4 0+4
But if want Multiindex
in columns:
pivot = pd.pivot_table(rows,index=["A"], values=["C"], columns=["B"],fill_value=0)
pivot['E'] = pivot["C"].sum(1)
print (pivot)
C E
B x y
A
a 1 2 3
b 0 4 4
pivot["D"] = pivot[('C','x')].astype(str) + '+' + pivot[('C','y')].astype(str)
print (pivot)
C E D
B x y
A
a 1 2 3 1+2
b 0 4 4 0+4
pivot = pivot.rename_axis((None,None), axis=1).drop('C', axis=1).rename(columns={'E':'C'})
pivot.columns = pivot.columns.droplevel(-1)
print (pivot)
C D
A
a 3 1+2
b 4 0+4
EDIT:
Another solution with groupby
and MultiIndex.droplevel
:
pivot = pd.pivot_table(rows,index=["A"], values=["C"], columns=["B"],fill_value=0)
#remove top level of Multiindex in columns
pivot.columns = pivot.columns.droplevel(0)
print (pivot)
B x y
A
a 1 2
b 0 4
wanted = rows.groupby("A").sum()
wanted['D'] = pivot['x'].astype(str) + '+' + pivot['y'].astype(str)
print (wanted)
C D
A
a 3 1+2
b 4 0+4
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.