I am trying to pivot a pandas DataFrame using several aggregate functions, some of which are lambdas. To aggregate with several lambda functions, each resulting column needs a distinct name. I tried a few ideas I found online, but none of them worked. This is a minimal example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [1, 1, 2, 3], 'col2': [4, 4, 5, 6], 'col3': [7, 10, 8, 9]})
pivoted_df = df.pivot_table(index=['col1', 'col2'], values='col3', aggfunc=[('lam1', lambda x: np.percentile(x, 50)), ('lam2', lambda x: np.percentile(x, 75))]).reset_index()
The error is:
AttributeError: 'SeriesGroupBy' object has no attribute 'lam1'
I also tried passing a dictionary, but that results in an error as well. Can someone help? Thanks!
Name the functions explicitly:
def lam1(x):
    return np.percentile(x, 50)

def lam2(x):
    return np.percentile(x, 75)

pivoted_df = df.pivot_table(index=['col1', 'col2'], values='col3',
                            aggfunc=[lam1, lam2]).reset_index()
Your aggregation series will then be appropriately named:
print(pivoted_df)
   col1  col2  lam1  lam2
0     1     4   8.5  9.25
1     2     5   8.0  8.00
2     3     6   9.0  9.00
The docs for pd.pivot_table explain why:
aggfunc : function, list of functions, dict, default numpy.mean
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves). If dict is passed, the key is column to aggregate and value is function or list of functions.
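The name inference described in the docs can be checked directly: every lambda reports `<lambda>` as its `__name__`, while a function created with `def` carries its own name. The following is a minimal sketch, not a definitive recipe; `make_percentile` is a hypothetical helper (it is not part of pandas or NumPy) that builds percentile functions and assigns each one a distinct `__name__`.

```python
import numpy as np
import pandas as pd


def make_percentile(q, name):
    """Hypothetical helper: build a percentile aggregator named `name`."""
    def agg(x):
        return np.percentile(x, q)
    # pivot_table reads this attribute to label the top column level.
    agg.__name__ = name
    return agg


anonymous = lambda x: np.percentile(x, 50)
p50 = make_percentile(50, "p50")
p75 = make_percentile(75, "p75")

print(anonymous.__name__)  # every lambda shares the name '<lambda>'
print(p50.__name__, p75.__name__)

df = pd.DataFrame({'col1': [1, 1, 2, 3], 'col2': [4, 4, 5, 6], 'col3': [7, 10, 8, 9]})
table = df.pivot_table(index=['col1', 'col2'], values='col3', aggfunc=[p50, p75])

# The top level of the resulting columns comes from the function names.
top_level = list(table.columns.get_level_values(0))
print(top_level)
```

This avoids writing one `def` per percentile when many quantiles are needed, at the cost of setting `__name__` by hand.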
I suggest using DataFrameGroupBy.agg here instead:
f1 = lambda x: np.percentile(x, 50)
f2 = lambda x: np.percentile(x, 75)
pivoted_df = (df.groupby(['col1', 'col2'])['col3']
                .agg([('lam1', f1), ('lam2', f2)])
                .reset_index())
print(pivoted_df)
   col1  col2  lam1  lam2
0     1     4   8.5  9.25
1     2     5   8.0  8.00
2     3     6   9.0  9.00
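As a possible alternative to the list of (name, function) tuples, the same grouped aggregation can be written with keyword arguments ("named aggregation"), where each keyword becomes the output column name. This is a sketch that assumes pandas 0.25 or later; the variable name `named` is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 3], 'col2': [4, 4, 5, 6], 'col3': [7, 10, 8, 9]})

# Each keyword (lam1, lam2) names the output column for its function.
named = (df.groupby(['col1', 'col2'])['col3']
           .agg(lam1=lambda x: np.percentile(x, 50),
                lam2=lambda x: np.percentile(x, 75))
           .reset_index())
print(named)
```

The column names live next to the functions they label, so the lambdas do not need names of their own.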