I have a multi-indexed dataframe that I want to aggregate over some of its index levels. If the aggregator function returns a float, everything works without a problem. But I can't find out how to use a function with a more complex return value (e.g., a pd.Series). Using a function that returns a pd.Series gives me this error: Exception: Must produce aggregated value
Here is an example dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': {
        (1, 0): 85, (1, 1): 75,
        (2, 0): 12, (2, 1): 15,
        (3, 0): 2, (3, 1): 26,
    },
    'B': {
        (1, 0): 86, (1, 1): 76,
        (2, 0): 13, (2, 1): 17,
        (3, 0): 19, (3, 1): 18,
    }
}).stack()
df.index.rename(['idx', 'bar', 'label'], inplace=True)
The content of df is:
idx  bar  label
1    0    A        85
          B        86
     1    A        75
          B        76
2    0    A        12
          B        13
     1    A        15
          B        17
3    0    A         2
          B        19
     1    A        26
          B        18
dtype: int64
Let's define a simple aggregator that returns pd.Series:
def my_func(subframe):
    subframe = subframe.unstack('label')
    mean_A_plus_B = np.mean(subframe['B'] + subframe['A'])
    # Note: despite its name, this computes B - A (which matches the expected output below).
    mean_A_minus_B = np.mean(subframe['B'] - subframe['A'])
    return pd.Series([mean_A_plus_B, mean_A_minus_B], index=['A+B', 'A-B'])
    # return mean_A_plus_B  ## <- this one works.
Applying the aggregator as follows raises an exception:
df.groupby('idx').agg(my_func)
.
.
.
py/pandas/core/groupby/generic.py in _aggregate_named(self, func, *args, **kwargs)
907 output = func(group, *args, **kwargs)
908 if isinstance(output, (Series, Index, np.ndarray)):
--> 909 raise Exception('Must produce aggregated value')
910 result[name] = self._try_cast(output, group)
Exception: Must produce aggregated value
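For contrast, the scalar-returning variant from the commented-out line does pass through .agg(), because each group is reduced to a single value. A minimal self-contained sketch, where my_scalar_func and the variable name s are hypothetical names chosen for illustration:

```python
import numpy as np
import pandas as pd

# Rebuild the example data; .stack() yields a Series with a 3-level index.
s = pd.DataFrame({
    'A': {(1, 0): 85, (1, 1): 75, (2, 0): 12, (2, 1): 15, (3, 0): 2, (3, 1): 26},
    'B': {(1, 0): 86, (1, 1): 76, (2, 0): 13, (2, 1): 17, (3, 0): 19, (3, 1): 18},
}).stack()
s.index.names = ['idx', 'bar', 'label']


def my_scalar_func(subframe):
    # Hypothetical scalar variant of my_func: one float per group.
    wide = subframe.unstack('label')
    return np.mean(wide['B'] + wide['A'])


# One scalar per group is what .agg() expects, so no exception is raised.
scalar_result = s.groupby('idx').agg(my_scalar_func)
print(scalar_result)
```

The result is a Series indexed by idx holding only the A+B means, which is why a second quantity cannot be returned this way.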
What I had hoped to receive was:
       A+B  A-B
idx
1    161.0  1.0
2     28.5  1.5
3     32.5  4.5
dtype: float64
What is the right way of doing this?
Just replace .agg() with .apply():
df.groupby('idx').apply(my_func).unstack(level=-1)
Output:
       A+B  A-B
idx
1    161.0  1.0
2     28.5  1.5
3     32.5  4.5
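For reference, here is a self-contained sketch of the whole thing, assuming a reasonably recent pandas. The variable names s, wide and check are introduced here for illustration only. The last part is a different technique, not part of the suggestion above: it computes the per-row values first and then takes a plain group mean, which should give the same numbers without a Python-level function per group.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
s = pd.DataFrame({
    'A': {(1, 0): 85, (1, 1): 75, (2, 0): 12, (2, 1): 15, (3, 0): 2, (3, 1): 26},
    'B': {(1, 0): 86, (1, 1): 76, (2, 0): 13, (2, 1): 17, (3, 0): 19, (3, 1): 18},
}).stack()
s.index.names = ['idx', 'bar', 'label']


def my_func(subframe):
    # Pivot the 'label' level into columns A and B for this group.
    wide = subframe.unstack('label')
    return pd.Series(
        [np.mean(wide['B'] + wide['A']), np.mean(wide['B'] - wide['A'])],
        index=['A+B', 'A-B'],  # 'A-B' holds B - A, as in the question's code
    )


# apply() accepts a Series per group and stacks the pieces under the group key;
# unstack() then moves the inner index ('A+B', 'A-B') into columns.
result = s.groupby('idx').apply(my_func).unstack(level=-1)
print(result)

# Alternative sketch: compute per-row values, then average within each idx.
wide = s.unstack('label')
check = pd.DataFrame({
    'A+B': wide['B'] + wide['A'],
    'A-B': wide['B'] - wide['A'],
}).groupby(level='idx').mean()
print(check)
```

The alternative only works here because both quantities are plain means of row-wise expressions; for aggregators that need the whole group at once, .apply() remains the more general choice.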