Say I have a DataFrame:
import pandas as pd
data = [('a', 1), ('a', 1), ('b', 1), ('a', 3), ('b', 2), ('c', 1), ('a', 2), ('b', 3), ('a', 2)]
df = pd.DataFrame(data, columns=['project', 'duration'])
# Then I made an aggregation:
df_agg=df.groupby('project').agg({'duration': ['median', 'mean']}).reset_index()
Out[11]:
  project duration
            median mean
0       a        2  1.8
1       b        2  2.0
2       c        1  1.0
In [12]: df_agg.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
(project, ) 3 non-null object
(duration, median) 3 non-null int64
(duration, mean) 3 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 152.0+ bytes
However, df_agg is not like an ordinary DataFrame: its column labels are tuples such as (duration, median), so I can't conveniently select columns with df_agg[['median', 'mean']].
My question is: how can I turn df_agg into an ordinary DataFrame with the columns flattened?
The df_agg DataFrame has a MultiIndex for its columns. Only that column index has to be flattened.
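To make the MultiIndex point concrete, here is a small sketch that rebuilds the question's df_agg and inspects its column labels. The variable names medians and stats are only illustrative. It also shows that the columns can still be reached without flattening, either by full tuple or by selecting the top level first.

```python
import pandas as pd

data = [('a', 1), ('a', 1), ('b', 1), ('a', 3), ('b', 2),
        ('c', 1), ('a', 2), ('b', 3), ('a', 2)]
df = pd.DataFrame(data, columns=['project', 'duration'])
df_agg = df.groupby('project').agg({'duration': ['median', 'mean']}).reset_index()

# Each column label is a tuple with one entry per level of the MultiIndex.
# reset_index() pads the 'project' label with an empty string on the second level.
print(df_agg.columns.tolist())

# Select a single column by its full tuple label...
medians = df_agg[('duration', 'median')]

# ...or select the top level first, which returns a plain DataFrame
# whose columns are just 'median' and 'mean'.
stats = df_agg['duration'][['median', 'mean']]
print(stats)
```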
A simple way is to iterate over the column tuples and join the elements of each one:
df_agg.columns = ['_'.join(col) for col in df_agg.columns]
It gives:
  project_  duration_median  duration_mean
0        a                2            1.8
1        b                2            2.0
2        c                1            1.0
If you want, you can then rename the columns to give them nicer names (for example, project_ back to project).
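As a sketch of that renaming step, the joined name project_ can either be fixed afterwards with rename, or avoided by skipping empty level values while joining. The names flat_names and df_flat are only illustrative and do not come from the answer above.

```python
import pandas as pd

data = [('a', 1), ('a', 1), ('b', 1), ('a', 3), ('b', 2),
        ('c', 1), ('a', 2), ('b', 3), ('a', 2)]
df = pd.DataFrame(data, columns=['project', 'duration'])
df_agg = df.groupby('project').agg({'duration': ['median', 'mean']}).reset_index()

# Option 1: skip empty level values while joining, so ('project', '')
# becomes 'project' rather than 'project_'.
flat_names = ['_'.join(part for part in col if part) for col in df_agg.columns]

# Option 2: join as in the answer, then rename the one awkward column.
df_flat = df_agg.copy()
df_flat.columns = ['_'.join(col) for col in df_flat.columns]
df_flat = df_flat.rename(columns={'project_': 'project'})

print(flat_names)
print(df_flat.columns.tolist())
```

Both options end up with the same three flat names, so ordinary selection such as df_flat[['duration_median', 'duration_mean']] works.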
Alternatively, you could perform the aggregation on the selected column of the groupby, which never creates a MultiIndex in the first place:
df.groupby('project')['duration'].agg(['median', 'mean']).add_prefix('duration_').reset_index()
Output:
  project  duration_median  duration_mean
0       a                2            1.8
1       b                2            2.0
2       c                1            1.0
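A related option, not mentioned in the answers above, is pandas' named aggregation, which lets you choose the flat output names directly. This is a minimal sketch assuming pandas 0.25 or newer; df_named is an illustrative name.

```python
import pandas as pd

data = [('a', 1), ('a', 1), ('b', 1), ('a', 3), ('b', 2),
        ('c', 1), ('a', 2), ('b', 3), ('a', 2)]
df = pd.DataFrame(data, columns=['project', 'duration'])

# Each keyword is the output column name; each value is a
# (source column, aggregation function) pair.
df_named = (
    df.groupby('project')
      .agg(duration_median=('duration', 'median'),
           duration_mean=('duration', 'mean'))
      .reset_index()
)

print(df_named.columns.tolist())
```

The result has single-level columns, so no flattening or prefixing step is needed afterwards.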