
Pandas - aggregate multiple columns with pivot_table

I have a dataframe like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({"ind0": list("QQQWWWW"), "ind1": list("RRRRSSS"), "vals": range(7), "cols": list("XXYXXYY")})    
print(df)

Output:

  ind0 ind1  vals cols
0    Q    R     0    X
1    Q    R     1    X
2    Q    R     2    Y
3    W    R     3    X
4    W    S     4    X
5    W    S     5    Y
6    W    S     6    Y

I want to aggregate the values while creating columns from cols, so I thought of using pivot_table:

df_res = df.pivot_table(index=["ind0", "ind1"], columns="cols", values="vals", aggfunc="sum").fillna(0)
print(df_res)

This gives me:

cols         X     Y
ind0 ind1           
Q    R     1.0   2.0
W    R     3.0   0.0
     S     4.0  11.0
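For intuition about what pivot_table does here: the same table can be built with groupby followed by unstack. This is an equivalent formulation sketched for illustration, not something from the question. Passing fill_value=0 to unstack plays the role of fillna(0), and because no NaN is ever introduced, the result keeps an integer dtype.

```python
import pandas as pd

df = pd.DataFrame({"ind0": list("QQQWWWW"), "ind1": list("RRRRSSS"),
                   "vals": range(7), "cols": list("XXYXXYY")})

# Sum "vals" for every (ind0, ind1, cols) combination, then move the "cols"
# level from the row index into the columns; absent combinations become 0.
res = (
    df.groupby(["ind0", "ind1", "cols"])["vals"]
      .sum()
      .unstack("cols", fill_value=0)
)
print(res)
```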

However, I would rather sum across the ind1 categories while still keeping the ind1 labels in that column. So, the desired output would be:

cols         X     Y
ind0 ind1            
Q    R     1.0   2.0
W    R,S   7.0  11.0

Is there a way to use pivot_table or pivot to this end, or do I have to aggregate over ind1 in a second step? If the latter, how?

You could call reset_index on df_res, group by "ind0", and use agg to apply a different function to each column: join the unique values of "ind1" and sum "X" and "Y".

out = df_res.reset_index().groupby('ind0').agg({'ind1': lambda x: ', '.join(x.unique()), 'X':'sum', 'Y':'sum'})
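A self-contained version of this two-step approach, runnable as-is, might look like the following sketch. It rebuilds the question's data and uses the string "sum" instead of np.sum for aggfunc, which recent pandas versions prefer; that substitution is an assumption about the pandas version in use, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"ind0": list("QQQWWWW"), "ind1": list("RRRRSSS"),
                   "vals": range(7), "cols": list("XXYXXYY")})

# Step 1: the pivot from the question (one row per (ind0, ind1) pair).
df_res = df.pivot_table(index=["ind0", "ind1"], columns="cols",
                        values="vals", aggfunc="sum").fillna(0)

# Step 2: turn ind0/ind1 back into ordinary columns, then collapse each ind0
# group: join the distinct ind1 labels and sum the pivoted value columns.
out = (
    df_res.reset_index()
          .groupby("ind0")
          .agg({"ind1": lambda x: ", ".join(x.unique()), "X": "sum", "Y": "sum"})
)
print(out)
```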

Or, if you have many columns that need the same function, you can build the dict with a dict comprehension:

funcs = {'ind1': lambda x: ', '.join(x.unique()), **{k:'sum' for k in ('X','Y')}}
out = df_res.reset_index().groupby('ind0').agg(funcs)
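If the pivoted column names are not known in advance, the dict can be built from df_res.columns instead of the hard-coded ('X', 'Y') tuple. This is a small variation on the comprehension in the answer, sketched under the assumption that every pivoted column should be summed.

```python
import pandas as pd

df = pd.DataFrame({"ind0": list("QQQWWWW"), "ind1": list("RRRRSSS"),
                   "vals": range(7), "cols": list("XXYXXYY")})
df_res = df.pivot_table(index=["ind0", "ind1"], columns="cols",
                        values="vals", aggfunc="sum").fillna(0)

# Derive the aggregation spec from the pivoted columns themselves, so the
# code keeps working however many distinct values "cols" has.
funcs = {"ind1": lambda x: ", ".join(x.unique()),
         **{k: "sum" for k in df_res.columns}}
out = df_res.reset_index().groupby("ind0").agg(funcs)
print(out)
```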

Output:

cols  ind1    X     Y
ind0                 
Q        R  1.0   2.0
W     R, S  7.0  11.0
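On the question of whether pivot_table alone can do this: one alternative, not taken from the answer above, is to collapse ind1 before pivoting. groupby("ind0")["ind1"].transform(...) replaces each row's ind1 with the joined distinct labels of its ind0 group, so a single pivot_table call then sums across the old ind1 categories. A sketch, assuming the labels should be joined with a bare comma as in the desired output:

```python
import pandas as pd

df = pd.DataFrame({"ind0": list("QQQWWWW"), "ind1": list("RRRRSSS"),
                   "vals": range(7), "cols": list("XXYXXYY")})

# For each ind0 group, join its distinct ind1 labels ("R" for Q, "R,S" for W)
# and broadcast that string back onto every row of the group.
labels = df.groupby("ind0")["ind1"].transform(lambda s: ",".join(s.unique()))

# With ind1 already collapsed, one pivot_table call sums across it.
out = df.assign(ind1=labels).pivot_table(
    index=["ind0", "ind1"], columns="cols", values="vals",
    aggfunc="sum", fill_value=0,
)
print(out)
```

The trade-off is that ind1 becomes a single string label per ind0 group rather than separate categories, which is what the desired output asks for.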

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM