
Using groupby with MultiIndex columns or index

I have a DataFrame like the one below:

import pandas as pd

idx = pd.MultiIndex.from_arrays([[1, 1, 1, 2], [1, 1, 2, 2]])
df = pd.DataFrame(columns=idx, index=[1, 2, 3]).fillna(1)
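On newer pandas releases, `fillna` on a frame created without data may keep the object dtype or emit a FutureWarning about downcasting, so the `int64` dtype seen later in this thread is not guaranteed. A minimal sketch, assuming all you need is the same 3×4 frame of ones, passes the scalar straight to the constructor:

```python
import pandas as pd

# Same two-level column index as in the question: level 0 = [1, 1, 1, 2], level 1 = [1, 1, 2, 2]
idx = pd.MultiIndex.from_arrays([[1, 1, 1, 2], [1, 1, 2, 2]])

# Passing a scalar fills every cell, so the frame is int64 from the start
# instead of object NaN values that are filled in afterwards.
df = pd.DataFrame(1, index=[1, 2, 3], columns=idx)

print(df.dtypes.unique())   # a single int64 dtype
print(df.columns.tolist())  # the MultiIndex viewed as a list of tuples
```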

Now I want to find the sum based on both levels of the columns. The first approaches that came to mind were summing by level and groupby + sum:

df.sum(level=[0,1],axis=1)
   1     2
   1  2  2
1  2  1  1
2  2  1  1
3  2  1  1

df.groupby(level=[0, 1], axis=1).sum()  # same output as above

df.groupby(df.columns.codes, axis=1).sum()  # same sums as above, labelled by integer codes (this was df.columns.labels before pandas 0.24)
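`DataFrame.sum(level=...)` was removed in pandas 2.0, and `groupby(..., axis=1)` is deprecated since pandas 2.1. A sketch of equivalents that should work on current versions, assuming the frame above, transposes so the column levels become row levels, groups there, and transposes back. It also contrasts grouping by `MultiIndex.codes` (renamed from `labels` in pandas 0.24), which are integer positions into each level, with grouping by the level values themselves:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([[1, 1, 1, 2], [1, 1, 2, 2]])
df = pd.DataFrame(1, index=[1, 2, 3], columns=idx)

# Without sum(level=...) or groupby(axis=1): transpose so the column levels
# become row levels, group on them, then transpose back.
by_level = df.T.groupby(level=[0, 1]).sum().T

# .codes (called .labels before pandas 0.24) stores integer positions into
# each level, not the level values themselves.
codes = [c.tolist() for c in df.columns.codes]
print(codes)  # [[0, 0, 0, 1], [0, 0, 1, 1]]

# Grouping by the codes gives the same sums, but the groups are labelled by code.
by_codes = df.T.groupby([pd.Index(c) for c in codes]).sum().T

# Grouping by the level values keeps the original labels.
keys = [df.columns.get_level_values(i) for i in range(df.columns.nlevels)]
by_values = df.T.groupby(keys).sum().T
```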

Since I am grouping by all the column levels, I tried to save typing by passing df.columns instead of level=[0,1]. But this gives a weird output: the MultiIndex is converted to tuples (which makes sense, since a MultiIndex is essentially another layout of a list of tuples).

df.groupby(df.columns,axis=1).sum()
   (1, 1)  (1, 2)  (2, 2)
1       2       1       1
2       2       1       1
3       2       1       1
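A MultiIndex passed as one grouping key appears to be treated as a single array whose values are tuples; `MultiIndex.to_flat_index()` makes that view explicit. The sketch below reproduces the tuple labels with the flat index, and shows one way to avoid typing `level=[0, 1]` by hand by deriving the level list from `nlevels` (the variable names are illustrative; transposing again stands in for `axis=1`):

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([[1, 1, 1, 2], [1, 1, 2, 2]])
df = pd.DataFrame(1, index=[1, 2, 3], columns=idx)

# As a single key, the MultiIndex behaves like one array of tuples.
flat = df.columns.to_flat_index()
print(flat.tolist())  # one (level 0, level 1) tuple per column

# Grouping by that flat key reproduces the tuple-labelled sums from the question.
by_tuples = df.T.groupby(flat).sum().T

# To avoid hard-coding level=[0, 1], build the level list from nlevels.
all_levels = list(range(df.columns.nlevels))
by_levels = df.T.groupby(level=all_levels).sum().T
```

With `by_levels`, the result keeps a two-level column index, like the `level=[0, 1]` output earlier in the question.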

However, when I use a non-aggregating function such as transform, the output goes back to normal:

df.groupby(df.columns,axis=1).transform('sum')
   1        2
   1  1  2  2
1  2  2  1  1
2  2  2  1  1
3  2  2  1  1
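The difference is consistent with what each method returns: an aggregation like `sum` produces one row per group, labelled by the group keys (tuples here), while `transform` returns a result aligned with its input, so the original MultiIndex is put back whatever the group keys look like. A sketch, again transposing instead of using `axis=1`:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([[1, 1, 1, 2], [1, 1, 2, 2]])
df = pd.DataFrame(1, index=[1, 2, 3], columns=idx)
flat = df.columns.to_flat_index()

# An aggregation returns one entry per group, labelled by the group keys.
agg = df.T.groupby(flat).sum()

# transform returns one entry per input row, on the input's own index,
# so transposing back restores the original MultiIndex columns.
out = df.T.groupby(flat).transform("sum").T
```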

Q: Why does this happen? If groupby converts the MultiIndex to tuples, shouldn't it do the same for the transform call?

I think this has to do with how transform is implemented: it is coded to work on the columns of a DataFrame. Even though you are grouping with axis=1, transform still passes only single columns to the function.

def f(x):
    print(x)  # show each object transform passes to f (f itself returns None)

df.groupby(df.columns,axis=1).transform(f)

Output:

1  1    1
   1    1
Name: 1, dtype: int64
1  1    1
   1    1
Name: 2, dtype: int64
1  1    1
   1    1
Name: 3, dtype: int64
   1   
   1  1
1  1  1
2  1  1
3  1  1
1  2    1
Name: 1, dtype: int64
1  2    1
Name: 2, dtype: int64
1  2    1
Name: 3, dtype: int64
2  2    1
Name: 1, dtype: int64
2  2    1
Name: 2, dtype: int64
2  2    1
Name: 3, dtype: int64

The name of each Series passed to the custom function f is a label from the row index, but only a single column is passed at a time, not all columns.
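Which objects transform passes to `f` looks like an internal detail of how it picks a code path, and it may vary between pandas versions. To see the groups themselves, a more predictable option is to iterate over the groupby object, which yields `(key, sub-frame)` pairs. A sketch, transposing as before and with `shapes` as an illustrative name:

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([[1, 1, 1, 2], [1, 1, 2, 2]])
df = pd.DataFrame(1, index=[1, 2, 3], columns=idx)

shapes = {}
for key, group in df.T.groupby(level=[0, 1]):
    # Transpose back so each group has the original rows 1, 2, 3
    # and only the columns that belong to this group.
    shapes[key] = group.T.shape
    print(key, group.T.shape)
```

Group `(1, 1)` holds both duplicated `(1, 1)` columns; the other two groups hold one column each.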

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.
