I have a DataFrame like the one below:
import pandas as pd
idx = pd.MultiIndex.from_arrays([[1, 1, 1, 2], [1, 1, 2, 2]])
df = pd.DataFrame(columns=idx, index=[1, 2, 3]).fillna(1)
Now I want to compute the sum based on both levels of the columns. The first thing that comes to mind is groupby and sum:
df.sum(level=[0,1],axis=1)
   1     2
   1  2  2
1  2  1  1
2  2  1  1
3  2  1  1
df.groupby(level=[0, 1], axis=1).sum()  # same output as above
df.groupby(df.columns.labels, axis=1).sum()  # same output as above
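A side note for readers on newer pandas: as far as I know, the `level=` argument of `sum`, grouping with `axis=1`, and `MultiIndex.labels` (renamed to `codes`) have all been deprecated or removed in later releases, so the three calls above may not run as written. A minimal sketch of one version-independent way to get the same sums, assuming it is acceptable to transpose the frame, group the rows, and transpose back (the variable name `summed` is just for illustration):

```python
import pandas as pd

# Same layout as in the question: two column levels, with (1, 1) duplicated.
idx = pd.MultiIndex.from_arrays([[1, 1, 1, 2], [1, 1, 2, 2]])
df = pd.DataFrame(1, index=[1, 2, 3], columns=idx)

# Transpose so the column MultiIndex becomes the row index,
# group on both index levels, sum, then transpose back.
summed = df.T.groupby(level=[0, 1]).sum().T

print(summed)
```

The result keeps a MultiIndex on the columns, with one column per distinct (level 0, level 1) pair.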
Since we are grouping by all of the column levels, I tried to reduce the manual input by passing df.columns in place of level=[0,1]. But this gives weird output: the MultiIndex is converted to tuples (which makes some sense, since a MultiIndex is just another layout of a list of tuples).
df.groupby(df.columns,axis=1).sum()
   (1, 1)  (1, 2)  (2, 2)
1       2       1       1
2       2       1       1
3       2       1       1
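If the goal is only to avoid typing the level numbers by hand, a possible sketch is to build the level list from `df.columns.nlevels`, which keeps the MultiIndex instead of producing tuples. The second half shows how tuple-labelled columns, like the ones printed above, could be turned back into a MultiIndex with `pd.MultiIndex.from_tuples`. The frame is transposed before grouping on the assumption that `axis=1` grouping may not be available in your pandas version, and the names `all_levels`, `flat`, and `restored` are only illustrative.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([[1, 1, 1, 2], [1, 1, 2, 2]])
df = pd.DataFrame(1, index=[1, 2, 3], columns=idx)

# Build [0, 1, ...] from the number of column levels instead of typing it.
all_levels = list(range(df.columns.nlevels))
summed = df.T.groupby(level=all_levels).sum().T

# Simulate the tuple-labelled result: a flat Index whose entries are tuples.
flat = summed.copy()
flat.columns = summed.columns.to_flat_index()

# Convert the tuple labels back into a proper MultiIndex.
restored = flat.copy()
restored.columns = pd.MultiIndex.from_tuples(flat.columns)
```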
Also, when I use a non-aggregating function such as transform, the output goes back to normal:
df.groupby(df.columns,axis=1).transform('sum')
   1        2
   1  1  2  2
1  2  2  1  1
2  2  2  1  1
3  2  2  1  1
Q: Why does this happen? If groupby changes the MultiIndex to tuples, shouldn't that affect the transform call as well?
I think this has to do with how transform is coded: it works on the columns of a DataFrame. Even though you are grouping with axis=1, transform still passes only a single column at a time to the function.
def f(x):
    print(x)

df.groupby(df.columns, axis=1).transform(f)
Output:
1  1    1
   1    1
Name: 1, dtype: int64
1  1    1
   1    1
Name: 2, dtype: int64
1  1    1
   1    1
Name: 3, dtype: int64
   1
   1  1
1  1  1
2  1  1
3  1  1
1  2    1
Name: 1, dtype: int64
1  2    1
Name: 2, dtype: int64
1  2    1
Name: 3, dtype: int64
2  2    1
Name: 1, dtype: int64
2  2    1
Name: 2, dtype: int64
2  2    1
Name: 3, dtype: int64
The name of each Series passed to f, the custom function, is a label from the index (1, 2, or 3), and only a single column is passed at a time, not all of the columns.
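Another way to look at the difference, offered as a sketch rather than a statement about pandas internals: an aggregation such as sum returns one entry per group, so the result has to be labelled with the group keys, while transform returns a result with the same shape and labels as its input, so the original MultiIndex is simply reused. The example below transposes the frame and groups on the index levels, on the assumption that `axis=1` grouping may be unavailable in your pandas version.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([[1, 1, 1, 2], [1, 1, 2, 2]])
df = pd.DataFrame(1, index=[1, 2, 3], columns=idx)

grouped = df.T.groupby(level=[0, 1])

# Aggregation: one row per group, labelled by the group keys (3 groups).
aggregated = grouped.sum().T

# Transform: the group sum is broadcast back to every original column,
# so the result keeps all 4 original column labels.
broadcast = grouped.transform("sum").T

print(aggregated.shape, broadcast.shape)
```

Here `aggregated` has three columns (one per group), while `broadcast` has the same four columns as `df`, with the two (1, 1) columns both holding the group sum.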