
Pandas groupby with multiindex columns

My goal, the way I expected to achieve it, and what happens instead

I am trying to do a groupby on a DataFrame that has MultiIndex columns, using a Series (without a MultiIndex) as the grouping input. Specifically, given the DataFrame below

>>> df
            X        Y      
            A  B  C  A  B  C
2020-01-01  9  1  2  1  6  5
2020-01-02  5  7  8  0  6  9
2020-01-03  6  3  4  8  6  1
2020-01-06  0  0  9  0  5  1
2020-01-07  8  7  4  8  3  1

and the Series representing the groups

>>> groups
A    D
B    D
C    E
dtype: object
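For reproducibility, the two objects shown above could be built roughly as follows. This is only a sketch: the original post shows the printed output, not the construction, so the `DatetimeIndex` and the integer dtype are assumptions.

```python
import pandas as pd

# Assumed construction; the question only shows the printed objects.
index = pd.to_datetime(
    ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]
)
columns = pd.MultiIndex.from_product([["X", "Y"], ["A", "B", "C"]])
data = [
    [9, 1, 2, 1, 6, 5],
    [5, 7, 8, 0, 6, 9],
    [6, 3, 4, 8, 6, 1],
    [0, 0, 9, 0, 5, 1],
    [8, 7, 4, 8, 3, 1],
]
df = pd.DataFrame(data, index=index, columns=columns)

# Maps each second-level column label to its group label.
groups = pd.Series({"A": "D", "B": "D", "C": "E"})

print(df)
print(groups)
```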

I try to run the following

>>> df.groupby(groups, axis=1, level=1).sum()

and expect to get

             X      Y   
             D  E   D  E
2020-01-01  10  2   7  5
2020-01-02  12  8   6  9
2020-01-03   9  4  14  1
2020-01-06   0  9   5  1
2020-01-07  15  4  11  1

Instead, however, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zak/anaconda3/envs/lib/python3.8/site-packages/pandas/core/frame.py", line 6717, in groupby
    return DataFrameGroupBy(
  File "/home/zak/anaconda3/envs/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 560, in __init__
    grouper, exclusions, obj = get_grouper(
  File "/home/zak/anaconda3/envs/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 828, in get_grouper
    Grouping(
  File "/home/zak/anaconda3/envs/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 485, in __init__
    ) = index._get_grouper_for_level(self.grouper, level)
  File "/home/zak/anaconda3/envs/lib/python3.8/site-packages/pandas/core/indexes/multi.py", line 1487, in _get_grouper_for_level
    grouper = level_values.map(mapper)
  File "/home/zak/anaconda3/envs/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5098, in map
    new_values = super()._map_values(mapper, na_action=na_action)
  File "/home/zak/anaconda3/envs/lib/python3.8/site-packages/pandas/core/base.py", line 937, in _map_values
    new_values = map_f(values, mapper)
  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
TypeError: 'numpy.ndarray' object is not callable

I'm using Python 3.8.8 and pandas version 1.2.3.

A sub-optimal solution

One way I found to achieve the above is the following code, but I'm specifically wondering whether there is a cleaner way to do it. If not, why not? To me, the attempt above would be the expected behaviour of the groupby method, but it seems I'm misunderstanding the logic behind it.

>>> df, groups = df.align(groups, axis=1, level=1)
>>> df.groupby(groups, axis=1).apply(lambda x: x.sum(axis=1, level=0)).swaplevel(axis=1).sort_index(axis=1)
             X      Y   
             D  E   D  E
2020-01-01  10  2   7  5
2020-01-02  12  8   6  9
2020-01-03   9  4  14  1
2020-01-06   0  9   5  1
2020-01-07  15  4  11  1

You can use rename on the second level of the MultiIndex and then aggregate by both levels:

df = df.rename(columns=groups, level=1).sum(axis=1, level=[0,1])

# equivalent to:
# df = df.rename(columns=groups, level=1).groupby(axis=1, level=[0,1]).sum()
print(df)
             X      Y   
             D  E   D  E
2020-01-01  10  2   7  5
2020-01-02  12  8   6  9
2020-01-03   9  4  14  1
2020-01-06   0  9   5  1
2020-01-07  15  4  11  1
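The calls above match the pandas 1.2.3 used in the question. In later pandas releases, the `level` argument of `sum` was removed and `axis=1` in `groupby` was deprecated, so a variant that transposes, groups on the index levels, and transposes back may be more portable. The following is a sketch under that assumption, with the data rebuilt from the question; it is not part of the original answer.

```python
import pandas as pd

# Data rebuilt from the question (construction details are assumed).
index = pd.to_datetime(
    ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]
)
columns = pd.MultiIndex.from_product([["X", "Y"], ["A", "B", "C"]])
data = [
    [9, 1, 2, 1, 6, 5],
    [5, 7, 8, 0, 6, 9],
    [6, 3, 4, 8, 6, 1],
    [0, 0, 9, 0, 5, 1],
    [8, 7, 4, 8, 3, 1],
]
df = pd.DataFrame(data, index=index, columns=columns)
groups = pd.Series({"A": "D", "B": "D", "C": "E"})

# Relabel the second column level (A, B -> D; C -> E), as in the answer.
renamed = df.rename(columns=groups, level=1)

# Transpose so the column levels become index levels, sum duplicate
# (level 0, level 1) pairs, then transpose back.
result = renamed.T.groupby(level=[0, 1]).sum().T
print(result)
```

The rename step is unchanged; only the aggregation is expressed through the row axis instead of `axis=1`.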

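Your original attempt runs if you pass a lambda function instead of the Series, but the output is different: grouping on level 1 alone collapses the first level, so the X and Y columns are summed together (for example, D on 2020-01-01 is 9 + 1 + 1 + 6 = 17):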

df = df.groupby(lambda x: groups[x], axis=1, level=1).sum()
print(df)
             D   E
2020-01-01  17   7
2020-01-02  18  17
2020-01-03  23   5
2020-01-06   5  10
2020-01-07  26   5
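If the first level should be kept while still grouping through the mapping, one option is to build both group keys explicitly from the column levels. This is a minimal sketch, not something the original answer proposes; the names `level0` and `level1` are illustrative, and the transpose is used to avoid `axis=1`.

```python
import pandas as pd

# Data rebuilt from the question (construction details are assumed).
index = pd.to_datetime(
    ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]
)
columns = pd.MultiIndex.from_product([["X", "Y"], ["A", "B", "C"]])
data = [
    [9, 1, 2, 1, 6, 5],
    [5, 7, 8, 0, 6, 9],
    [6, 3, 4, 8, 6, 1],
    [0, 0, 9, 0, 5, 1],
    [8, 7, 4, 8, 3, 1],
]
df = pd.DataFrame(data, index=index, columns=columns)
groups = pd.Series({"A": "D", "B": "D", "C": "E"})

# First key: the untouched top level (X / Y).
level0 = df.columns.get_level_values(0)
# Second key: the second level translated through the groups Series.
level1 = df.columns.get_level_values(1).map(groups)

# Group the transposed frame by both keys, then transpose back.
result = df.T.groupby([level0, level1]).sum().T
print(result)
```

Because both keys are passed, the X and Y blocks stay separate and the result has the two-level column layout the question expected.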
