

pandas groupby on MultiIndex levels: group_keys

I have a DataFrame whose columns are a MultiIndex. I want to group by one level of the columns and use apply to perform a transformation.

Goal: I want the DataFrame that apply passes to the function not to contain the groupby keys in its column index.

From the docs it looks like this is what group_keys does, but it seems to have no effect:

import numpy as np
import pandas as pd

data = {'A': pd.DataFrame(np.random.randn(100, 5)),
        'B': pd.DataFrame(np.random.randn(100, 5)),
        'C': pd.DataFrame(np.random.randn(100, 5))}

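# Concatenating the dict along the columns builds a two-level column
# MultiIndex: feat_1 holds the dict keys, feat_2 the original column labels
data = pd.concat(data, axis=1, names=['feat_1', 'feat_2'])

def foo(df):
    # Show which columns each group passes in, then sum across them row-wise
    print(df.columns)
    return df.sum(1)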

My attempt:

result = data.groupby(level=['feat_1'], axis=1, group_keys=False).apply(foo)

This is what is printed on screen:

MultiIndex(levels=[['A', 'B', 'C'], [0, 1, 2, 3, 4]],
           labels=[[0, 0, 0, 0, 0], [0, 1, 2, 3, 4]],
           names=['feat_1', 'feat_2'])
MultiIndex(levels=[['A', 'B', 'C'], [0, 1, 2, 3, 4]],
           labels=[[1, 1, 1, 1, 1], [0, 1, 2, 3, 4]],
           names=['feat_1', 'feat_2'])
MultiIndex(levels=[['A', 'B', 'C'], [0, 1, 2, 3, 4]],
           labels=[[2, 2, 2, 2, 2], [0, 1, 2, 3, 4]],
           names=['feat_1', 'feat_2'])
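The printout above is consistent with what group_keys does: it only controls whether the group keys are prepended to the index of the combined result of apply, and it does not change the slice that each call to the function receives. A minimal sketch on hypothetical toy data (a Series grouped by an index level, rather than the question's axis=1 grouping, and a function that keeps the first row of each group) shows both halves:

```python
import pandas as pd

# Hypothetical toy data: two groups identified by the index level 'key'
s = pd.Series([1, 2, 3, 4], index=pd.Index(['a', 'a', 'b', 'b'], name='key'))

def run(group_keys):
    received = []

    def first_row(group):
        # Record the index of the slice that apply actually passes in
        received.append(list(group.index))
        return group.head(1)

    result = s.groupby(level='key', group_keys=group_keys).apply(first_row)
    return received, result

seen_true, with_keys = run(True)
seen_false, without_keys = run(False)

print(seen_true == seen_false)     # True: the function gets the same slices either way
print(with_keys.index.nlevels)     # 2: the group key is prepended to the result index
print(without_keys.index.nlevels)  # 1: only the original index is kept
```

So group_keys=False affects the output of apply, not the input of foo; removing the feat_1 level from what foo sees has to be done separately.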

Desired output of print: since I am using group_keys=False, I would like the function foo to receive a DataFrame with only feat_2 as columns:

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]

Am I missing something in the documentation? Or how can I achieve what I want (ideally without modifying the function foo)?

Note: I am using pandas 0.20.3 on Python 3.
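To keep groupby and leave the body of foo untouched, one option is to wrap foo so that the grouping level is dropped from each block's columns before foo sees it. A minimal sketch, assuming a recent pandas where groupby(..., axis=1) is deprecated, so it groups the transposed frame by its row level instead; drop_group_level is a hypothetical helper, and foo records the columns it receives instead of printing them so the behaviour can be checked:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: a two-level (feat_1, feat_2) column MultiIndex
data = pd.concat({k: pd.DataFrame(np.random.randn(100, 5)) for k in 'ABC'},
                 axis=1, names=['feat_1', 'feat_2'])

received = []

def foo(df):
    # The question's foo, recording the columns it receives instead of printing them
    received.append(list(df.columns))
    return df.sum(1)

def drop_group_level(func, level='feat_1'):
    """Hypothetical helper: call func on each block with `level` removed from its columns."""
    def wrapper(block):
        # block holds one feat_1 group as rows; transpose it back to columns
        sub = block.T
        # MultiIndex.droplevel returns a new Index; data itself is not modified
        sub.columns = sub.columns.droplevel(level)
        return func(sub)
    return wrapper

# Group the transposed frame by its row level instead of passing axis=1,
# then transpose the combined result back so each feat_1 key is a column
result = data.T.groupby(level='feat_1').apply(drop_group_level(foo)).T
```

Each call to foo now sees only the feat_2 labels [0, 1, 2, 3, 4], and result is a 100 x 3 frame with one column per feat_1 key. MultiIndex.droplevel is used rather than DataFrame.droplevel because the latter only exists in newer pandas releases.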

Rather than grouping, how about iterating over the outer column level directly:

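for feat1 in data.columns.levels[0]:
    # data[feat1] selects one top-level block and drops the feat_1 level,
    # leaving only feat_2 as columns
    print(list(data[feat1].columns))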

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
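The loop above only prints the column labels. To run foo on each block and collect the results side by side, much as groupby(...).apply would, the same selection can feed pd.concat. A minimal sketch under the same setup as the question; foo records the columns it receives instead of printing them so the behaviour can be checked:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: a two-level (feat_1, feat_2) column MultiIndex
data = pd.concat({k: pd.DataFrame(np.random.randn(100, 5)) for k in 'ABC'},
                 axis=1, names=['feat_1', 'feat_2'])

received = []

def foo(df):
    # The question's foo, recording the columns it receives instead of printing them
    received.append(list(df.columns))
    return df.sum(1)

# Selecting one outer key drops feat_1, so foo sees only the feat_2 columns;
# pd.concat then reassembles the per-key results as columns keyed by feat_1
keys = data.columns.get_level_values('feat_1').unique()
result = pd.concat({key: foo(data[key]) for key in keys}, axis=1)
```

This avoids groupby entirely, so there is no group_keys behaviour to work around, and foo is called exactly once per feat_1 key.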
