Pandas sort_index gives strange result after applying function to grouped DataFrame

Basic setup:

I have a DataFrame with a MultiIndex on both the rows and the columns. The second level of the column index has floats for values.

I want to perform a groupby operation (grouping by the first level of the row index). The operation adds a few columns (also with floats as their labels) to each group and then returns the group.

When I get the result back from my groupby operation, I can't seem to get the columns to sort properly.

Working example. First, set things up:

import pandas as pd
import numpy as np

np.random.seed(0)

col_level_1 = ['red', 'blue']
col_level_2 = [1., 2., 3., 4.]

row_level_1 = ['a', 'b']
row_level_2 = ['one', 'two']

col_idx = pd.MultiIndex.from_product([col_level_1, col_level_2], names=['color', 'numeral'])
row_idx = pd.MultiIndex.from_product([row_level_1, row_level_2], names=['letter', 'number'])

df = pd.DataFrame(np.random.randn(len(row_idx), len(col_idx)), index=row_idx, columns=col_idx)

This gives the following DataFrame in df:

[Image: the df DataFrame]
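The screenshot of df is not available here. As a stand-in, the following sketch rebuilds the same df and prints its shape and column labels directly. Only the print calls are additions; the construction repeats the setup above.

```python
import numpy as np
import pandas as pd

np.random.seed(0)

# Same construction as in the question.
col_idx = pd.MultiIndex.from_product([['red', 'blue'], [1., 2., 3., 4.]],
                                     names=['color', 'numeral'])
row_idx = pd.MultiIndex.from_product([['a', 'b'], ['one', 'two']],
                                     names=['letter', 'number'])
df = pd.DataFrame(np.random.randn(len(row_idx), len(col_idx)),
                  index=row_idx, columns=col_idx)

# 4 rows (letter x number) by 8 columns (color x numeral).
print(df.shape)
# Column labels follow from_product order: all 'red' columns first, then 'blue'.
print(df.columns.tolist())
```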

Then I define my group operation and apply it:

def mygrpfun(group):
    for f in [1.5, 2.5, 3.5]:
        group[('red', f)] = 'hello'
        group[('blue', f)] = 'world'
    return group

result = df.groupby(level='letter').apply(mygrpfun).sort_index(axis=1)

Displaying result gives:

[Image: the displayed result DataFrame]

What's going on here? Why doesn't the second level of the column index display in ascending order?
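A quick way to investigate is to check how the column MultiIndex stores its second level after a column with a new float label is added. The sketch below skips the groupby and performs the same kind of assignment as mygrpfun on a small zero-filled frame (the frame is a placeholder, not the question's data). It reads MultiIndex.codes, the attribute name used since pandas 0.24; pandas 0.14 exposes the same information as labels. If the new numeral is stored at the end of the level rather than in sorted position, a sort that follows those integer codes instead of the values would give an order like the one shown. Treat that as a hypothesis about 0.14, because the question does not confirm it. The last part rebuilds the index with MultiIndex.from_tuples, which recomputes the levels in sorted order, and then sorts.

```python
import numpy as np
import pandas as pd

# Small placeholder frame with the same column structure as the question's df.
col_idx = pd.MultiIndex.from_product([['red', 'blue'], [1., 2., 3., 4.]],
                                     names=['color', 'numeral'])
df = pd.DataFrame(np.zeros((2, len(col_idx))), columns=col_idx)

# The same kind of assignment mygrpfun performs: 1.5 is a new 'numeral' label.
df[('red', 1.5)] = 'hello'

# Stored values of the 'numeral' level, and the integer codes that point into it.
print(list(df.columns.levels[1]))
print(list(df.columns.codes[1]))
print(df.columns.levels[1].is_monotonic_increasing)

# Rebuild the MultiIndex from its tuples; from_tuples recomputes the levels
# and stores them in sorted order.
df.columns = pd.MultiIndex.from_tuples(df.columns.tolist(), names=df.columns.names)
print(df.columns.levels[1].is_monotonic_increasing)

# Sort the columns of the rebuilt frame and list them.
df = df.sort_index(axis=1)
print(df.columns.tolist())
```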

EDIT: For context, these are the versions I'm using:

In [28]: pd.__version__
Out[28]: '0.14.0'

In [29]: np.__version__
Out[29]: '1.8.1'

Any help much appreciated.

The returned result looks as expected. You added columns, and nothing guaranteed any particular order for those new columns.

You could just reimpose the ordering yourself:

result = result[sorted(result.columns)]
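A minimal, self-contained sketch of this fix follows. A small zero-filled frame stands in for the question's result (the data is a placeholder), and the loop adds the same labels as mygrpfun. sorted() orders the (color, numeral) tuples by value using ordinary Python tuple comparison, so the final order depends only on the labels themselves. Indexing with that sorted list returns the columns in that order.

```python
import numpy as np
import pandas as pd

# Placeholder frame with the question's column structure.
col_idx = pd.MultiIndex.from_product([['red', 'blue'], [1., 2., 3., 4.]],
                                     names=['color', 'numeral'])
result = pd.DataFrame(np.zeros((2, len(col_idx))), columns=col_idx)

# Append columns with new float labels, as mygrpfun does for each group.
for f in [1.5, 2.5, 3.5]:
    result[('red', f)] = 'hello'
    result[('blue', f)] = 'world'

# Reimpose the order explicitly: sort the column tuples by value,
# then select the columns in that order.
result = result[sorted(result.columns)]

print(result.columns.tolist())
```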

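Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM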