Concatenating multiple Pandas groupby outputs

Question

I would like to perform multiple .groupby() operations on different subsets of a given dataset and bind the results together. For example:

import pandas as pd
df = pd.DataFrame({"ID":[1,1,2,2,2,3],"Subset":[1,1,2,2,2,3],"Value":[5,7,4,1,7,8]})
print(df)
   ID  Subset  Value
0   1       1      5
1   1       1      7
2   2       2      4
3   2       2      1
4   2       2      7
5   3       3      8

I would then like to concatenate the following objects and store the result in a pandas data frame:

gr1 = df[df["Subset"] == 1].groupby(["ID","Subset"]).mean()
gr2 = df[df["Subset"] == 2].groupby(["ID","Subset"]).mean()
# Why do gr1 and gr2 have column names in different rows?

I realize that df.groupby(["ID","Subset"]).mean() would give me the concatenated object I'm looking for. Just bear with me, this is a reduced example of what I'm actually dealing with.

I think the solution could be to transform gr1 and gr2 to pandas data frames and then concatenate them like I normally would.

In essence, my questions are the following:

How do I convert a groupby result to a data frame object?
In case this can be done without transforming the series to data frames, how do you bind two groupby results together and then transform that to a pandas data frame?

PS: I come from an R background, so to me it's odd to group a data frame by something and get back a different type of object (a series or a multi-index data frame). This is part of my question too: why does .groupby return a series? What kind of series is this? How come a series can have multiple columns and an index?

Answer 1

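The result in your example is a pandas DataFrame with a MultiIndex, not a series: the grouping keys ID and Subset become the index levels. That is also why the column names appear on different rows, since the index level names are printed on their own line below the column header. To get back a dataframe with the grouping keys as ordinary columns when applying a single aggregation function, you can use the following. Note the inclusion of as_index=False.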

>>> gr1 = df[df["Subset"] == 1].groupby(["ID","Subset"], as_index=False).mean()
>>> gr1

   ID  Subset  Value
0   1       1    6.0
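
To see what the default groupby result actually is, and one way to flatten it after the fact, here is a minimal sketch. It assumes the df from the question; the name flat is only illustrative. reset_index() is an alternative to as_index=False, not a replacement for it.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3],
                   "Subset": [1, 1, 2, 2, 2, 3],
                   "Value": [5, 7, 4, 1, 7, 8]})

# Default groupby: the result is a DataFrame whose index is a MultiIndex
# built from the grouping keys.
gr1 = df[df["Subset"] == 1].groupby(["ID", "Subset"]).mean()
print(type(gr1).__name__)        # DataFrame
print(type(gr1.index).__name__)  # MultiIndex
print(list(gr1.columns))         # ['Value']

# reset_index() moves the index levels back into ordinary columns.
flat = gr1.reset_index()
print(list(flat.columns))        # ['ID', 'Subset', 'Value']
```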

This, however, won't work if you wish to aggregate with multiple functions. If you wish to avoid using df.groupby(["ID","Subset"]).mean(), then you can use the following for your example.

>>> gr1 = df[df["Subset"] == 1].groupby(["ID","Subset"], as_index=False).mean()
>>> gr2 = df[df["Subset"] == 2].groupby(["ID","Subset"], as_index=False).mean()

>>> pd.concat([gr1, gr2]).reset_index(drop=True)

   ID  Subset  Value
0   1       1    6.0
1   2       2    4.0
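
On the second question (binding the groupby results first and converting afterwards), a sketch along these lines should also work. It assumes the df from the question and keeps the default MultiIndex until the very end; the name combined is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3],
                   "Subset": [1, 1, 2, 2, 2, 3],
                   "Value": [5, 7, 4, 1, 7, 8]})

# Keep the default MultiIndex on each partial result.
gr1 = df[df["Subset"] == 1].groupby(["ID", "Subset"]).mean()
gr2 = df[df["Subset"] == 2].groupby(["ID", "Subset"]).mean()

# Stack the two results, then turn the index levels into columns once.
combined = pd.concat([gr1, gr2]).reset_index()
print(combined)
```

Here reset_index() is called without drop=True, because the index holds the grouping keys and discarding it would lose ID and Subset.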

If you only need to deal with a specific subset of rows, the following could be applicable, since it removes the need to concatenate results.

>>> values = [1,2]
>>> df[df['Subset'].isin(values)].groupby(["ID","Subset"], as_index=False).mean()

   ID  Subset  Value
0   1       1    6.0
1   2       2    4.0
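
For the multiple-aggregation case mentioned earlier, one possible approach (an assumption, not part of the original answer) is pandas named aggregation followed by reset_index(). The output column names mean_value and max_value are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 3],
                   "Subset": [1, 1, 2, 2, 2, 3],
                   "Value": [5, 7, 4, 1, 7, 8]})

# Filter once, then compute several aggregates per group.
# mean_value and max_value are hypothetical output column names.
subset = df[df["Subset"].isin([1, 2])]
summary = (
    subset.groupby(["ID", "Subset"])
    .agg(mean_value=("Value", "mean"), max_value=("Value", "max"))
    .reset_index()  # grouping keys back to ordinary columns
)
print(summary)
```

Named aggregation keeps the result columns flat, so no column MultiIndex needs to be untangled afterwards.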

Solution 1
1 accepted 2019-08-24 10:33:30