
pandas groupby apply returning a dataframe

Consider the following code:

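>>> import numpy as np
>>> import pandas as pd
>>> # Random integers in [0, 4); the output below is from one particular run.
>>> df = pd.DataFrame(np.random.randint(0, 4, 16).reshape(4, 4), columns=list('ABCD'))
>>> df
   A  B  C  D
0  2  1  0  2
1  3  0  2  2
2  0  2  0  2
3  2  1  2  0
>>> def grouper(frame):
...     # Identity function: return each group unchanged.
...     return frame
...
>>> df.groupby('A').apply(grouper)
   A  B  C  D
0  2  1  0  2
1  3  0  2  2
2  0  2  0  2
3  2  1  2  0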

As you can see, the results are identical. Here is the documentation of apply:

The function passed to apply must take a dataframe as its first argument and return a DataFrame, Series or scalar. apply will then take care of combining the results back together into a single dataframe or series. apply is therefore a highly flexible grouping method.
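To make the three return types in that passage concrete, here is a minimal sketch using the same values as the frame printed above. The non-key columns are selected explicitly as a precaution, on the assumption that whether the grouping column is passed to the function can differ between pandas versions. The exact index of the DataFrame case may also vary by version, so only its type is worth relying on here.

```python
import pandas as pd

# Same values as the example frame in the question.
df = pd.DataFrame({"A": [2, 3, 0, 2], "B": [1, 0, 2, 1],
                   "C": [0, 2, 0, 2], "D": [2, 2, 2, 0]})

# Select the non-key columns so each group holds only B, C and D.
grouped = df.groupby("A")[["B", "C", "D"]]

# Scalar per group: the results are combined into a Series keyed by A.
as_scalar = grouped.apply(lambda g: g["B"].sum())

# Series per group: the results are combined into a DataFrame, one row per key.
as_series = grouped.apply(lambda g: g.sum())

# DataFrame per group: the pieces are combined into a single DataFrame.
as_frame = grouped.apply(lambda g: g[g["D"] > 0])

print(as_scalar, as_series, as_frame, sep="\n\n")
```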

groupby splits the dataframe into one smaller dataframe per distinct value of A, like this:

   A  B  C  D
2  0  2  0  2

   A  B  C  D
0  2  1  0  2
3  2  1  2  0

   A  B  C  D
1  3  0  2  2
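The split shown above can be reproduced by iterating over the GroupBy object. This is a small sketch on the same values as the question's frame; the name `pieces` is only illustrative.

```python
import pandas as pd

# Same values as the example frame in the question.
df = pd.DataFrame({"A": [2, 3, 0, 2], "B": [1, 0, 2, 1],
                   "C": [0, 2, 0, 2], "D": [2, 2, 2, 0]})

# Iterating over a GroupBy yields (key, sub-frame) pairs, sorted by key by default.
pieces = {key: sub for key, sub in df.groupby("A")}

for key, sub in pieces.items():
    print(f"A == {key}")
    print(sub, end="\n\n")
```

Each sub-frame keeps the row labels it had in the original frame, which is what makes it possible to put the rows back in their original order later.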

The apply documentation says that it combines the per-group dataframes back into a single dataframe. I am curious how it combined them so that the final result is identical to the original dataframe. If it had used concat, the final dataframe would have been:

   A  B  C  D
2  0  2  0  2
0  2  1  0  2
3  2  1  2  0
1  3  0  2  2
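The concat hypothesis is easy to check by hand. This sketch, on the same values as the question's frame, concatenates the groups in key order and shows that the rows come out in group order rather than in the original order.

```python
import pandas as pd

# Same values as the example frame in the question.
df = pd.DataFrame({"A": [2, 3, 0, 2], "B": [1, 0, 2, 1],
                   "C": [0, 2, 0, 2], "D": [2, 2, 2, 0]})

# Collect the per-group sub-frames in key order (A == 0, 2, 3).
pieces = [sub for _, sub in df.groupby("A")]

# A plain concat stacks the groups one after another.
stacked = pd.concat(pieces)

print(stacked)
print(list(stacked.index))  # [2, 0, 3, 1]
```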

I am curious how this concatenation has been done.

If you look at the source code you will see that there is a parameter not_indexed_same that records whether the index remains the same after the function has been applied to each group. If it is the same, groupby reindexes the concatenated dataframe to the original index before returning the result, which is why the rows come back in their original order. I do not know why this was implemented.
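The behaviour described above can be imitated outside pandas. The following is a simplified sketch, not the library's actual code: `combine_chunks` is a hypothetical helper, and computing `not_indexed_same` by comparing each result's index with its group's index is an assumption made for illustration. It deliberately avoids calling apply, since the shape of apply's own output may differ between pandas versions.

```python
import pandas as pd

def combine_chunks(original, chunks, not_indexed_same):
    """Hypothetical stand-in for the combining step; not pandas' real code."""
    result = pd.concat(chunks)
    if not not_indexed_same:
        # Every chunk kept its group's row labels, so restore the original order.
        result = result.reindex(original.index)
    return result

# Same values as the example frame in the question.
df = pd.DataFrame({"A": [2, 3, 0, 2], "B": [1, 0, 2, 1],
                   "C": [0, 2, 0, 2], "D": [2, 2, 2, 0]})
groups = [sub for _, sub in df.groupby("A")]

# Case 1: an identity function leaves each group's index untouched.
same = [g.copy() for g in groups]
flag_same = any(not c.index.equals(g.index) for c, g in zip(same, groups))
out_same = combine_chunks(df, same, flag_same)

# Case 2: a function that resets the index changes the row labels.
changed = [g.reset_index(drop=True) for g in groups]
flag_changed = any(not c.index.equals(g.index) for c, g in zip(changed, groups))
out_changed = combine_chunks(df, changed, flag_changed)

print(out_same.equals(df))      # True
print(list(out_changed.index))  # [0, 0, 1, 0]
```

In the first case the result is identical to the input, matching what the question observed. In the second case there is no original index to reindex to, so the rows stay in group order.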

The change was made on Aug 21, 2011, and Wes made no comment on it: https://github.com/pandas-dev/pandas/commit/00c8da0208553c37ca6df0197da431515df813b7#diff-720d374f1a709d0075a1f0a02445cd65
