Combining rows of a pandas DataFrame with groupby

Question

I have a pandas DataFrame with columns 'x', 'y', 'z'. However, a lot of the x and y values are redundant. I want to take all rows that have the same x and y values and sum the third column, returning a smaller DataFrame. So given

         x     y         z
0       1      2         1
1       1      2         5
2       1      2         0
3       1      3         0
4       2      6         1

it would return:

        x      y         z
0       1      2         6
1       1      3         0
2       2      6         1

I've tried

df = df.groupby(['x', 'y'])['z'].sum

but I'm not sure how to work with grouped objects.

Answer 1

Very close as-is; you just need to call .sum() and then reset the index:

>>> df.groupby(['x', 'y'])['z'].sum().reset_index()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1

There is also a parameter to groupby(), as_index, that handles this for you:

>>> df.groupby(['x', 'y'], as_index=False)['z'].sum()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
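
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies both variants. The variable names via_reset and via_flag are just illustrative, not anything pandas defines.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Variant 1: group, sum, then move the (x, y) index back into columns.
via_reset = df.groupby(["x", "y"])["z"].sum().reset_index()

# Variant 2: tell groupby not to use the keys as the index at all.
via_flag = df.groupby(["x", "y"], as_index=False)["z"].sum()

print(via_reset)
print(via_flag)
```

Both calls should give the same three-row frame; which one you use is mostly a matter of taste.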

In your question, you have df.groupby(['x', 'y'])['z'].sum without parentheses. This simply references the method .sum as a Python object, without calling it, so df ends up bound to the method rather than to the summed result.

>>> type(df.groupby(['x', 'y'])['z'].sum)
method

>>> callable(df.groupby(['x', 'y'])['z'].sum)
True
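
To make the difference concrete, here is a small sketch using the sample data from the question. The names grouped, not_called and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

grouped = df.groupby(["x", "y"])["z"]

# Without parentheses: just a reference to the bound method, nothing is summed.
not_called = grouped.sum

# With parentheses: the aggregation runs and returns a Series indexed by (x, y).
result = grouped.sum()

print(callable(not_called))
print(type(result).__name__)

# The stored reference can still be called later to get the same values.
later = not_called()
print(later.tolist())
```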

Answer 2

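Another option is to move x and y into the index and sum over the index levels. In older pandas this could be written without groupby at all, as df.set_index(['x','y']).sum(level=[0,1]).reset_index(), but the level argument to sum() was deprecated in pandas 1.3 and removed in 2.0. On current versions the equivalent is:

df.set_index(['x','y']).groupby(level=[0,1]).sum().reset_index()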

Output:

   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
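
A runnable sketch of the index-based approach, assuming a pandas version where sum() no longer accepts level and the grouping therefore goes through groupby(level=...). The names indexed and summed are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Move the key columns into a two-level index.
indexed = df.set_index(["x", "y"])

# Sum rows that share the same (x, y) index entry, then restore the columns.
# Levels can be given by name, as here, or by position with level=[0, 1].
summed = indexed.groupby(level=["x", "y"]).sum().reset_index()

print(summed)
```

This is handy when the frame is already indexed by x and y; otherwise the plain groupby on columns from Answer 1 is shorter.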
