Concatenate pandas DataFrame via groupby

I have a pandas DataFrame with columns 'x', 'y', and 'z'. However, many of the x and y values are redundant. I want to take all rows that have the same x and y values and sum the third column, returning a smaller DataFrame. So given

   x  y  z
0  1  2  1
1  1  2  5
2  1  2  0
3  1  3  0
4  2  6  1

it would return:

   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1

I've tried

df = df.groupby(['x', 'y'])['z'].sum

but I'm not sure how to work with grouped objects.

Very close as-is; you just need to call .sum() and then reset the index:

>>> df.groupby(['x', 'y'])['z'].sum().reset_index()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1

groupby() also takes an as_index parameter that handles this in one step:

>>> df.groupby(['x', 'y'], as_index=False)['z'].sum()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
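
For anyone who wants to try this without the original data, here is a minimal sketch that rebuilds the sample frame from the question and runs both forms side by side. The variable names via_reset and via_flag are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Form 1: aggregate, then move the group keys back into columns
via_reset = df.groupby(["x", "y"])["z"].sum().reset_index()

# Form 2: ask groupby to keep the keys as columns from the start
via_flag = df.groupby(["x", "y"], as_index=False)["z"].sum()

print(via_reset)
print(via_flag)
```

Both calls return a new DataFrame and leave df unchanged, so assign the result back if you want to replace the original.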

In your question, you have df.groupby(['x', 'y'])['z'].sum without parentheses. This simply references the method .sum as a Python object, without calling it.

>>> type(df.groupby(['x', 'y'])['z'].sum)
method

>>> callable(df.groupby(['x', 'y'])['z'].sum)
True
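
To make the difference concrete, here is a small sketch on a cut-down frame (the name summer is just a placeholder). Holding the method does no work; the aggregation only happens once it is called.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 2], "y": [2, 2, 6], "z": [1, 5, 1]})

# Without parentheses: a bound method, nothing has been computed yet
summer = df.groupby(["x", "y"])["z"].sum
print(callable(summer))

# With parentheses: the aggregation runs and returns a Series
# indexed by the (x, y) group keys
result = summer()
print(type(result).__name__)
print(result.reset_index())
```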

Another option that avoids groupby syntax is to move x and y into the index and sum over the index levels. Note that the level argument to sum() was deprecated in pandas 1.3 and removed in 2.0, so this only works on older versions:

df.set_index(['x','y']).sum(level=[0,1]).reset_index()

Output:

   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
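
On pandas versions where sum() no longer accepts level, a commonly used equivalent is to group on the index levels instead. This is a sketch of that substitution rather than the original answer's code, and it does bring groupby back into the expression:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Move x and y into a MultiIndex, aggregate per index level,
# then turn the levels back into ordinary columns
by_level = (
    df.set_index(["x", "y"])
      .groupby(level=["x", "y"])
      .sum()
      .reset_index()
)

print(by_level)
```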
