Concatenate pandas DataFrame via groupby
I have a pandas DataFrame with columns 'x', 'y', 'z'. However, a lot of the x and y values are redundant. I want to take all rows that have the same x and y values and sum the third column, returning a smaller DataFrame.
So given
   x  y  z
0  1  2  1
1  1  2  5
2  1  2  0
3  1  3  0
4  2  6  1
it would return:
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
I've tried
df = df.groupby(['x', 'y'])['z'].sum
but I'm not sure how to work with grouped objects.
Very close as-is; you just need to call .sum() and then reset the index:
>>> df.groupby(['x', 'y'])['z'].sum().reset_index()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
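For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question (the variable names `summed` and `result` are mine, not from the original post). It shows that calling .sum() gives a Series keyed by the (x, y) groups, and that reset_index() turns those keys back into regular columns.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Calling .sum() yields a Series indexed by the (x, y) group keys.
summed = df.groupby(["x", "y"])["z"].sum()

# reset_index() moves the group keys back into ordinary columns.
result = summed.reset_index()
print(result)
```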
There is also a parameter to groupby(), as_index=False, that handles that:
>>> df.groupby(['x', 'y'], as_index=False)['z'].sum()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
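As a quick sanity check, the two spellings can be compared directly. This is only a sketch on the question's sample data; `via_reset` and `via_flag` are illustrative names. pd.testing.assert_frame_equal raises if the frames differ, so running it silently suggests both routes agree for this input.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Route 1: aggregate, then move the group keys out of the index.
via_reset = df.groupby(["x", "y"])["z"].sum().reset_index()

# Route 2: ask groupby to keep the keys as columns from the start.
via_flag = df.groupby(["x", "y"], as_index=False)["z"].sum()

# Raises AssertionError if the two results are not the same frame.
pd.testing.assert_frame_equal(via_reset, via_flag)
```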
In your question, you have df.groupby(['x', 'y'])['z'].sum without parentheses. This simply references the method .sum as a Python object, without calling it.
>>> type(df.groupby(['x', 'y'])['z'].sum)
method
>>> callable(df.groupby(['x', 'y'])['z'].sum)
True
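To make the difference concrete, here is a small sketch on the question's data (the names `not_called` and `called` are just for illustration). The attribute alone is a bound method; the aggregation only runs once you add the parentheses.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

grouped = df.groupby(["x", "y"])["z"]

not_called = grouped.sum   # a bound method; nothing has been summed yet
called = grouped.sum()     # a Series holding the per-group totals

print(callable(not_called))
print(type(called).__name__)
```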
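Another option is to move x and y into the index and sum over the index levels. Older pandas versions allowed this without groupby syntax via sum(level=[0,1]), but that argument was deprecated in pandas 1.3 and removed in 2.0, so on current versions you group by the index levels instead:
df.set_index(['x','y']).groupby(level=[0,1]).sum().reset_index()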
Output:
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
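Here is a self-contained sketch of the index-based approach. It assumes a pandas version where sum(level=...) is not available (it was removed in pandas 2.0), so it groups on the index level names instead; `indexed` and `result` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Move the key columns into a two-level index.
indexed = df.set_index(["x", "y"])

# Sum rows that share the same (x, y) index entry, then restore the columns.
result = indexed.groupby(level=["x", "y"]).sum().reset_index()
print(result)
```

Levels can be given by name, as here, or by position ([0, 1]); names tend to read more clearly when the index has more than one level.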