
Sum column based on another column in Pandas DataFrame

I have a pandas DataFrame like this:

>>> import pandas as pd
>>> df = pd.DataFrame({'MONTREGL':[10,10,2222,35,200,56,5555],'SINID':['aaa','aaa','aaa','bbb','bbb','ccc','ccc'],'EXTRA':[400,400,400,500,500,333,333]})
>>> df
   MONTREGL SINID EXTRA
0        10   aaa   400
1        10   aaa   400
2      2222   aaa   400
3        35   bbb   500
4       200   bbb   500
5        56   ccc   333
6      5555   ccc   333

I want to sum the MONTREGL column for each SINID group...

So I get 2242 for aaa, and so on. I also want to keep the value of the EXTRA column.

This is the expected result:

   MONTREGL SINID EXTRA
0      2242   aaa   400
1       235   bbb   500
2      5611   ccc   333

Thanks in advance for your help!

I ended up using this script:

dff = df.groupby(["SINID","EXTRA"]).MONTREGL.sum().reset_index()

It works on this test data and in production.
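A minimal sketch of that script on the question's data. Grouping on both SINID and EXTRA only yields one row per SINID because EXTRA happens to be constant within each SINID group; the `nunique` check is an addition (not part of the original script) that verifies this assumption. The final column selection only restores the column order shown in the expected result.

```python
import pandas as pd

df = pd.DataFrame({
    "MONTREGL": [10, 10, 2222, 35, 200, 56, 5555],
    "SINID": ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "EXTRA": [400, 400, 400, 500, 500, 333, 333],
})

# Grouping on both keys collapses to one row per SINID only if
# EXTRA takes a single value within each SINID group.
assert df.groupby("SINID")["EXTRA"].nunique().eq(1).all()

# Sum MONTREGL per (SINID, EXTRA) pair and turn the group keys back into columns.
dff = df.groupby(["SINID", "EXTRA"]).MONTREGL.sum().reset_index()

# reset_index() yields SINID, EXTRA, MONTREGL; reorder to match the expected layout.
dff = dff[["MONTREGL", "SINID", "EXTRA"]]
print(dff)
```

If EXTRA ever differed within a SINID group, this approach would silently produce several rows for that SINID, which is why the check is worth keeping.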

The code below works for your example:

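df1 = df.groupby(["SINID"]).sum()                     # sums MONTREGL (EXTRA is summed too, then overwritten below)
df1['EXTRA'] = df.groupby(["SINID"])['EXTRA'].mean()  # EXTRA is constant within each SINID, so its mean is that value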

Result:

       MONTREGL  EXTRA
SINID                 
aaa        2242  400.0
bbb         235  500.0
ccc        5611  333.0
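An alternative sketch, using a different technique from the answer above (not the answerer's code): a single groupby with named aggregation sums MONTREGL and takes the first EXTRA value per group. Using `"first"` rather than `"mean"` keeps EXTRA as an integer, and it assumes, as the question does, that EXTRA is the same for every row of a SINID. Named aggregation requires pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "MONTREGL": [10, 10, 2222, 35, 200, 56, 5555],
    "SINID": ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "EXTRA": [400, 400, 400, 500, 500, 333, 333],
})

# One pass: sum MONTREGL and keep the first EXTRA of each SINID group.
# "first" assumes EXTRA is identical for every row in a group.
result = df.groupby("SINID").agg(
    MONTREGL=("MONTREGL", "sum"),
    EXTRA=("EXTRA", "first"),
).reset_index()

# Put the columns in the order of the expected result.
result = result[["MONTREGL", "SINID", "EXTRA"]]
print(result)
```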

My suggestion would be to filter your DataFrame using conditions on other columns and then apply the sum function.

It goes something like this:

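import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2001, 2015, 2019], 'c': [1, 0, 1]})

aux = df[df.c > 0]  # keep only the rows where column c is positive

sa = aux.a.sum()  # sum of column a over the filtered rows
sb = aux.b.sum()  # sum of column b over the filtered rows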

My syntax may not be correct (I didn't run the code), but it will probably work and lead you to your answer.
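Applied to the question's DataFrame, the same filter-then-sum idea might look like the sketch below. Several conditions are combined with `&`, and each comparison is wrapped in parentheses because `&` binds more tightly than `==` in Python. `.loc` selects the column to sum in the same step. The specific conditions are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "MONTREGL": [10, 10, 2222, 35, 200, 56, 5555],
    "SINID": ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "EXTRA": [400, 400, 400, 500, 500, 333, 333],
})

# Single condition: rows whose SINID is "aaa", then sum MONTREGL.
total_aaa = df.loc[df["SINID"] == "aaa", "MONTREGL"].sum()

# Two conditions combined with &; each comparison needs its own parentheses.
mask = (df["SINID"] == "bbb") & (df["MONTREGL"] > 100)
total_bbb_large = df.loc[mask, "MONTREGL"].sum()

print(total_aaa)        # 2242
print(total_bbb_large)  # 200
```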

Good luck.

I know this post is old, but this might be helpful for others:

Using loc: df.loc[df['SINID'] == 'aaa'].MONTREGL.sum()

Using groupby: df.groupby('SINID')['MONTREGL'].sum()
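A quick sketch on the question's data showing what each one-liner returns: the `.loc` version gives a single number for one SINID, while the `groupby` version gives a Series indexed by SINID with one total per group. Calling `reset_index()` on that Series turns it back into a DataFrame if you need SINID as a regular column.

```python
import pandas as pd

df = pd.DataFrame({
    "MONTREGL": [10, 10, 2222, 35, 200, 56, 5555],
    "SINID": ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "EXTRA": [400, 400, 400, 500, 500, 333, 333],
})

# .loc: filter the rows for one SINID, then sum MONTREGL -> a scalar.
aaa_total = df.loc[df["SINID"] == "aaa"].MONTREGL.sum()

# groupby: one MONTREGL total per SINID -> a Series indexed by SINID.
per_sinid = df.groupby("SINID")["MONTREGL"].sum()

# Convert the Series into a two-column DataFrame (SINID, MONTREGL).
per_sinid_df = per_sinid.reset_index()

print(aaa_total)
print(per_sinid)
print(per_sinid_df)
```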

A similar question is answered at the following link (see Alex Riley's answer):

How do I sum values in a column that match a given condition using pandas?

Good luck,

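Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.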

 