Group pandas DataFrame on column and sum it while retaining the number of summed observations
I have a pandas DataFrame that looks like this:
import pandas as pd
df = pd.DataFrame({'id':[1, 1, 2, 2], 'comp': [-0.10,0.20,-0.10, 0.4], 'word': ['boy','girl','man', 'woman']})
I would like to group the DataFrame on id, calculate the sum of the corresponding comp values, and get a new column called n_obs that tracks how many rows (ids) were summed up.
I tried using df.groupby('id').sum(), but this does not quite produce the result I want.
I'd like output in the form below:
id comp n_obs
1 0.1 2
2 0.3 2
Any suggestions on how I can do this?
You can use .groupby() with .agg():
df.groupby("id").agg(comp=("comp", "sum"), n_obs=("id", "count"))
This outputs:
comp n_obs
id
1 0.1 2
2 0.3 2
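Note that in the output above id is the index rather than a regular column, whereas the requested layout lists id alongside comp and n_obs. A minimal, self-contained sketch of one way to get that layout follows. It assumes the same sample data as the question, and the variable name result is introduced here only for illustration. It also counts rows with "size" instead of "count"; the two agree on this data, but "count" would skip missing values in the counted column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "comp": [-0.10, 0.20, -0.10, 0.4],
    "word": ["boy", "girl", "man", "woman"],
})

# Named aggregation: each keyword is an output column, given as
# (source column, aggregation function).
result = (
    df.groupby("id")
      .agg(comp=("comp", "sum"), n_obs=("comp", "size"))
      .reset_index()  # move "id" from the index back into a regular column
)

print(result)
```

The word column is left out because it is not named in .agg(); only the columns listed there appear in the result.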