Pandas: calculate a column based on other rows
I need to calculate a column based on other rows. Basically, I want my new_column to be the sum of "base_column" over all rows with the same id.
I currently do the following, but it is not very efficient. What is the most efficient way to achieve this?
def calculate(x):
    # in fact my filter is more complex: basically same id and date in the last 4 weeks
    filtered_df = df[df["id"] == x["id"]]
    df.at[x.name, "new_column"] = filtered_df["base_column"].sum()

# axis=1 passes each row (not each column) to calculate
df.apply(calculate, axis=1)
Another way to do this is to use groupby and merge:
import pandas as pd
df = pd.DataFrame({'id':[1,1,2],'base_column':[2,4,5]})
# compute sum by id
sum_base = df.groupby("id").agg({"base_column": 'sum'}).reset_index().rename(columns={'base_column': 'new_column'})
# join the result to df
df = pd.merge(df, sum_base, how='left', on='id')
# id base_column new_column
#0 1 2 6
#1 1 4 6
#2 2 5 5
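The merge above builds a per-id total and joins it back onto every row. A shorter variant of the same idea (not part of the original answer) is to look each row's id up in the per-id totals with Series.map; this sketch reuses the same three-row sample frame:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "base_column": [2, 4, 5]})

# Per-id totals: a Series indexed by id
totals = df.groupby("id")["base_column"].sum()

# Replace each row's id with the total for that id
df["new_column"] = df["id"].map(totals)
print(df)
```

This avoids the reset_index/rename/merge round trip and leaves the original row order and index untouched.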
You can do it with transform:
df['new_column']= df.groupby('id')['base_column'].transform('sum')
Input:
id base_column
0 1 2
1 1 4
2 2 5
3 3 6
4 5 7
5 7 4
6 7 5
7 7 3
Output:
id base_column new_column
0 1 2 6
1 1 4 6
2 2 5 5
3 3 6 6
4 5 7 7
5 7 4 12
6 7 5 12
7 7 3 12
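The question mentions that the real filter is "same id and date in the last 4 weeks", which a plain groupby sum does not cover. A minimal sketch of one vectorized approach, assuming a hypothetical datetime column named `date` and reading "last 4 weeks" as the 28 days up to and including each row's date, is a time-based rolling sum per id. The sample data below is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: the "date" column is an assumption, not from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-15", "2024-02-20", "2024-01-01", "2024-01-10"]
    ),
    "base_column": [2, 4, 5, 1, 3],
})

# Sort so dates increase within each id (offset-based windows require this)
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# For each row, sum base_column of the same id over the window (date - 28 days, date]
rolled = (
    df.set_index("date")
      .groupby("id")["base_column"]
      .rolling("28D")
      .sum()
)

# rolled is ordered by id, then by row position within each id, which matches
# the sorted df row order, so the values can be assigned positionally
df["new_column"] = rolled.to_numpy()
print(df)
```

By default the offset window is closed on the right, so a row exactly 28 days earlier is excluded; passing `closed="both"` to `rolling` would include it.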
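Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.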