Pandas: calculate a column based on other rows

Question

I need to calculate a column based on other rows. Basically, I want new_column to be the sum of base_column over all rows with the same id.

I currently do the following, but it is not very efficient. What is the most efficient way to achieve this?

def calculate(x):
    # In fact my filter is more complex: basically same id and date in the last 4 weeks
    filtered_df = df[df["id"] == df.at[x.name, "id"]]
    df.at[x.name, "new_column"] = filtered_df["base_column"].sum()

# axis=1 calls calculate once per row, so x.name is the row label
df.apply(calculate, axis=1)

Answer 1

Another way to do this is to use groupby and merge:

import pandas as pd

df = pd.DataFrame({'id':[1,1,2],'base_column':[2,4,5]})
# compute sum by id
sum_base = df.groupby("id").agg({"base_column": "sum"}).reset_index().rename(columns={"base_column": "new_column"})
# join the result to df
df = pd.merge(df,sum_base,how='left',on='id')

#    id  base_column  new_column
# 0   1            2           6
# 1   1            4           6
# 2   2            5           5
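
As a quick sanity check, the grouped result can be compared with the row-by-row filter described in the question. This is only a sketch on the same three-row frame; the names slow and fast are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2], 'base_column': [2, 4, 5]})

# Row-by-row version, as in the question: for each row, sum base_column
# over all rows that share its id ("slow" is a hypothetical name).
slow = df.apply(
    lambda row: df.loc[df['id'] == row['id'], 'base_column'].sum(),
    axis=1,
)

# Grouped version from this answer: one sum per id, joined back onto df.
sum_base = (
    df.groupby('id')
      .agg({'base_column': 'sum'})
      .reset_index()
      .rename(columns={'base_column': 'new_column'})
)
fast = pd.merge(df, sum_base, how='left', on='id')['new_column']

# Both approaches should give the same per-row totals.
print(slow.tolist() == fast.tolist())
```

The left merge keeps the rows of df in their original order, so the two results line up position by position.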

Answer 2

You can do it as below:

df['new_column']= df.groupby('id')['base_column'].transform('sum')

Input:

    id  base_column
0   1   2
1   1   4
2   2   5
3   3   6
4   5   7
5   7   4
6   7   5
7   7   3

Output:

    id  base_column     new_column
0   1             2     6
1   1             4     6
2   2             5     5
3   3             6     6
4   5             7     7
5   7             4     12
6   7             5     12
7   7             3     12
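
The question mentions that the real filter is "same id and date in the last 4 weeks". A minimal sketch of how that might be handled, assuming a hypothetical date column (it is not shown in the question) and assuming "last 4 weeks" means the 28 days up to and including each row's own date:

```python
import pandas as pd

# Hypothetical data: the "date" column and its values are assumptions.
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "date": pd.to_datetime(
        ["2019-09-01", "2019-09-20", "2019-10-24", "2019-10-24"]
    ),
    "base_column": [2, 4, 5, 7],
})

# Time-based rolling windows need dates in increasing order within each id.
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# For each row, sum base_column over rows with the same id whose date falls
# in the 28 days ending at (and including) that row's date.
rolled = (
    df.set_index("date")
      .groupby("id")["base_column"]
      .rolling("28D")
      .sum()
)

# The result is ordered by id and then date, matching the sorted df,
# so the values can be assigned back by position.
df["new_column"] = rolled.to_numpy()
print(df)
```

Note that this only looks backwards in time from each row, and the assignment by position relies on df having been sorted by id and date first.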
