Calculating a weighted sum from two columns of a pandas DataFrame

Question

I am trying to calculate a weighted sum using two columns of a pandas DataFrame in Python.

DataFrame structure:

unique_id   weight            value
1           0.061042375       20.16094523
1           0.3064548         19.50932003
1           0.008310739       18.76469039
1           0.624192086       21.25
2           0.061042375       20.23776924
2           0.3064548         19.63366165
2           0.008310739       18.76299395
2           0.624192086       21.25

.......

The output I want is:

Weighted sum for each unique_id = sum((weight) * (value))

Example: weighted sum for unique_id 1 = ( (0.061042375 * 20.16094523) + (0.3064548 * 19.50932003) + (0.008310739 * 18.76469039) + (0.624192086 * 21.25) )
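As a sanity check on the arithmetic, the sum for unique_id 1 can be reproduced in plain Python from the four rows in the table. The names `rows` and `weighted_sum` are made up for this sketch.

```python
# Rows for unique_id 1, copied from the table above as (weight, value) pairs.
rows = [
    (0.061042375, 20.16094523),
    (0.3064548, 19.50932003),
    (0.008310739, 18.76469039),
    (0.624192086, 21.25),
]

# Weighted sum: multiply each weight by its value, then add up the products.
weighted_sum = sum(weight * value for weight, value in rows)
print(round(weighted_sum, 6))
```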

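I looked at this answer (Calculate weighted average using a pandas/dataframe), but could not figure out the right way to apply it to my specific scenario.

This is what I did based on that answer: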

#Assume temp_weighted_sum_dataframe is the dataframe stated above

grouped_data = temp_weighted_sum_dataframe.groupby('unique_id') #I think this groups data based on unique_id values
weighted_sum_output = (grouped_data.weight * grouped_data.value).transform("sum") #This should allow me to multiply weight and value for every record within each group and sum it up to one value for that group.

# On above line I am getting the error > TypeError: unsupported operand type(s) for *: 'SeriesGroupBy' and 'SeriesGroupBy'

Any help is appreciated, thanks.

Answer 1

The accepted answer in the linked question would indeed solve your problem. However, I would solve it differently, with a single groupby:

u = (df.assign(s=df['weight']*df['value'])
       .groupby('unique_id')
       [['s', 'weight']]
       .sum()
     )

u['s']/u['weight']

Output:

unique_id
1    20.629427
2    20.672208
dtype: float64
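Note that u['s']/u['weight'] is a weighted average, while u['s'] on its own is the weighted sum. In the sample rows the weights of each unique_id add up to 1, so the two coincide here. A minimal sketch to check this, assuming `df` is rebuilt from the eight rows shown in the question (the names `weighted_sum` and `weighted_avg` are only for illustration):

```python
import pandas as pd

# Sample data copied from the question; `df` stands in for the asker's frame.
df = pd.DataFrame({
    "unique_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "weight": [0.061042375, 0.3064548, 0.008310739, 0.624192086] * 2,
    "value": [20.16094523, 19.50932003, 18.76469039, 21.25,
              20.23776924, 19.63366165, 18.76299395, 21.25],
})

# One groupby: sum the row-wise products and the weights per unique_id.
u = (df.assign(s=df["weight"] * df["value"])
       .groupby("unique_id")[["s", "weight"]]
       .sum())

weighted_sum = u["s"]                # sum(weight * value) per unique_id
weighted_avg = u["s"] / u["weight"]  # the same sum divided by the total weight

print(u["weight"])    # total weight per group
print(weighted_sum)
```

If the weights in the real data do not sum to 1 within a group, only `weighted_sum` matches the formula in the question.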

Answer 2

You can do it like this:

df['partial_sum'] = df['weight']*df['value']
out = df.groupby('unique_id')['partial_sum'].agg('sum')

Output:

unique_id
1    20.629427
2    20.672208

Or:

df['weight'].mul(df['value']).groupby(df['unique_id']).sum()

Same output.
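The TypeError in the question comes from multiplying two SeriesGroupBy objects, which do not support `*`. Multiplying the plain columns first and grouping afterwards avoids it. If the per-group total is wanted on every row, as the transform("sum") call in the question suggests, something along these lines should work. This is a sketch that assumes `df` is rebuilt from the sample rows, and the column name `weighted_sum` is made up here:

```python
import pandas as pd

# Sample data copied from the question; `df` stands in for the asker's frame.
df = pd.DataFrame({
    "unique_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "weight": [0.061042375, 0.3064548, 0.008310739, 0.624192086] * 2,
    "value": [20.16094523, 19.50932003, 18.76469039, 21.25,
              20.23776924, 19.63366165, 18.76299395, 21.25],
})

# Multiply row by row first; this is an ordinary Series, not a groupby object.
products = df["weight"] * df["value"]

# transform("sum") returns one entry per original row, so the group total
# can be stored as a new column aligned with the original index.
df["weighted_sum"] = products.groupby(df["unique_id"]).transform("sum")
print(df)
```

Using .sum() instead of .transform("sum") gives one row per unique_id, as in the output above.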

Answer 3

You can use agg together with @ (which is dot):

df.groupby('unique_id')[['weight']].agg(lambda x: x.weight @ x.value)

Out[24]:
              weight
unique_id
1          20.629427
2          20.672208
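Depending on the pandas version, the lambda given to agg may be called with a single column rather than the whole group, in which case x.weight is not available. A sketch of the same dot-product idea using apply on the two selected columns, assuming `df` is rebuilt from the sample rows in the question:

```python
import pandas as pd

# Sample data copied from the question; `df` stands in for the asker's frame.
df = pd.DataFrame({
    "unique_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "weight": [0.061042375, 0.3064548, 0.008310739, 0.624192086] * 2,
    "value": [20.16094523, 19.50932003, 18.76469039, 21.25,
              20.23776924, 19.63366165, 18.76299395, 21.25],
})

# apply passes each group as a DataFrame holding the selected columns,
# so both columns are visible; `@` on two Series is their dot product.
out = (df.groupby("unique_id")[["weight", "value"]]
         .apply(lambda g: g["weight"] @ g["value"]))
print(out)
```

Here the result is a Series indexed by unique_id rather than a one-column DataFrame.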

3 solutions

Solution 1
3 2019-11-27 03:17:29

Solution 2
2 (accepted) 2019-11-27 03:23:39

Solution 3
2 2019-11-27 03:42:41
