Sum up to the previous 3 rows and multiply by values from another column using pandas
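I have 2 dataframes. Grouping by unique ID, I want to compute for each row a sum over its previous 3 rows, where each of those row values is multiplied by a value from the other dataframe.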
For example:

dataframe A
   unique_id  value  out_value
1          1     45
2          1     33
3          1     18
4          1     26      20.70
5          2     66
6          2     44
7          2     22
8          2     19      28.38

dataframe B
num_values
0.15
0.30
0.18
Expected out_value column:
4th row = 18 * 0.15 + 33 * 0.30 + 45 * 0.18 = 2.7 + 9.9 + 8.1 = 20.7
8th row = 22 * 0.15 + 44 * 0.30 + 66 * 0.18 = 3.3 + 13.2 + 11.88 = 28.38
Within each unique_id, every value should be calculated from the previous 3 values.
For every row to be calculated, the previous 3 rows will be available.
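As a quick sanity check, the two expected values can be reproduced in plain Python. This is only a sketch: `weighted_previous` is a hypothetical helper name, and it assumes the weights are applied to the previous rows from nearest to farthest, as in the two worked rows.

```python
# Weights taken from dataframe B, in order: nearest previous row first.
weights = [0.15, 0.30, 0.18]

def weighted_previous(prev_values):
    """Weighted sum of the three preceding values, most recent first."""
    return sum(w * v for w, v in zip(weights, prev_values))

row4 = weighted_previous([18, 33, 45])  # rows 3, 2, 1
row8 = weighted_previous([22, 44, 66])  # rows 7, 6, 5
print(round(row4, 2), round(row8, 2))   # prints 20.7 28.38
```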
import pandas as pd
import numpy as np
df_a = pd.DataFrame({
'uni_id':[1, 1, 1, 1, 2, 2, 2, 2, 152, 152, 152, 152, 152],
'value':[45,33,18,26,66,44,22,19,36,27,45,81,90]
}, index=range(1,14))
df_b = pd.DataFrame({
'num_values':[0.15,0.30,0.18]
})
df_a
###
uni_id value
1 1 45
2 1 33
3 1 18
4 1 26
5 2 66
6 2 44
7 2 22
8 2 19
9 152 36
10 152 27
11 152 45
12 152 81
13 152 90
df_b
###
num_values
0 0.15
1 0.30
2 0.18
# main calculation
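# arr[i] holds the 3 values preceding row i, most recent first (NaN where missing)
arr = [df_a['value'].shift(x+1).values[::-1][:3] for x in range(len(df_a['value']))[::-1]]
# reuse df_a's index (1..13): a default 0..12 index would misalign with the mask and with df_a
arr_b = pd.Series(np.inner(arr, df_b['num_values']), index=df_a.index)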
# filter and clean
# keep only rows that have at least 3 earlier rows in the same uni_id group
mask = df_a.groupby('uni_id').cumcount()+1 > 3
output = arr_b * mask
output[output == 0] = np.nan
# concat result to df_a
df_a['out_value'] = output
df_a
###
uni_id value out_value
1 1 45 NaN
2 1 33 NaN
3 1 18 NaN
4 1 26 20.70
5 2 66 NaN
6 2 44 NaN
7 2 22 NaN
8 2 19 28.38
9 152 36 NaN
10 152 27 NaN
11 152 45 NaN
12 152 81 21.33
13 152 90 30.51
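The same result can probably be reached more directly by shifting within each group. The following is a sketch of an alternative, not the method used in the answer above: it assumes the first weight in `df_b` applies to the nearest previous row, and the names `grouped` and `out_value` are illustrative.

```python
import pandas as pd

df_a = pd.DataFrame({
    'uni_id': [1, 1, 1, 1, 2, 2, 2, 2, 152, 152, 152, 152, 152],
    'value': [45, 33, 18, 26, 66, 44, 22, 19, 36, 27, 45, 81, 90],
}, index=range(1, 14))
df_b = pd.DataFrame({'num_values': [0.15, 0.30, 0.18]})

# shift(k) within each uni_id gives the value k rows earlier in the same group,
# or NaN when the group has fewer than k earlier rows.
grouped = df_a.groupby('uni_id')['value']
out_value = sum(
    grouped.shift(k) * w
    for k, w in enumerate(df_b['num_values'], start=1)
)

# NaN propagates through the sum, so the first 3 rows of each group stay NaN.
df_a['out_value'] = out_value.round(2)
print(df_a)
```

Because the shift is applied per group, no separate mask is needed, and a weighted sum that is legitimately 0 would not be turned into NaN.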
If you want to keep only the non-null values by filtering:
df_a.query('out_value.notnull()')
###
uni_id value out_value
4 1 26 20.70
8 2 19 28.38
12 152 81 21.33
13 152 90 30.51
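An equivalent filter that avoids the query string is `dropna` with a `subset`. The sketch below uses a small stand-in frame (three rows copied from the result above) rather than the full `df_a`; the names `df` and `kept` are illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for the result frame: one row without a value, two with.
df = pd.DataFrame({
    'uni_id': [1, 1, 2],
    'value': [18, 26, 19],
    'out_value': [np.nan, 20.70, 28.38],
}, index=[3, 4, 8])

# Drop rows whose out_value is NaN; other columns are not inspected.
kept = df.dropna(subset=['out_value'])
print(kept)
```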