
Sum up previous rows (up to 3) and multiply with values from another column using pandas

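I have 2 dataframes. For each row, I want a value computed from the previous 3 rows within the same unique ID (a groupby on the ID), where each of those previous values is multiplied by a value from the other dataframe and the products are summed.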

   For example:

   dataframe A
       unique_id  value  out_value
   1           1     45
   2           1     33
   3           1     18
   4           1     26      20.70
   5           2     66
   6           2     44
   7           2     22
   8           2     19      28.38

   dataframe B
   num_values
         0.15
         0.30
         0.18

           Expected out_value column:
              row 4 = 18*0.15 + 33*0.30 + 45*0.18 = 2.7 + 9.9 + 8.1 = 20.7
              row 8 = 22*0.15 + 44*0.30 + 66*0.18 = 3.3 + 13.2 + 11.88 = 28.38

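  Within each unique_id, each out_value should be computed from the previous 3 values: the immediately preceding value is multiplied by the first num_values entry, the one before it by the second, and the one before that by the third.
  Every row that needs an out_value will have 3 previous rows available.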
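The two expected values are dot products of the previous three values (most recent first) with num_values. A quick check of the arithmetic, using only the numbers from the example above; the variable names are illustrative:

```python
import numpy as np

# weights from dataframe B: the first weight applies to the row immediately before
weights = np.array([0.15, 0.30, 0.18])

# previous three values for rows 4 and 8, most recent first
prev_row4 = np.array([18, 33, 45])
prev_row8 = np.array([22, 44, 66])

row4 = prev_row4 @ weights
row8 = prev_row8 @ weights
print(round(row4, 2), round(row8, 2))  # 20.7 28.38
```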
import pandas as pd
import numpy as np

df_a = pd.DataFrame({
    'uni_id':[1, 1, 1, 1, 2, 2, 2, 2, 152, 152, 152, 152, 152],
    'value':[45,33,18,26,66,44,22,19,36,27,45,81,90]
}, index=range(1,14))
df_b = pd.DataFrame({
    'num_values':[0.15,0.30,0.18]
})
df_a
###
    uni_id  value
1        1     45
2        1     33
3        1     18
4        1     26
5        2     66
6        2     44
7        2     22
8        2     19
9      152     36
10     152     27
11     152     45
12     152     81
13     152     90

df_b
###
   num_values
0        0.15
1        0.30
2        0.18

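# main calculation: arr[k] == [value[k-1], value[k-2], value[k-3]] (most recent first),
# dotted with df_b['num_values']; the first 3 rows of the frame get NaN
arr = [df_a['value'].shift(x+1).values[::-1][:3] for x in range(len(df_a['value']))[::-1]]
# reuse df_a's index so the result aligns with df_a (whose index starts at 1, not 0)
arr_b = pd.Series(np.inner(arr, df_b['num_values']), index=df_a.index)

# keep only rows with at least 3 earlier rows in the same uni_id
# (assumes the rows of each uni_id are contiguous)
mask = df_a.groupby('uni_id').cumcount() >= 3
output = arr_b.where(mask)

# add the result to df_a
df_a['out_value'] = output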


df_a
###
    uni_id  value  out_value
1        1     45        NaN
2        1     33        NaN
3        1     18        NaN
4        1     26      20.70
5        2     66        NaN
6        2     44        NaN
7        2     22        NaN
8        2     19      28.38
9      152     36        NaN
10     152     27        NaN
11     152     45        NaN
12     152     81      21.33
13     152     90      30.51
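The code above shifts the whole column and relies on the mask to discard rows whose previous three values would cross a uni_id boundary, which assumes each uni_id's rows are contiguous. A sketch of an alternative (not the original answer's method) that shifts within each group using `groupby(...).shift`, so no mask is needed; the names `weights`, `grouped` and `prev` are illustrative:

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame({
    'uni_id': [1, 1, 1, 1, 2, 2, 2, 2, 152, 152, 152, 152, 152],
    'value': [45, 33, 18, 26, 66, 44, 22, 19, 36, 27, 45, 81, 90],
}, index=range(1, 14))
df_b = pd.DataFrame({'num_values': [0.15, 0.30, 0.18]})

# weights[0] applies to the row immediately before, weights[1] to the one before that, ...
weights = df_b['num_values'].to_numpy()
grouped = df_a.groupby('uni_id')['value']

# column j holds the value j+1 rows earlier within the same uni_id (NaN if there is none)
prev = np.column_stack([grouped.shift(lag) for lag in range(1, len(weights) + 1)])

# weighted sum per row; a row with fewer than 3 earlier rows in its group gets NaN
df_a['out_value'] = prev @ weights
print(df_a.round(2))
```

Because the shift happens inside each group, this version should also work when the rows of a uni_id are not next to each other.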


If you only want to keep the rows with a non-null value, filter them:

df_a.query('out_value.notnull()')
###
    uni_id  value  out_value
4        1     26      20.70
8        2     19      28.38
12     152     81      21.33
13     152     90      30.51
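The same filter can also be written without a query string, using boolean indexing with `notna()` or `dropna(subset=...)`. A self-contained sketch on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

# small hypothetical frame with some missing out_value entries
df = pd.DataFrame({
    'uni_id': [1, 1, 1, 1],
    'value': [45, 33, 18, 26],
    'out_value': [np.nan, np.nan, np.nan, 20.70],
}, index=range(1, 5))

# keep rows where out_value is present (boolean indexing)
kept = df[df['out_value'].notna()]

# same result with dropna restricted to the out_value column
kept_too = df.dropna(subset=['out_value'])
print(kept)
```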


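Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.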

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM