
Fill a column with the sum of values from another column over a specified interval in pandas

I have two such dataframes:

         df1                      df2   
col1     col2    col3       col1     col2
item1     14     NaN        item1      3
item1     28     NaN        item2      4
item1      6     NaN        ... 
item1     16     NaN            
item1      7     NaN            
item1     25     NaN            
item1     11     NaN            
item1     17     NaN            
item1     10     NaN            
item1     22     NaN            
item2     21     NaN            
item2     25     NaN            
item2     24     NaN            
item2     25     NaN            
item2     16     NaN            
item2     15     NaN            
item2     26     NaN            
item2     14     NaN            
item2     16     NaN            
item2     30     NaN            
...

I need to fill column col3 in df1 with sums of the col2 values of df1, taken over an interval whose length is specified in df2. Each unique value in col1 of df1 has its own interval length, listed in col2 of df2. The interval starts at the current row and extends forward within the same col1 group; if fewer rows remain than the interval length, only the available values are summed. I need these sums for every unique value in col1 of df1.

The result should look like this:

col1    col2    col3
item1    14      48
item1    28      50
item1     6      29
item1    16      48
item1     7      43
item1    25      53
item1    11      38
item1    17      49
item1    10      32
item1    22      22
item2    21      95
item2    25      90
item2    24      80
item2    25      82
item2    16      71
item2    15      71
item2    26      86
item2    14      60
item2    16      46
item2    30      30
...

Below is an example of how to calculate for the case of item1 :

col1   col2  calculations for col3              
item1   14   (14 + 28 +  6)  =48
item1   28   (28 +  6 + 16)  =50
item1   6    ( 6 + 16 +  7)  =29
item1   16   (16 +  7 + 25)  =48
item1   7    ( 7 + 25 + 11)  =43
item1   25   (25 + 11 + 17)  =53
item1   11   (11 + 17 + 10)  =38
item1   17   (17 + 10 + 22)  =49
item1   10   (10 + 22     )  =32
item1   22   (22          )  =22
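The calculation in the table above can be reproduced with a plain loop. This is only a minimal sketch for item1, assuming the window length 3 taken from df2; the names `values`, `window` and `forward_sums` are illustrative and do not appear in the question.

```python
import pandas as pd

# col2 values for item1, copied from the question.
values = pd.Series([14, 28, 6, 16, 7, 25, 11, 17, 10, 22])
window = 3  # interval length for item1, as listed in df2

# For each position i, sum the current value and the next (window - 1) values.
# Slicing past the end just returns fewer values, which covers the
# "not enough values" case in the last rows.
forward_sums = [int(values.iloc[i:i + window].sum()) for i in range(len(values))]
print(forward_sums)  # [48, 50, 29, 48, 43, 53, 38, 49, 32, 22]
```

A loop like this is easy to check by hand but slow on large frames, which is why a vectorized approach is worth looking for.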

The problem is that df1 has many unique values in col1, and the interval specified in col2 of df2 can be different for each of them.

I will be grateful for any help!

Use rolling inside groupby().apply(), on each group in reversed order:

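# Look up each row's window length from df2 (col1 -> col2).
df1['window'] = df1.col1.map(df2.set_index('col1').col2)

# Per group: reverse the row order and take a trailing rolling sum, which is a
# forward-looking sum in the original order. Assignment aligns on the index,
# so the values land back on their original rows.
df1['col3'] = df1.groupby('col1').apply(lambda x: x.col2.sort_index(ascending=False)\
 .rolling(window=x.window.values[0], min_periods=1).sum()).reset_index(level='col1', drop=True)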


df1
Out[219]: 
     col1  col2  col3  window
0   item1    14  48.0       3
1   item1    28  50.0       3
2   item1     6  29.0       3
3   item1    16  48.0       3
4   item1     7  43.0       3
5   item1    25  53.0       3
6   item1    11  38.0       3
7   item1    17  49.0       3
8   item1    10  32.0       3
9   item1    22  22.0       3
10  item2    21  95.0       4
11  item2    25  90.0       4
12  item2    24  80.0       4
13  item2    25  82.0       4
14  item2    16  71.0       4
15  item2    15  71.0       4
16  item2    26  86.0       4
17  item2    14  60.0       4
18  item2    16  46.0       4
19  item2    30  30.0       4
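Why reversing the rows works may be easier to see on a single group. The following is a small sketch using only the item1 values and a window of 3; the names `s` and `forward` are illustrative.

```python
import pandas as pd

s = pd.Series([14, 28, 6, 16, 7, 25, 11, 17, 10, 22])

# rolling() looks backwards: each window ends at the current row.
# On the reversed series, "the previous rows" are the following rows of the
# original order, so the trailing sum becomes a forward-looking sum.
# min_periods=1 allows shorter windows at the end of the group.
# The final [::-1] restores the original row order.
forward = s[::-1].rolling(window=3, min_periods=1).sum()[::-1]
print(forward.tolist())
# [48.0, 50.0, 29.0, 48.0, 43.0, 53.0, 38.0, 49.0, 32.0, 22.0]
```

The result is float because rolling sums are computed as floats; cast with astype(int) if integer output is needed.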

Another, similar approach: reverse each group with [::-1], take the rolling sum, then reverse the result back:

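# Look up each row's window length from df2 (col1 -> col2).
df1['new'] = df1['col1'].map(df2.set_index('col1')['col2'])
# Select several columns with a list, [['col2','new']]; the tuple form ['col2','new'] fails in current pandas.
# .values assigns by position, so this assumes df1 is already ordered by col1.
df1['col3'] = df1.groupby(['col1'])[['col2','new']].apply( lambda x : x[['col2']][::-1].rolling(x.new.values[0],min_periods=1).sum()[::-1]).values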

Output:

     col1  col2  col3  new
0   item1    14  48.0    3
1   item1    28  50.0    3
2   item1     6  29.0    3
3   item1    16  48.0    3
4   item1     7  43.0    3
5   item1    25  53.0    3
6   item1    11  38.0    3
7   item1    17  49.0    3
8   item1    10  32.0    3
9   item1    22  22.0    3
10  item2    21  95.0    4
11  item2    25  90.0    4
12  item2    24  80.0    4
13  item2    25  82.0    4
14  item2    16  71.0    4
15  item2    15  71.0    4
16  item2    26  86.0    4
17  item2    14  60.0    4
18  item2    16  46.0    4
19  item2    30  30.0    4
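As a further option that is not taken from the answers above, recent pandas versions (1.1 or later, as far as I know) provide pandas.api.indexers.FixedForwardWindowIndexer, which makes rolling() look forward so that no reversal is needed. The sketch below rebuilds the sample data from the question; the names `windows` and `parts` are illustrative.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Sample data rebuilt from the question.
df1 = pd.DataFrame({
    "col1": ["item1"] * 10 + ["item2"] * 10,
    "col2": [14, 28, 6, 16, 7, 25, 11, 17, 10, 22,
             21, 25, 24, 25, 16, 15, 26, 14, 16, 30],
})
df2 = pd.DataFrame({"col1": ["item1", "item2"], "col2": [3, 4]})

# Window length per col1 value.
windows = df2.set_index("col1")["col2"]

parts = []
for key, group in df1.groupby("col1")["col2"]:
    # The window starts at the current row and covers the next rows.
    indexer = FixedForwardWindowIndexer(window_size=int(windows[key]))
    parts.append(group.rolling(indexer, min_periods=1).sum())

# Each part keeps its original index, so assignment aligns row by row.
df1["col3"] = pd.concat(parts)
print(df1)
```

Because the result is aligned on the index rather than assigned by position, this version does not depend on df1 being ordered by col1.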
