Fill a column with the sum of another column's values over a specified interval in pandas

Question

I have two dataframes like these:

         df1                      df2   
col1     col2    col3       col1     col2
item1     14     NaN        item1      3
item1     28     NaN        item2      4
item1      6     NaN        ... 
item1     16     NaN            
item1      7     NaN            
item1     25     NaN            
item1     11     NaN            
item1     17     NaN            
item1     10     NaN            
item1     22     NaN            
item2     21     NaN            
item2     25     NaN            
item2     24     NaN            
item2     25     NaN            
item2     16     NaN            
item2     15     NaN            
item2     26     NaN            
item2     14     NaN            
item2     16     NaN            
item2     30     NaN            
...

I need to fill column col3 of df1 with the sum of the values from column col2 of df1 over the interval specified in df2. The interval is specific to each unique value in column col1 of df1 and is given in column col2 of df2. If fewer values than the interval remain in column col2 of df1, only the available ones should be summed. I need these sums for each unique value in column col1 of df1.

The result should look like this:

col1    col2    col3
item1    14      48
item1    28      50
item1     6      29
item1    16      48
item1     7      43
item1    25      53
item1    11      38
item1    17      49
item1    10      32
item1    22      22
item2    21      95
item2    25      90
item2    24      80
item2    25      82
item2    16      71
item2    15      71
item2    26      86
item2    14      60
item2    16      46
item2    30      30
...

Below is an example of the calculation for item1:

col1   col2  calculations for col3              
item1   14   (14 + 28 +  6)  =48
item1   28   (28 +  6 + 16)  =50
item1   6    ( 6 + 16 +  7)  =29
item1   16   (16 +  7 + 25)  =48
item1   7    ( 7 + 25 + 11)  =43
item1   25   (25 + 11 + 17)  =53
item1   11   (11 + 17 + 10)  =38
item1   17   (17 + 10 + 22)  =49
item1   10   (10 + 22     )  =32
item1   22   (22          )  =22

The problem is that there are a lot of unique values in column col1 of df1, and the interval specified in column col2 of df2 can be different for each of them.

I will be grateful for any help!

Answer 1

Use rolling with apply:

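# Look up each row's window size from df2.
df1['window'] = df1.col1.map(df2.set_index('col1').col2)

# Within each group, reverse the rows so the trailing rolling sum looks forward;
# the result aligns back to df1 by index.
df1['col3'] = df1.groupby('col1').apply(lambda x: x.col2.sort_index(ascending=False)\
 .rolling(window=x.window.values[0], min_periods=1).sum()).reset_index(level='col1', drop=True)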


df1
Out[219]: 
     col1  col2  col3  window
0   item1    14  48.0       3
1   item1    28  50.0       3
2   item1     6  29.0       3
3   item1    16  48.0       3
4   item1     7  43.0       3
5   item1    25  53.0       3
6   item1    11  38.0       3
7   item1    17  49.0       3
8   item1    10  32.0       3
9   item1    22  22.0       3
10  item2    21  95.0       4
11  item2    25  90.0       4
12  item2    24  80.0       4
13  item2    25  82.0       4
14  item2    16  71.0       4
15  item2    15  71.0       4
16  item2    26  86.0       4
17  item2    14  60.0       4
18  item2    16  46.0       4
19  item2    30  30.0       4
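
For anyone who wants to reproduce this without the IPython session, here is a minimal self-contained sketch of the same idea. It rebuilds the sample data from the question and uses an explicit loop over the groups instead of apply; the names `window_by_item` and `pieces` are only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "col1": ["item1"] * 10 + ["item2"] * 10,
    "col2": [14, 28, 6, 16, 7, 25, 11, 17, 10, 22,
             21, 25, 24, 25, 16, 15, 26, 14, 16, 30],
})
df2 = pd.DataFrame({"col1": ["item1", "item2"], "col2": [3, 4]})

# Window size per item, indexed by item name.
window_by_item = df2.set_index("col1")["col2"]

pieces = []
for item, group in df1.groupby("col1", sort=False):
    window = int(window_by_item[item])
    # Reverse the group, take a trailing rolling sum, then reverse back,
    # so each row ends up summing itself and the rows that follow it.
    reversed_sum = group["col2"][::-1].rolling(window, min_periods=1).sum()
    pieces.append(reversed_sum[::-1])

# Assignment aligns on the original index of df1.
df1["col3"] = pd.concat(pieces)
print(df1)
```

Because the assignment aligns on the index, this version does not depend on df1 being sorted by col1.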

Answer 2

Another similar approach with a rolling sum, using [::-1] to reverse the rows:

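df1['new'] = df1['col1'].map(df2.set_index('col1')['col2'])
# Select the columns with a list (double brackets); tuple-style selection on a groupby is no longer supported in recent pandas.
df1['col3'] = df1.groupby(['col1'])[['col2','new']].apply( lambda x : x[['col2']][::-1].rolling(x.new.values[0],min_periods=1).sum()[::-1]).values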

Output:

     col1  col2  col3  new
0   item1    14  48.0    3
1   item1    28  50.0    3
2   item1     6  29.0    3
3   item1    16  48.0    3
4   item1     7  43.0    3
5   item1    25  53.0    3
6   item1    11  38.0    3
7   item1    17  49.0    3
8   item1    10  32.0    3
9   item1    22  22.0    3
10  item2    21  95.0    4
11  item2    25  90.0    4
12  item2    24  80.0    4
13  item2    25  82.0    4
14  item2    16  71.0    4
15  item2    15  71.0    4
16  item2    26  86.0    4
17  item2    14  60.0    4
18  item2    16  46.0    4
19  item2    30  30.0    4
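
As a possible alternative to the [::-1] reversal, recent pandas versions (1.1 or later, as far as I know) ship `pandas.api.indexers.FixedForwardWindowIndexer`, which makes rolling look forward directly. The following is only a sketch under that assumption, with the sample data rebuilt from the question; `window_by_item` and `parts` are illustrative names.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Sample data copied from the question.
df1 = pd.DataFrame({
    "col1": ["item1"] * 10 + ["item2"] * 10,
    "col2": [14, 28, 6, 16, 7, 25, 11, 17, 10, 22,
             21, 25, 24, 25, 16, 15, 26, 14, 16, 30],
})
df2 = pd.DataFrame({"col1": ["item1", "item2"], "col2": [3, 4]})

# Window size per item, indexed by item name.
window_by_item = df2.set_index("col1")["col2"]

parts = []
for item, group in df1.groupby("col1", sort=False):
    # Each window covers the current row and the next (window_size - 1) rows.
    indexer = FixedForwardWindowIndexer(window_size=int(window_by_item[item]))
    # min_periods=1 keeps the shorter sums at the end of each group.
    parts.append(group["col2"].rolling(window=indexer, min_periods=1).sum())

df1["col3"] = pd.concat(parts)
print(df1)
```

The windows are built per group, so sums never run across the boundary between item1 and item2.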

2 answers

Answer 1
Score 2, accepted, 2017-10-13 14:43:45

Answer 2
Score 1, 2017-10-13 14:53:53
