I have two dataframes like these:
df1
col1   col2  col3
item1  14    NaN
item1  28    NaN
item1  6     NaN
item1  16    NaN
item1  7     NaN
item1  25    NaN
item1  11    NaN
item1  17    NaN
item1  10    NaN
item1  22    NaN
item2  21    NaN
item2  25    NaN
item2  24    NaN
item2  25    NaN
item2  16    NaN
item2  15    NaN
item2  26    NaN
item2  14    NaN
item2  16    NaN
item2  30    NaN
...

df2
col1   col2
item1  3
item2  4
...
I need to fill column col3 in df1 with the sum of the values from column col2 of df1 over a window whose length is specified in df2. The window length is specific to each unique value in column col1 of df1 and is given in column col2 of df2. If fewer values remain in column col2 of df1 than the window length, only the values that are available should be summed. I need these sums for every unique value in column col1 of df1.
The result should look like this:
col1 col2 col3
item1 14 48
item1 28 50
item1 6 29
item1 16 48
item1 7 43
item1 25 53
item1 11 38
item1 17 49
item1 10 32
item1 22 22
item2 21 95
item2 25 90
item2 24 80
item2 25 82
item2 16 71
item2 15 71
item2 26 86
item2 14 60
item2 16 46
item2 30 30
...
Below is an example of the calculation for item1:
col1 col2 calculations for col3
item1 14 (14 + 28 + 6) =48
item1 28 (28 + 6 + 16) =50
item1 6 ( 6 + 16 + 7) =29
item1 16 (16 + 7 + 25) =48
item1 7 ( 7 + 25 + 11) =43
item1 25 (25 + 11 + 17) =53
item1 11 (11 + 17 + 10) =38
item1 17 (17 + 10 + 22) =49
item1 10 (10 + 22 ) =32
item1 22 (22 ) =22
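The calculation above is a forward-looking window sum that shrinks near the end of each group. As a minimal plain-Python sketch of that definition (the helper name forward_window_sums is hypothetical and used only for illustration):

```python
def forward_window_sums(values, window):
    """For each position i, sum values[i : i + window].

    Near the end of the list the slice is shorter, so only the
    values that are actually available get summed.
    """
    return [sum(values[i:i + window]) for i in range(len(values))]


item1 = [14, 28, 6, 16, 7, 25, 11, 17, 10, 22]
item1_sums = forward_window_sums(item1, 3)
print(item1_sums)  # [48, 50, 29, 48, 43, 53, 38, 49, 32, 22]
```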
The problem is that there are a lot of unique values in column col1 of df1, and the interval specified in column col2 of df2 can be different for each of them.
I will be grateful for any help!
Use rolling with groupby and apply:
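df1['window'] = df1.col1.map(df2.set_index('col1').col2)  # window size per row
# reverse each group so the rolling sum looks forward; assignment realigns on the index
df1['col3'] = df1.groupby('col1').apply(
    lambda x: x.col2.sort_index(ascending=False)
               .rolling(window=x.window.values[0], min_periods=1).sum()
).reset_index(level='col1', drop=True)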
df1
Out[219]:
col1 col2 col3 window
0 item1 14 48.0 3
1 item1 28 50.0 3
2 item1 6 29.0 3
3 item1 16 48.0 3
4 item1 7 43.0 3
5 item1 25 53.0 3
6 item1 11 38.0 3
7 item1 17 49.0 3
8 item1 10 32.0 3
9 item1 22 22.0 3
10 item2 21 95.0 4
11 item2 25 90.0 4
12 item2 24 80.0 4
13 item2 25 82.0 4
14 item2 16 71.0 4
15 item2 15 71.0 4
16 item2 26 86.0 4
17 item2 14 60.0 4
18 item2 16 46.0 4
19 item2 30 30.0 4
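The reversal is what makes this work: rolling windows in pandas look backward from each row, so reversing the order, rolling, and then restoring the order yields a forward-looking sum, while min_periods=1 allows the shorter windows at the end. A small sketch on the first five item1 values, with a window of 3 assumed for illustration:

```python
import pandas as pd

s = pd.Series([14, 28, 6, 16, 7])

# Default rolling: each window ends at the current row (looks backward).
backward = s.rolling(3, min_periods=1).sum()

# Reverse, roll, reverse back: each window now starts at the current row.
forward = s[::-1].rolling(3, min_periods=1).sum()[::-1]

print(backward.tolist())  # [14.0, 42.0, 48.0, 50.0, 29.0]
print(forward.tolist())   # [48.0, 50.0, 29.0, 23.0, 7.0]
```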
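Another similar approach uses a rolling sum with [::-1] to reverse each group:
df1['new'] = df1['col1'].map(df2.set_index('col1')['col2'])
# select columns with a list ([[...]]); .values assigns positionally, so df1 must already be ordered by col1
df1['col3'] = df1.groupby(['col1'])[['col2','new']].apply( lambda x : x[['col2']][::-1].rolling(x.new.values[0],min_periods=1).sum()[::-1]).values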
Output :
     col1  col2  col3  new
0   item1    14  48.0    3
1   item1    28  50.0    3
2   item1     6  29.0    3
3   item1    16  48.0    3
4   item1     7  43.0    3
5   item1    25  53.0    3
6   item1    11  38.0    3
7   item1    17  49.0    3
8   item1    10  32.0    3
9   item1    22  22.0    3
10  item2    21  95.0    4
11  item2    25  90.0    4
12  item2    24  80.0    4
13  item2    25  82.0    4
14  item2    16  71.0    4
15  item2    15  71.0    4
16  item2    26  86.0    4
17  item2    14  60.0    4
18  item2    16  46.0    4
19  item2    30  30.0    4
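To try the idea end to end, here is a self-contained sketch that rebuilds the sample data from the question and checks the result against the expected col3. It is a variation on the answers above, not their exact code: it uses a plain loop over the groups instead of apply, and the names windows and parts are illustrative only.

```python
import pandas as pd

# Sample data from the question (only the rows that are shown).
df1 = pd.DataFrame({
    "col1": ["item1"] * 10 + ["item2"] * 10,
    "col2": [14, 28, 6, 16, 7, 25, 11, 17, 10, 22,
             21, 25, 24, 25, 16, 15, 26, 14, 16, 30],
})
df2 = pd.DataFrame({"col1": ["item1", "item2"], "col2": [3, 4]})

# Window length per item, looked up by the value of col1.
windows = df2.set_index("col1")["col2"]

parts = []
for key, grp in df1.groupby("col1")["col2"]:
    w = int(windows[key])
    # Reverse, take a backward rolling sum, reverse back: a forward-looking sum.
    parts.append(grp[::-1].rolling(window=w, min_periods=1).sum()[::-1])

# Each part keeps the original row labels, so assignment aligns on the index.
df1["col3"] = pd.concat(parts)
print(df1)
```

Because the pieces are aligned by index rather than by position, this version does not depend on df1 being sorted by col1.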