Fill a column with the sum of values from another column over a specified interval in pandas
I have two dataframes like these:
df1 df2
col1 col2 col3 col1 col2
item1 14 NaN item1 3
item1 28 NaN item2 4
item1 6 NaN ...
item1 16 NaN
item1 7 NaN
item1 25 NaN
item1 11 NaN
item1 17 NaN
item1 10 NaN
item1 22 NaN
item2 21 NaN
item2 25 NaN
item2 24 NaN
item2 25 NaN
item2 16 NaN
item2 15 NaN
item2 26 NaN
item2 14 NaN
item2 16 NaN
item2 30 NaN
...
I need to fill column col3 of df1 with the sum of the values from column col2 of df1, taken over the interval specified in df2. Each unique value in col1 of df1 has its own interval, given in col2 of df2. If fewer values than the interval remain in col2 of df1, only the values that exist should be summed. I need these sums for every unique value in col1 of df1.
The result should look like this:
col1 col2 col3
item1 14 48
item1 28 50
item1 6 29
item1 16 48
item1 7 43
item1 25 53
item1 11 38
item1 17 49
item1 10 32
item1 22 22
item2 21 95
item2 25 90
item2 24 80
item2 25 82
item2 16 71
item2 15 71
item2 26 86
item2 14 60
item2 16 46
item2 30 30
...
Below is an example of the calculation for item1:
col1 col2 calculations for col3
item1 14 (14 + 28 + 6) =48
item1 28 (28 + 6 + 16) =50
item1 6 ( 6 + 16 + 7) =29
item1 16 (16 + 7 + 25) =48
item1 7 ( 7 + 25 + 11) =43
item1 25 (25 + 11 + 17) =53
item1 11 (11 + 17 + 10) =38
item1 17 (17 + 10 + 22) =49
item1 10 (10 + 22 ) =32
item1 22 (22 ) =22
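The calculation above can be sketched in plain Python, independent of pandas. The function name forward_window_sums is hypothetical and used only for illustration; it sums each value with the values that follow it, up to the window size, and truncates at the end of the list.

```python
def forward_window_sums(values, window):
    """Sum each value with the next (window - 1) values, truncating at the end."""
    # Slicing past the end of a list simply returns the values that exist,
    # which gives the "sum only what is there" behaviour for the last rows.
    return [sum(values[i:i + window]) for i in range(len(values))]


# col2 values for item1, with the interval 3 taken from df2
item1 = [14, 28, 6, 16, 7, 25, 11, 17, 10, 22]
item1_sums = forward_window_sums(item1, 3)
print(item1_sums)
```

For item1 this gives the same numbers as the col3 column in the table above.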
The problem is that df1 has many unique values in col1, and the interval specified in col2 of df2 can be different for each of them.
I would be grateful for any help!
Use rolling with apply:
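# Look up each row's window size in df2
df1['window']=df1.col1.map(df2.set_index('col1').col2)
# Reverse each group, take a rolling sum (partial windows allowed), then realign on df1's index
df1['col3']=df1.groupby('col1').apply(lambda x : x.col2.sort_index(ascending=False)\
.rolling(window=x.window.values[0],min_periods=1).sum()).reset_index(level='col1',drop=True)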
df1
Out[219]:
col1 col2 col3 window
0 item1 14 48.0 3
1 item1 28 50.0 3
2 item1 6 29.0 3
3 item1 16 48.0 3
4 item1 7 43.0 3
5 item1 25 53.0 3
6 item1 11 38.0 3
7 item1 17 49.0 3
8 item1 10 32.0 3
9 item1 22 22.0 3
10 item2 21 95.0 4
11 item2 25 90.0 4
12 item2 24 80.0 4
13 item2 25 82.0 4
14 item2 16 71.0 4
15 item2 15 71.0 4
16 item2 26 86.0 4
17 item2 14 60.0 4
18 item2 16 46.0 4
19 item2 30 30.0 4
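As a variation on the answer above, here is a minimal sketch that avoids reversing the data by using a forward-looking window. It assumes a pandas version that provides pandas.api.indexers.FixedForwardWindowIndexer, and it rebuilds df1 and df2 from the question so that it runs on its own. The explicit loop over groups and the windows lookup are illustrative choices, not part of the original answer.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Sample data copied from the question
df1 = pd.DataFrame({
    "col1": ["item1"] * 10 + ["item2"] * 10,
    "col2": [14, 28, 6, 16, 7, 25, 11, 17, 10, 22,
             21, 25, 24, 25, 16, 15, 26, 14, 16, 30],
})
df2 = pd.DataFrame({"col1": ["item1", "item2"], "col2": [3, 4]})

# Window size per item, indexed by the item name
windows = df2.set_index("col1")["col2"]

parts = []
for key, group in df1.groupby("col1")["col2"]:
    # Each window covers the current row and the rows after it
    indexer = FixedForwardWindowIndexer(window_size=int(windows[key]))
    # min_periods=1 keeps the partial sums at the end of each group
    parts.append(group.rolling(window=indexer, min_periods=1).sum())

# The pieces keep df1's original index, so the assignment aligns row by row
df1["col3"] = pd.concat(parts)
print(df1)
```

On the sample data this should reproduce the col3 values shown in the output above. Every item in df1 needs a matching row in df2, otherwise the windows lookup raises a KeyError.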
Another similar approach uses a rolling sum together with [::-1] reversal:
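df1['new'] = df1['col1'].map(df2.set_index('col1')['col2'])
# Select the columns with a list ([[...]]); tuple-style ['col2','new'] selection on a groupby is not supported in recent pandas
df1['col3'] = df1.groupby(['col1'])[['col2','new']].apply( lambda x : x[['col2']][::-1].rolling(x.new.values[0],min_periods=1).sum()[::-1]).values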
Output:
col1 col2 col3 new
0 item1 14 48.0 3
1 item1 28 50.0 3
2 item1 6 29.0 3
3 item1 16 48.0 3
4 item1 7 43.0 3
5 item1 25 53.0 3
6 item1 11 38.0 3
7 item1 17 49.0 3
8 item1 10 32.0 3
9 item1 22 22.0 3
10 item2 21 95.0 4
11 item2 25 90.0 4
12 item2 24 80.0 4
13 item2 25 82.0 4
14 item2 16 71.0 4
15 item2 15 71.0 4
16 item2 26 86.0 4
17 item2 14 60.0 4
18 item2 16 46.0 4
19 item2 30 30.0 4