Replace NaN values in a Pandas DataFrame column after using centered .rolling() with first computed sum

I'm fairly new to Pandas and this is also my first actual question on Stack Overflow, so please bear with me.

I'm transforming a DataFrame with a MultiIndex. I have to calculate a centered moving sum over five observations. I've done that with groupby, so that the rolling sum is calculated within each group (grouped by gender and type, with age as the index). However, that means the first two and the last two rows within each group are NaN. I want the first two NaN values to be equal to the third value and the last two to be equal to the third-from-last value.

This is the original DataFrame:

    Gender    Type   Age    Value
1   'f'       A      1       654
2   'f'       A      2       665
3   'f'       A      3       684
4   'f'       A      4       688
5   'f'       A      5       651
6   'f'       A      6       650
7   'f'       A      7       698
8   'f'       A      8       689
9   'f'       A      9       648
10  'f'       A      10      654
11  'f'       B      1       623
12  'f'       B      2       620
13  'f'       B      3       623
14  'f'       B      4       653
15  'f'       B      5       653
16  'f'       B      6       642
17  'f'       B      7       632
18  'f'       B      8       632
19  'f'       B      9       644
20  'f'       B      10      654
21  'm'       A      1       623
22  'm'       A      2       624
23  'm'       A      3       600
24  'm'       A      4       642
25  'm'       A      5       622
26  'm'       A      6       623
27  'm'       A      7       633
28  'm'       A      8       635
29  'm'       A      9       653
30  'm'       A      10      623
31  'm'       B      1       623
32  'm'       B      2       632
33  'm'       B      3       632
34  'm'       B      4       683
35  'm'       B      5       652
36  'm'       B      6       655
37  'm'       B      7       691
38  'm'       B      8       684
39  'm'       B      9       645
40  'm'       B      10      624

This is the code I use for computing the rolling sum:

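# Use Age as the index so it is carried through the grouped rolling sum
df = df.reset_index().set_index(['Age'])
# Centered 5-row rolling sum within each (Gender, Type) group
df = df.groupby(['Gender', 'Type'])['Value'].rolling(window=5, center=True).sum().reset_index()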

That computes this:


    Gender    Type   Age    Value
1   'f'       A      1       NaN
2   'f'       A      2       NaN
3   'f'       A      3       3342
4   'f'       A      4       3338
5   'f'       A      5       3371
6   'f'       A      6       3376
7   'f'       A      7       3336
8   'f'       A      8       3339
9   'f'       A      9       NaN
10  'f'       A      10      NaN
11  'f'       B      1       NaN
12  'f'       B      2       NaN
13  'f'       B      3       3172
14  'f'       B      4       3191
15  'f'       B      5       3203
16  'f'       B      6       3212
17  'f'       B      7       3203
18  'f'       B      8       3204
19  'f'       B      9       NaN
20  'f'       B      10      NaN
21  'm'       A      1       NaN
22  'm'       A      2       NaN
23  'm'       A      3       x1
24  'm'       A      4       x2
25  'm'       A      5       x3
26  'm'       A      6       x4
27  'm'       A      7       x5
28  'm'       A      8       x7
29  'm'       A      9       NaN
30  'm'       A      10      NaN
31  'm'       B      1       NaN
32  'm'       B      2       NaN
33  'm'       B      3       x8
34  'm'       B      4       x9
35  'm'       B      5       x10
36  'm'       B      6       x11
37  'm'       B      7       x12
38  'm'       B      8       x13
39  'm'       B      9       NaN
40  'm'       B      10      NaN

The x's are just placeholders for the rolling sums.

Now my problem. I want to replace the NaN values with specific cells within each group. Specifically, the rolling sum for ages 1 and 2 in each group must be equal to that of age 3. As the age-3 row might also be NaN because it could not be computed, I can't use code that just fills forward and backward with bfill or ffill. If the age-3 row is NaN, I want the age-1 and age-2 rows in that group to be NaN as well.
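To illustrate the concern with a plain fill, here is a minimal sketch on a single group. The Series `s` is a hypothetical stand-in for one group's rolling sums in which the age-3 value is missing; the names `filled` and `wanted` are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rolling sums for one group (ages 1-10) where age 3 is NaN too.
s = pd.Series(
    [np.nan, np.nan, np.nan, 3338.0, 3371.0, 3376.0, 3336.0, 3339.0, np.nan, np.nan],
    index=range(1, 11),
)

# bfill/ffill take the nearest non-NaN value, so ages 1-3 pick up the age-4 sum.
filled = s.bfill().ffill()

# Positional copy: ages 1-2 take whatever age 3 holds, NaN included.
wanted = s.copy()
wanted.iloc[:2] = wanted.iloc[2]
wanted.iloc[-2:] = wanted.iloc[-3]

print(pd.DataFrame({"bfill_ffill": filled, "wanted": wanted}))
```

With `bfill`, ages 1 to 3 would silently receive the age-4 sum, whereas the positional copy keeps them NaN.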

So the following result is what I want:

    Gender    Type   Age    Value
1   'f'       A      1       3342
2   'f'       A      2       3342
3   'f'       A      3       3342
4   'f'       A      4       3338
5   'f'       A      5       3371
6   'f'       A      6       3376
7   'f'       A      7       3336
8   'f'       A      8       3339
9   'f'       A      9       3339
10  'f'       A      10      3339
11  'f'       B      1       3172
12  'f'       B      2       3172
13  'f'       B      3       3172
14  'f'       B      4       3191
15  'f'       B      5       3203
16  'f'       B      6       3212
17  'f'       B      7       3203
18  'f'       B      8       3204
19  'f'       B      9       3204
20  'f'       B      10      3204
21  'm'       A      1       x1
22  'm'       A      2       x1
23  'm'       A      3       x1
24  'm'       A      4       x2
25  'm'       A      5       x3
26  'm'       A      6       x4
27  'm'       A      7       x5
28  'm'       A      8       x7
29  'm'       A      9       x7
30  'm'       A      10      x7
31  'm'       B      1       x8
32  'm'       B      2       x8
33  'm'       B      3       x8
34  'm'       B      4       x9
35  'm'       B      5       x10
36  'm'       B      6       x11
37  'm'       B      7       x12
38  'm'       B      8       x13
39  'm'       B      9       x13
40  'm'       B      10      x13

I really hope one of you can help me. Thanks in advance.

After your initial groupby with rolling.sum, try groupby.transform with a custom function:

Setup

For testing, set the age-3 value of the first group to NaN:

import numpy as np
df.loc[2, 'Value'] = np.nan

print(df)

   Gender Type  Age   Value
0     'f'    A    1     NaN
1     'f'    A    2     NaN
2     'f'    A    3     NaN
3     'f'    A    4  3338.0
4     'f'    A    5  3371.0
5     'f'    A    6  3376.0
6     'f'    A    7  3336.0
7     'f'    A    8  3339.0
8     'f'    A    9     NaN
9     'f'    A   10     NaN
10    'f'    B    1     NaN
...

Solution

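def custom_rolling_fillna(arr):
    # Work on a copy so the group's original data is not modified in place
    arr = arr.copy()
    # First two rows take the value at position 2 (age 3), NaN included
    arr.iloc[:2] = arr.iloc[2]
    # Last two rows take the value at the third-from-last position
    arr.iloc[-2:] = arr.iloc[-3]
    return arr

df['Value'] = df.groupby(['Gender', 'Type'])['Value'].transform(custom_rolling_fillna)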

print(df)

   Gender Type  Age   Value
0     'f'    A    1     NaN
1     'f'    A    2     NaN
2     'f'    A    3     NaN
3     'f'    A    4  3338.0
4     'f'    A    5  3371.0
5     'f'    A    6  3376.0
6     'f'    A    7  3336.0
7     'f'    A    8  3339.0
8     'f'    A    9  3339.0
9     'f'    A   10  3339.0
10    'f'    B    1  3172.0
...
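One caveat: indexing with `.iloc[2]` raises an `IndexError` when a group has fewer than three rows. If that can happen in your data, a guarded variant along these lines may help. This is only a sketch: `fill_edges`, its `edge` parameter, and the `demo` frame are hypothetical names, and treating too-short groups as all-NaN is an assumption about what you would want.

```python
import numpy as np
import pandas as pd

def fill_edges(values, edge=2):
    """Copy the nearest interior value onto the `edge` rows at each end."""
    # astype returns a new float Series, so the input is left untouched.
    out = values.astype(float)
    if len(out) <= 2 * edge:
        # A centered window of 2 * edge + 1 rows never fits: keep everything NaN.
        out.iloc[:] = np.nan
        return out
    out.iloc[:edge] = out.iloc[edge]
    out.iloc[-edge:] = out.iloc[-edge - 1]
    return out

# Hypothetical rolling sums: group A has five rows, group B only two.
demo = pd.DataFrame({
    "Gender": ["f"] * 7,
    "Type": ["A"] * 5 + ["B"] * 2,
    "Value": [np.nan, np.nan, 3342.0, np.nan, np.nan, np.nan, np.nan],
})
demo["Value"] = demo.groupby(["Gender", "Type"])["Value"].transform(fill_edges)
print(demo)
```

Group A is filled from its single interior value, while the two-row group B stays NaN instead of raising.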

Alternatively, you could do this in one step, starting from the original DataFrame rather than the already summed one:

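def custom_rolling_fillna(arr):
    # Centered 5-row rolling sum; the first and last two rows come out NaN
    rolling = arr.rolling(window=5, center=True).sum()
    # Fill the edges from the rolling sums themselves, not from the raw values
    rolling.iloc[:2] = rolling.iloc[2]
    rolling.iloc[-2:] = rolling.iloc[-3]
    return rolling


df['Value'] = df.groupby(['Gender', 'Type'])['Value'].transform(custom_rolling_fillna)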
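As a quick check, the one-step approach can be run end to end on the 'f' rows from the question. This is a self-contained sketch: the names `raw` and `rolling_sum_filled` are introduced here for illustration, and the edge rows are filled from the rolling sums so that the result matches the desired output in the question.

```python
import pandas as pd

# The 'f' / A and 'f' / B rows from the question's original DataFrame.
raw = pd.DataFrame({
    "Gender": ["f"] * 20,
    "Type": ["A"] * 10 + ["B"] * 10,
    "Age": list(range(1, 11)) * 2,
    "Value": [654, 665, 684, 688, 651, 650, 698, 689, 648, 654,
              623, 620, 623, 653, 653, 642, 632, 632, 644, 654],
})

def rolling_sum_filled(values):
    # Centered 5-row rolling sum within the group.
    sums = values.rolling(window=5, center=True).sum()
    # Copy the first and last computed sums onto the NaN edge rows.
    sums.iloc[:2] = sums.iloc[2]
    sums.iloc[-2:] = sums.iloc[-3]
    return sums

raw["Value"] = raw.groupby(["Gender", "Type"])["Value"].transform(rolling_sum_filled)
print(raw)
```

Ages 1 to 3 of group A all end up as 3342 and ages 8 to 10 as 3339, in line with the expected table in the question.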
