Python - Fill NULL with previous record value in a column

import pandas as pd

df = pd.DataFrame([['NewJersy',0,'2020-08-29'],
                   ['NewJersy',12,'2020-08-30'],
                   ['NewJersy',12,'2020-08-31'],
                   ['NewJersy',None,'2020-09-01'],
                   ['NewJersy',None,'2020-09-02'],
                   ['NewJersy',None,'2020-09-03'],
                   ['NewJersy',5,'2020-09-04'],
                   ['NewJersy',5,'2020-09-05'],
                   ['NewJersy',None,'2020-09-06'],
                   ['NewYork',None,'2020-08-29'],
                   ['NewYork',None,'2020-08-30'],
                   ['NewYork',8,'2020-08-31'],
                   ['NewYork',7,'2020-09-01'],
                   ['NewYork',None,'2020-09-02'],
                   ['NewYork',None,'2020-09-03']],
                   columns=['FName', 'FVal', 'GDate'])

print(df)

I want to fill NULL values with the value from the previous record. For example, column FVal is NULL for 2020-09-01 to 2020-09-03. Those NULL values should be replaced with 12, the previous valid value (from 2020-08-31).

Also, if the value for 2020-08-29 is NULL, it should be replaced with zero, because that is the first date and it has no previous record.

I tried the code below, but it does not work:

df['F'] = df['F'].fillna(method='ffill')

The expected values are shown in the linked image (Fill Null Values).

Thanks

You should first make sure your DataFrame is sorted by date, just in case:

df = df.sort_values('GDate').reset_index(drop=True)

Then fill the first value with 0 if it is missing:

if pd.isnull(df.loc[0, 'FVal']):
    # the first row has no previous record, so fall back to 0
    df.loc[0, 'FVal'] = 0

And then forward fill, as you did:

# ffill() does the same as fillna(method='ffill'), which is deprecated in recent pandas versions
df['FVal'] = df['FVal'].ffill()

Note that the column name is FVal, not F.
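
The steps above treat the whole column as one series. If "previous record" is meant per FName (an assumption; the question does not say so explicitly), a minimal sketch could sort by name and date and forward fill inside each group, falling back to 0 where a group has no earlier value. The shortened sample data below is made up for illustration:

```python
import pandas as pd

# Small sample in the same shape as the question's data (values are illustrative).
df = pd.DataFrame(
    [
        ["NewJersy", 0, "2020-08-29"],
        ["NewJersy", 12, "2020-08-30"],
        ["NewJersy", None, "2020-08-31"],
        ["NewYork", None, "2020-08-29"],
        ["NewYork", 8, "2020-08-30"],
        ["NewYork", None, "2020-08-31"],
    ],
    columns=["FName", "FVal", "GDate"],
)

# Sort by name, then date, so "previous record" is well defined within each name.
df = df.sort_values(["FName", "GDate"]).reset_index(drop=True)

# Forward fill inside each FName group, then use 0 where no earlier value exists.
df["FVal"] = df.groupby("FName")["FVal"].ffill().fillna(0)

print(df)
```

With this variant the leading NULL for NewYork becomes 0 instead of inheriting the last NewJersy value.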

Not sure if this is what you want, but this is what I would do:

>>> import math
>>> for idx, row in df.iterrows():
...     # when FVal is missing, copy the value from the row above (skip the first row)
...     if idx > 0 and math.isnan(row['FVal']):
...         df.iloc[idx, 1] = df.iloc[idx - 1, 1]
...
>>> df
       FName  FVal       GDate
0   NewJersy   0.0  2020-08-29
1   NewJersy  12.0  2020-08-30
2   NewJersy  12.0  2020-08-31
3   NewJersy  12.0  2020-09-01
4   NewJersy  12.0  2020-09-02
5   NewJersy  12.0  2020-09-03
6   NewJersy   5.0  2020-09-04
7   NewJersy   5.0  2020-09-05
8   NewJersy   5.0  2020-09-06
9    NewYork   5.0  2020-08-29
10   NewYork   5.0  2020-08-30
11   NewYork   8.0  2020-08-31
12   NewYork   7.0  2020-09-01
13   NewYork   7.0  2020-09-02
14   NewYork   7.0  2020-09-03
>>>
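
As a side note, the row-by-row loop above appears to be equivalent to a plain forward fill, which is why the NewYork rows for 2020-08-29 and 2020-08-30 pick up 5.0 from the last NewJersy row. A small sketch on a made-up Series (the values are illustrative, not taken from the question) compares the two:

```python
import pandas as pd

vals = pd.Series([0, 12, None, None, 5, None, None, 8], dtype="float64")

# Row-by-row copy from the previous position, mirroring the loop above.
looped = vals.copy()
for i in range(1, len(looped)):
    if pd.isna(looped.iloc[i]):
        looped.iloc[i] = looped.iloc[i - 1]

# The same result from the built-in forward fill.
filled = vals.ffill()

print(looped.equals(filled))
```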

You can try this:

import numpy as np

df['GDate'] = pd.to_datetime(df['GDate'])
for i in range(1, len(df)):
    if np.isnan(df.loc[i, 'FVal']):
        # carry the previous value forward only if the previous row is exactly one day earlier
        if (df.loc[i, 'GDate'] - df.loc[i - 1, 'GDate']).days == 1:
            df.loc[i, 'FVal'] = df.loc[i - 1, 'FVal']
        else:
            # otherwise this row starts a new run of dates, so use 0
            df.loc[i, 'FVal'] = 0


Output

    FName       FVal    GDate
0   NewJersy    0.0     2020-08-29
1   NewJersy    12.0    2020-08-30
2   NewJersy    12.0    2020-08-31
3   NewJersy    12.0    2020-09-01
4   NewJersy    12.0    2020-09-02
5   NewJersy    12.0    2020-09-03
6   NewJersy    5.0     2020-09-04
7   NewJersy    5.0     2020-09-05
8   NewJersy    5.0     2020-09-06
9   NewYork     0.0     2020-08-29
10  NewYork     0.0     2020-08-30
11  NewYork     8.0     2020-08-31
12  NewYork     7.0     2020-09-01
13  NewYork     7.0     2020-09-02
14  NewYork     7.0     2020-09-03
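
The date-gap rule used in this answer can also be sketched without an explicit loop. The idea, offered here as one possible approach rather than the answer's own method, is to start a new run wherever a date is not exactly one day after the previous row, then forward fill inside each run and use 0 where a run starts with a missing value. The sample data and the names new_run and run_id are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "FName": ["NewJersy", "NewJersy", "NewJersy", "NewYork", "NewYork", "NewYork"],
        "FVal": [5, None, None, None, 8, None],
        "GDate": ["2020-09-04", "2020-09-05", "2020-09-06",
                  "2020-08-29", "2020-08-30", "2020-08-31"],
    }
)
df["GDate"] = pd.to_datetime(df["GDate"])

# A new run starts wherever the date is not exactly one day after the previous row
# (the first row always starts a run, because its diff is NaT).
new_run = df["GDate"].diff() != pd.Timedelta(days=1)
run_id = new_run.cumsum().rename("run_id")

# Forward fill inside each run; a run that begins with a missing value gets 0.
df["FVal"] = df.groupby(run_id)["FVal"].ffill().fillna(0)

print(df)
```

Unlike the loop, this sketch also sets a missing value in the very first row to 0, which matches what the question asks for.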
