import pandas as pd
df = pd.DataFrame([['NewJersy',0,'2020-08-29'],
['NewJersy',12,'2020-08-30'],
['NewJersy',12,'2020-08-31'],
['NewJersy',None,'2020-09-01'],
['NewJersy',None,'2020-09-02'],
['NewJersy',None,'2020-09-03'],
['NewJersy',5,'2020-09-04'],
['NewJersy',5,'2020-09-05'],
['NewJersy',None,'2020-09-06'],
['NewYork',None,'2020-08-29'],
['NewYork',None,'2020-08-30'],
['NewYork',8,'2020-08-31'],
['NewYork',7,'2020-09-01'],
['NewYork',None,'2020-09-02'],
['NewYork',None,'2020-09-03']],
columns=['FName', 'FVal', 'GDate'])
print(df)
I want to fill NULL values with the value from the previous record. For example, column FVal is NULL for 2020-09-01 to 2020-09-03. Those NULLs should be replaced with 12, the previous valid value (from 2020-08-31).
Also, if the value for 2020-08-29 is NULL, it should be replaced with zero, since it is the first date and has no previous record.
I tried the code below, but it is not working:
df['F'] = df['F'].fillna(method='ffill')
Check for Expected Values here: Fill Null Values image
Thanks
You should first make sure your DataFrame is sorted by date, in case it is not already:
df = df.sort_values('GDate').reset_index(drop=True)
Then fill the first value with 0 if it is missing:
if pd.isnull(df.loc[0, 'FVal']):
    df.loc[0, 'FVal'] = 0
And then forward fill, as you attempted (recent pandas versions deprecate fillna(method='ffill') in favor of ffill()):
df['FVal'] = df['FVal'].ffill()
Note that the column name is FVal, not F.
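The steps above fill across the whole column. If the fill should instead restart for each FName (an assumption, since the expected-values image is not available here), a per-group forward fill is one possible sketch. The shortened sample data below is made up for illustration:

```python
import pandas as pd

# Shortened, illustrative sample in the same shape as the question's data.
df = pd.DataFrame(
    [["NewJersy", 0, "2020-08-29"],
     ["NewJersy", 12, "2020-08-30"],
     ["NewJersy", None, "2020-08-31"],
     ["NewYork", None, "2020-08-29"],
     ["NewYork", 8, "2020-08-30"],
     ["NewYork", None, "2020-08-31"]],
    columns=["FName", "FVal", "GDate"],
)

# Sort within each name by date so "previous record" is well defined.
df = df.sort_values(["FName", "GDate"]).reset_index(drop=True)

# Forward fill inside each FName group; leading gaps have no previous
# record, so they stay NaN after ffill and are then set to 0.
df["FVal"] = df.groupby("FName")["FVal"].ffill().fillna(0)
print(df)
```

With this approach the first NewYork row becomes 0 rather than inheriting the last NewJersy value.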
Not sure if this is what you want, but this is what I would do:
>>> import math
>>> for s in df.iterrows():
...     if math.isnan(s[1].iloc[1]):  # s is (index, row); position 1 is FVal
...         df.iloc[s[0], 1] = df.iloc[s[0] - 1, 1]
...
>>> df
FName FVal GDate
0 NewJersy 0.0 2020-08-29
1 NewJersy 12.0 2020-08-30
2 NewJersy 12.0 2020-08-31
3 NewJersy 12.0 2020-09-01
4 NewJersy 12.0 2020-09-02
5 NewJersy 12.0 2020-09-03
6 NewJersy 5.0 2020-09-04
7 NewJersy 5.0 2020-09-05
8 NewJersy 5.0 2020-09-06
9 NewYork 5.0 2020-08-29
10 NewYork 5.0 2020-08-30
11 NewYork 8.0 2020-08-31
12 NewYork 7.0 2020-09-01
13 NewYork 7.0 2020-09-02
14 NewYork 7.0 2020-09-03
>>>
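One edge case in the loop above: if the very first row were NaN, `s[0] - 1` would be -1 and `iloc` would read the last row of the frame. A minimal guarded sketch, using a made-up single-column frame, that writes 0 for a missing first row as the question asks:

```python
import math
import pandas as pd

# Illustrative data: the first value is missing on purpose.
df = pd.DataFrame({"FVal": [None, 3.0, None, None, 9.0]})

col = df.columns.get_loc("FVal")  # positional index of the column
for i in range(len(df)):
    if math.isnan(df.iloc[i, col]):
        # Row 0 has no previous record, so use 0; otherwise copy the
        # (already filled) value from the row above.
        df.iloc[i, col] = 0 if i == 0 else df.iloc[i - 1, col]

print(df)
```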
You can try this:
import numpy as np
df.GDate = pd.to_datetime(df.GDate)
for i in range(len(df)):
    if np.isnan(df.loc[i, 'FVal']) and i > 0:
        # carry the previous value only if the previous row is exactly one day earlier
        if (df.GDate.loc[i] - df.GDate.loc[i-1]).days == 1:
            df.loc[i, 'FVal'] = df.loc[i-1, 'FVal']
        else:
            df.loc[i, 'FVal'] = 0
Output
FName FVal GDate
0 NewJersy 0.0 2020-08-29
1 NewJersy 12.0 2020-08-30
2 NewJersy 12.0 2020-08-31
3 NewJersy 12.0 2020-09-01
4 NewJersy 12.0 2020-09-02
5 NewJersy 12.0 2020-09-03
6 NewJersy 5.0 2020-09-04
7 NewJersy 5.0 2020-09-05
8 NewJersy 5.0 2020-09-06
9 NewYork 0.0 2020-08-29
10 NewYork 0.0 2020-08-30
11 NewYork 8.0 2020-08-31
12 NewYork 7.0 2020-09-01
13 NewYork 7.0 2020-09-02
14 NewYork 7.0 2020-09-03
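The same rule (carry the previous value only across consecutive days, otherwise use 0) can also be sketched without an explicit loop. This is an illustrative alternative, not the answerer's code; the names `new_run` and `run_id` and the shortened data are assumptions. Unlike the loop, it also sets a missing first row to 0:

```python
import pandas as pd

# Shortened, illustrative sample; the date sequence restarts at row 3.
df = pd.DataFrame(
    [["NewJersy", 0, "2020-08-29"],
     ["NewJersy", 12, "2020-08-30"],
     ["NewJersy", None, "2020-08-31"],
     ["NewYork", None, "2020-08-29"],
     ["NewYork", None, "2020-08-30"],
     ["NewYork", 8, "2020-08-31"]],
    columns=["FName", "FVal", "GDate"],
)
df["GDate"] = pd.to_datetime(df["GDate"])

# A new run starts wherever the gap to the previous row is not exactly
# one day (the first row has no previous row, so it also starts a run).
new_run = df["GDate"].diff().dt.days.ne(1)
run_id = new_run.cumsum()

# Forward fill only inside each run of consecutive days, then use 0 for
# gaps at the start of a run, which have nothing to carry forward.
df["FVal"] = df.groupby(run_id)["FVal"].ffill().fillna(0)
print(df)
```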