
Replace NaN in a pandas DataFrame

Given the DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[np.nan,1],
                  [np.nan,np.nan],
                  [1,2],
                  [2,3],
                  [np.nan,np.nan],
                  [np.nan,np.nan],
                  [3,4],
                  [4,5],
                  [np.nan,np.nan],
                  [np.nan,np.nan]],columns=['A','B'])


df
Out[16]: 
     A    B
0  NaN  1.0
1  NaN  NaN
2  1.0  2.0
3  2.0  3.0
4  NaN  NaN
5  NaN  NaN
6  3.0  4.0
7  4.0  5.0
8  NaN  NaN
9  NaN  NaN

I need to replace the NaN values according to the following rules:

1) If the NaN values are at the beginning, replace them with the first value that follows them.

2) If the NaN values are in the middle, replace them with the average of the values immediately before and after the gap.

3) If the NaN values are at the end, replace them with the last value before them.

The expected result is:

df
Out[16]: 
     A    B
0  1.0  1.0
1  1.0  1.5
2  1.0  2.0
3  2.0  3.0
4  2.5  3.5
5  2.5  3.5
6  3.0  4.0
7  4.0  5.0
8  4.0  5.0
9  4.0  5.0

Add the backward-filled and forward-filled values, divide by 2, and then fill the NaNs that remain at the start and end with a final ffill and bfill:

df = df.bfill().add(df.ffill()).div(2).ffill().bfill()
print(df)
     A    B
0  1.0  1.0
1  1.0  1.5
2  1.0  2.0
3  2.0  3.0
4  2.5  3.5
5  2.5  3.5
6  3.0  4.0
7  4.0  5.0
8  4.0  5.0
9  4.0  5.0
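
For reuse, the one-liner above could be wrapped in a small function. The following is only a sketch: the function name `fill_nan_with_neighbor_mean` and the variable names `raw` and `filled` are made up for illustration and are not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd


def fill_nan_with_neighbor_mean(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill NaN column by column using the three rules."""
    # Interior gaps: mean of the next valid value (bfill) and the
    # previous valid value (ffill). Existing values stay unchanged,
    # because (x + x) / 2 == x.
    averaged = frame.bfill().add(frame.ffill()).div(2)
    # Leading and trailing gaps are still NaN at this point:
    # ffill covers the tail, bfill covers the head.
    return averaged.ffill().bfill()


# Same data as in the question, written column-wise.
raw = pd.DataFrame(
    {
        "A": [np.nan, np.nan, 1, 2, np.nan, np.nan, 3, 4, np.nan, np.nan],
        "B": [1, np.nan, 2, 3, np.nan, np.nan, 4, 5, np.nan, np.nan],
    }
)
filled = fill_nan_with_neighbor_mean(raw)
print(filled)
```

Because the function returns a new DataFrame, `raw` keeps its NaN values. A column that contains only NaN stays all NaN, since there is nothing to fill from.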

Details (run on the original df, before the reassignment above):

print(df.bfill().add(df.ffill()))

     A     B
0  NaN   2.0
1  NaN   3.0
2  2.0   4.0
3  4.0   6.0
4  5.0   7.0
5  5.0   7.0
6  6.0   8.0
7  8.0  10.0
8  NaN   NaN
9  NaN   NaN
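
The sum is NaN wherever one of the two fills has nothing to copy from: ffill has no earlier value at the start, and bfill has no later value at the end. A rough sketch of the intermediate steps for column A alone may make this easier to follow; the names `s` and `steps` are illustrative only.

```python
import numpy as np
import pandas as pd

# Column A from the question as a Series.
s = pd.Series([np.nan, np.nan, 1, 2, np.nan, np.nan, 3, 4, np.nan, np.nan], name="A")

steps = pd.DataFrame(
    {
        "bfill": s.bfill(),  # next valid value; NaN after the last value
        "ffill": s.ffill(),  # previous valid value; NaN before the first value
    }
)
# NaN + number is NaN, so only rows with a value on both sides get a mean.
steps["mean"] = (steps["bfill"] + steps["ffill"]) / 2
# The final ffill/bfill pair fills the trailing and leading rows.
steps["filled"] = steps["mean"].ffill().bfill()
print(steps)
```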

