
How to delete the leading and trailing rows that contain NaN from a dataframe and replace the remaining NaN with the average of the values above and below?

Let's take this dataframe as a simple example:

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(Col1=[np.nan, 1, 1, 2, 3, 8, 7], Col2=[1, 1, np.nan, np.nan, 3, np.nan, 4], Col3=[1, 1, np.nan, 5, 1, 1, np.nan]))

   Col1  Col2  Col3
0   NaN   1.0   1.0
1   1.0   1.0   1.0
2   1.0   NaN   NaN
3   2.0   NaN   5.0
4   3.0   3.0   1.0
5   8.0   NaN   1.0
6   7.0   4.0   NaN

First, I would like to drop rows from the top and from the bottom until neither the first nor the last row contains a NaN.

Intermediate expected output:

   Col1  Col2  Col3
1   1.0   1.0   1.0
2   1.0   NaN   NaN
3   2.0   NaN   5.0
4   3.0   3.0   1.0

Then, I would like to replace each remaining NaN with the mean of the nearest non-NaN value above it and the nearest non-NaN value below it in the same column.

Final expected output:

   Col1  Col2  Col3
0   1.0   1.0   1.0
1   1.0   2.0   3.0
2   2.0   2.0   5.0
3   3.0   3.0   1.0

I know I can locate the NaN values in my dataframe with

df.isna()

But I can't get from there to a solution. How could I do this?

My approach:

# True for rows that contain no NaN
s = df.notnull().all(axis=1)

# keep everything from the first complete row to the last complete row
new_df = df.loc[s.idxmax():s[::-1].idxmax()]

# average of the nearest valid value above (ffill) and below (bfill)
new_df = (new_df.ffill() + new_df.bfill()) / 2

Output:

   Col1  Col2  Col3
1   1.0   1.0   1.0
2   1.0   2.0   3.0
3   2.0   2.0   5.0
4   3.0   3.0   1.0
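
The steps above can be collected into one function. This is only a sketch: the name trim_and_fill is made up for illustration, it assumes a unique index such as the default RangeIndex, and it adds reset_index(drop=True) so that the row labels match the 0 to 3 index shown in the expected output of the question.

```python
import numpy as np
import pandas as pd


def trim_and_fill(df: pd.DataFrame) -> pd.DataFrame:
    """Drop leading/trailing rows containing NaN, then fill interior NaN
    with the mean of the nearest valid values above and below."""
    complete = df.notna().all(axis=1)      # True where a row has no NaN
    if not complete.any():
        return df.iloc[0:0].copy()         # no complete row: nothing to keep
    first = complete.idxmax()              # label of the first complete row
    last = complete[::-1].idxmax()         # label of the last complete row
    trimmed = df.loc[first:last]           # .loc slicing includes both ends
    filled = (trimmed.ffill() + trimmed.bfill()) / 2
    return filled.reset_index(drop=True)   # renumber rows from 0


df = pd.DataFrame(dict(
    Col1=[np.nan, 1, 1, 2, 3, 8, 7],
    Col2=[1, 1, np.nan, np.nan, 3, np.nan, 4],
    Col3=[1, 1, np.nan, 5, 1, 1, np.nan],
))
result = trim_and_fill(df)
print(result)
```

Where a value is already present, ffill and bfill both return it, so the average leaves it unchanged; only the NaN cells end up as the mean of their two nearest valid neighbours.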

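Another option would be to use DataFrame.interpolate with round. Note that linear interpolation is not the same as the mean of the two neighbours when a gap spans more than one row; the rounding is what makes the result match the expected output for this data: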

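# running count of complete rows; the first row at each count >= 1 is itself complete
nans = df.notna().all(axis=1).cumsum().drop_duplicates()
nans = nans[nans > 0]  # drop the entry for rows before the first complete row, if any
low, high = nans.idxmin(), nans.idxmax()  # first and last complete row

df.loc[low:high].interpolate().round()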

   Col1  Col2  Col3
1   1.0   1.0   1.0
2   1.0   2.0   3.0
3   2.0   2.0   5.0
4   3.0   3.0   1.0
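
To see how the two approaches differ before rounding, here is a small comparison on a single made-up column with a two-row gap (the Series s is an illustration, not part of the original data):

```python
import numpy as np
import pandas as pd

# A single column with a two-row gap between 1.0 and 3.0
s = pd.Series([1.0, np.nan, np.nan, 3.0])

neighbour_mean = (s.ffill() + s.bfill()) / 2   # every gap row gets (1 + 3) / 2
linear = s.interpolate()                       # gap rows are spaced evenly

print(neighbour_mean.tolist())   # [1.0, 2.0, 2.0, 3.0]
print(linear.tolist())           # approximately [1.0, 1.67, 2.33, 3.0]
print(linear.round().tolist())   # [1.0, 2.0, 2.0, 3.0]
```

The rounded interpolation agrees with the neighbour mean here only because the values are integers spaced this way; with non-integer data, or when the mean itself is not a whole number, round() would change the result.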
