If I have some missing values and would like to replace every NaN with the average of the preceding and succeeding values, how can I do that?
I know I can use pandas.DataFrame.fillna with the method='ffill' or method='bfill' options to replace NaN values with the preceding or succeeding value. However, I would like to apply the average of those two values across the DataFrame instead of iterating over rows and columns.
Try DataFrame.interpolate(). Example from the pandas docs:
In [65]: df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
....: 'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
....:
In [66]: df
Out[66]:
     A      B
0  1.0   0.25
1  2.1    NaN
2  NaN    NaN
3  4.7   4.00
4  5.6  12.20
5  6.8  14.40
In [67]: df.interpolate()
Out[67]:
     A      B
0  1.0   0.25
1  2.1   1.50
2  3.4   2.75
3  4.7   4.00
4  5.6  12.20
5  6.8  14.40
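For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same example. It assumes the default method='linear', which treats the rows as equally spaced; the variable name `filled` is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})

# Default method='linear': each run of NaN is filled along a straight
# line drawn between the nearest valid values above and below it.
filled = df.interpolate()
print(filled)
```

Note that with two consecutive NaN values (column B) the filled values step evenly from 0.25 to 4, so they are not equal to each other.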
Maybe late, but I just had the same question and the only answer on this page did not do what I expected, so I am answering now. Your post says you want to replace the NaNs with averages. Interpolation is not the right answer for that, because it fills the empty cells along a straight line between the surrounding valid values, so consecutive NaNs get different values. If you want to fill each NaN with the average of the preceding and succeeding valid values, this code helped me:
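# fillna(method=...) is deprecated in recent pandas versions; bfill()/ffill() do the same thing
dfb = df.bfill()  # each NaN takes the next valid value below it
dff = df.ffill()  # each NaN takes the last valid value above it
dfmeans = (dfb + dff) / 2
dfmeans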
For the DataFrame in the example above, the result looks like this:
     A       B
0  1.0   0.250
1  2.1   2.125
2  3.4   2.125
3  4.7   4.000
4  5.6  12.200
5  6.8  14.400
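As you can see, at index 2 of column A both approaches produce 3.4, because with a single missing value the linear interpolation is exactly (2.1 + 4.7)/2. In column B, where two consecutive values are missing, the results differ: interpolation gives 1.50 and 2.75, while the average gives 2.125 for both rows.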
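To see the two approaches side by side, here is a minimal sketch that wraps the averaging step in a helper. The function name `fill_with_neighbor_mean` is made up for this example and is not part of pandas.

```python
import numpy as np
import pandas as pd

def fill_with_neighbor_mean(frame):
    """Fill each NaN with the mean of the nearest valid values above and below it.

    Hypothetical helper for illustration. Cells that already hold a value
    are unchanged, because ffill and bfill both return that same value there.
    """
    return (frame.ffill() + frame.bfill()) / 2

df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})

by_mean = fill_with_neighbor_mean(df)  # average of the two neighbors
by_line = df.interpolate()             # straight line between the neighbors

print(by_mean)
print(by_line)
```

One thing to keep in mind: a NaN in the first or last row has a valid neighbor on only one side, so either ffill or bfill leaves it as NaN and the average stays NaN as well. If that matters for your data, you would have to decide separately how to handle the edges.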
For a one-line version and its application to time series, see this post: Average between values with unevenly distributed time in Pandas DataFrame