
Python: How to update only NaN values in pandas.DataFrame?

I have two DataFrames.

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [400, np.nan, 600]})
>>> print(df)
   A      B
0  1  400.0
1  2    NaN
2  3  600.0

and

>>> new_df = pd.DataFrame({'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> print(new_df)
   B  C
0  4  7
1  5  8
2  6  9

How can I update df with the values from new_df so that only the NaN values are filled? I would like to get the following:

>>> print(df)
   A      B
0  1  400.0
1  2    5.0
2  3  600.0

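I think you are looking for fillna. It returns a new DataFrame rather than changing df, so assign the result back:

df = df.fillna(new_df)

Another option is np.where on column B. It compares the two columns by position, so both frames need the same row order:

import numpy as np
df['B'] = np.where(df['B'].isnull(), new_df['B'], df['B'])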
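
A minimal self-contained sketch of the two suggestions above, using the frames from the question. The names filled and where_b are introduced here for illustration only. It relies on fillna aligning the other frame on index and column labels, which is why column C of new_df does not show up in the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [400, np.nan, 600]})
new_df = pd.DataFrame({'B': [4, 5, 6], 'C': [7, 8, 9]})

# fillna returns a new frame; df itself is left unchanged.
filled = df.fillna(new_df)

# np.where takes new_df['B'] where df['B'] is NaN and keeps df['B'] otherwise.
# It works by position and returns a NumPy array, not a Series.
where_b = np.where(df['B'].isnull(), new_df['B'], df['B'])

print(filled)
print(where_b)
```

Because filled is a separate object, df still contains the NaN until the result is assigned back.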

One way of doing this is to use .update with overwrite=False, which modifies df in place and only fills values that are NaN:

df.update(new_df, overwrite = False)
df.head()
#output:
    A   B
0   1   400.0
1   2   5.0
2   3   600.0
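
A small sketch of the in-place behaviour of .update, again with the frames from the question. The name result is hypothetical and only there to show that .update returns None, so its return value should not be assigned to df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [400, np.nan, 600]})
new_df = pd.DataFrame({'B': [4, 5, 6], 'C': [7, 8, 9]})

# update changes df in place and returns None.
# overwrite=False keeps existing non-NaN values and fills only the NaN ones.
result = df.update(new_df, overwrite=False)

print(result)
print(df)
```

Column C is ignored because .update only touches columns that already exist in df.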

Runtime (note that each cell also times the construction of both DataFrames)

%%timeit 
df = pd.DataFrame({'A': [1, 2, 3] * 1000, 'B': [400, np.nan, 600] * 1000})
new_df = pd.DataFrame({'B': [4, 5, 6] * 1000, 'C': [7, 8, 9] * 1000})
df.update(new_df, overwrite = False)

4.24 ms ± 48.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit 
df = pd.DataFrame({'A': [1, 2, 3] * 1000, 'B': [400, np.nan, 600] * 1000})
new_df = pd.DataFrame({'B': [4, 5, 6] * 1000, 'C': [7, 8, 9] * 1000})
df.fillna(new_df)

6.78 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit 
df = pd.DataFrame({'A': [1, 2, 3] * 1000, 'B': [400, np.nan, 600] * 1000})
new_df = pd.DataFrame({'B': [4, 5, 6] * 1000, 'C': [7, 8, 9] * 1000})
df['B']  = np.where(df['B'].isnull(), new_df['B'], df['B'])

3.91 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
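
The cells above include the cost of building both frames in every measurement. A possible way to time only the fill step is sketched below with the standard-library timeit module. The function names with_update, with_fillna and with_where are hypothetical, each one works on a copy so that the in-place .update does not affect later runs, and no timings are claimed here because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Build the frames once, outside the timed code.
df = pd.DataFrame({'A': [1, 2, 3] * 1000, 'B': [400, np.nan, 600] * 1000})
new_df = pd.DataFrame({'B': [4, 5, 6] * 1000, 'C': [7, 8, 9] * 1000})


def with_update():
    out = df.copy()  # update mutates its target, so work on a copy
    out.update(new_df, overwrite=False)
    return out


def with_fillna():
    out = df.copy()  # copied only to keep the comparison even
    return out.fillna(new_df)


def with_where():
    out = df.copy()
    out['B'] = np.where(out['B'].isnull(), new_df['B'], out['B'])
    return out


for fn in (with_update, with_fillna, with_where):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e3:.3f} ms per call")
```

The copy adds the same overhead to all three functions, so the relative order is what matters rather than the absolute numbers.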
