Replace all NaN values with value from other column

I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, 5, np.nan],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))

I want to forward-fill column B with df["B"].ffill(inplace=True), which results in the following df:

     A    B    C    D
0  NaN  2.0  NaN  0.0
1  3.0  4.0  NaN  1.0
2  NaN  4.0  5.0  NaN
3  NaN  3.0  NaN  4.0

Now I want to replace all remaining NaN values with the corresponding value from column B. The documentation states that you can pass fillna() a Series, so I tried df.fillna(df["B"], inplace=True). This results in exactly the same dataframe as above.

However, if I pass a simple value (e.g. df.fillna(0, inplace=True)), then it does work:

     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  4.0  5.0  0.0
3  0.0  3.0  0.0  4.0

The funny thing is that fillna() does work with a Series as the value parameter when it is called on another Series. For example, df["A"].fillna(df["B"], inplace=True) results in:

     A    B    C    D
0  2.0  2.0  NaN  0.0
1  3.0  4.0  NaN  1.0
2  4.0  4.0  5.0  NaN
3  3.0  3.0  NaN  4.0

My real dataframe has a lot of columns and I would hate to call fillna() on each of them manually. Am I overlooking something here? Have I perhaps misunderstood the docs?

EDIT: I have clarified my example to show that 'ffill' with axis=1 does not work for me. In reality, my dataframe has hundreds of columns, and I am looking for a way to avoid naming all of them explicitly.

Try changing the axis to 1 (columns):

df = df.ffill(axis=1).bfill(axis=1)

If you need to specify the columns, you can do something like this:

df[["B","C"]] = df[["B","C"]].ffill(axis=1)
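
To see how this row-wise fill differs from filling with column B, here is a minimal sketch on the question's sample frame (the name row_filled is only for illustration). Each NaN takes the nearest non-NaN value in its own row, which is not always the value in B:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, 5, np.nan],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))
df["B"] = df["B"].ffill()  # forward-fill column B down the rows first

# Fill along each row: take the value to the left, then the value to the right
row_filled = df.ffill(axis=1).bfill(axis=1)
print(row_filled)

# Row 2, column D is filled from C (5.0), whereas column B holds 4.0 there
print(row_filled.loc[2, "D"], df.loc[2, "B"])
```

So this only matches "fill from B" when B happens to be the nearest non-NaN neighbour in the row.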

EDIT: Since you need something more general and df.fillna(df.B, axis = 1) is not implemented yet, you can try with:

df = df.T.fillna(df.B).T

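Or, in place (this relies on df.T being a view of df, which pandas does not guarantee, so the assignment above is the safer choice):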

df.T.fillna(df.B, inplace=True)

This works because the index of df.B coincides with the columns of df.T, so pandas knows which value to use for each column. From the docs:

value: scalar, dict, Series, or DataFrame. Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.

So, for example, the NaN in column 0 at row A (in df.T) will be replaced by the value with index 0 in df.B.
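
As a rough end-to-end check, the sketch below runs the transpose approach on the question's sample frame. The per-column variant using apply() is an alternative added here for comparison, not part of the answer above, and the names filled and filled_alt are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, 5, np.nan],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))
df["B"] = df["B"].ffill()  # forward-fill column B down the rows first

# The columns of df.T are the row labels 0..3, which match df["B"].index,
# so each transposed column is filled with the matching value from B
filled = df.T.fillna(df["B"]).T

# Alternative (an assumption, not from the answer): fill every column from B;
# Series.fillna aligns on the row index, so no transpose is needed
filled_alt = df.apply(lambda col: col.fillna(df["B"]))

print(filled)
#      A    B    C    D
# 0  2.0  2.0  2.0  0.0
# 1  3.0  4.0  4.0  1.0
# 2  4.0  4.0  5.0  4.0
# 3  3.0  3.0  3.0  4.0
```

Both forms give the same result on this all-numeric frame. Transposing a frame with mixed dtypes produces object columns, so the per-column form may be the better fit in that case.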
