Individually replace NaN in pandas.dataframe

I have a 900 x 7 dataframe in which 3 fields contain some NaN values.

Instead of simply replacing these values with a feature average, I have written a function that uses an algorithm to estimate the likely value of each NaN from the other values in that row.

How can I iterate over each NaN and change its value using my custom function?

My function takes the row ID, the other feature names, and the feature containing the NaN as arguments.

For example:

custom_fillnan(id=0, ins=["val0", "val1", "val2"], out="valn")

Example dataframe:

ID    val0    val1    val2    ...    valn
0      1        2       3     ...    NaN
1      1      NaN       3     ...     4
2      0        0     NaN     ...     1
...

If I understand correctly, you could use apply with axis=1 together with fillna and your custom function:

In [80]: df
Out[80]: 
   ID  val0  val1  val2  valn
0   0     1     2     3   NaN
1   1     1   NaN     3     4
2   2     0     0   NaN     1


In [83]: df.apply(lambda x: x.fillna(x.iloc[1:].mean()), axis=1)
Out[83]: 
   ID  val0      val1      val2  valn
0   0     1  2.000000  3.000000     2
1   1     1  2.666667  3.000000     4
2   2     0  0.000000  0.333333     1

Instead of the mean you could use your own function. x.iloc[1:] is used because, as I understand it, you want your function to see only the val columns and not ID.
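As a rough sketch of swapping in your own estimator, the snippet below restricts the row-wise apply to the value columns so that ID is never touched. The function estimate_from_row is a hypothetical stand-in (it just takes the median of the non-missing values in the row), and the column list value_cols is an assumption based on the example dataframe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2],
    "val0": [1, 1, 0],
    "val1": [2, np.nan, 0],
    "val2": [3, 3, np.nan],
    "valn": [np.nan, 4, 1],
})

# Assumed list of feature columns; ID is left out on purpose.
value_cols = ["val0", "val1", "val2", "valn"]

def estimate_from_row(row):
    # Hypothetical placeholder for the real algorithm:
    # median of the values that are present in this row.
    return row.dropna().median()

filled = df.copy()
# Each row is passed as a Series; its NaNs are filled with the row's estimate.
filled[value_cols] = df[value_cols].apply(
    lambda row: row.fillna(estimate_from_row(row)), axis=1
)
```

Note that this fills every NaN in a row with the same estimate, so it only fits the case where the estimate does not depend on which column is missing.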

EDIT

If you need the column names of the missing values, you could apply the following function, or use it as a starting point for your own processing:

def func(x):
    # Work on an object-dtype copy so strings can replace the NaNs.
    x = x.astype(object)
    mask = x.isnull()
    x.loc[mask] = x.index[mask]
    return x

In [209]: df.apply(func, axis=1)
Out[209]: 
   ID  val0  val1  val2  valn
0   0     1     2     3  valn
1   1     1  val1     3     4
2   2     0     0  val2     1
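Building on the idea of locating each missing cell by row and column name, here is a minimal sketch that visits every NaN individually and calls a function with the signature from the question. The body of custom_fillnan is a hypothetical placeholder (mean of the other features in the row), and the sketch assumes the index labels match the ID values, as they do in the example dataframe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2],
    "val0": [1, 1, 0],
    "val1": [2, np.nan, 0],
    "val2": [3, 3, np.nan],
    "valn": [np.nan, 4, 1],
})

def custom_fillnan(id, ins, out):
    # Hypothetical placeholder for the real estimator: mean of the
    # available input features in row `id` (`out` is unused here).
    return df.loc[id, ins].mean()

value_cols = [c for c in df.columns if c != "ID"]
mask = df[value_cols].isna()

filled = df.copy()
# np.where returns the positional (row, column) coordinates of each NaN.
rows, cols = np.where(mask.to_numpy())
for r, c in zip(rows, cols):
    row_label = mask.index[r]
    out_col = mask.columns[c]
    in_cols = [name for name in value_cols if name != out_col]
    filled.loc[row_label, out_col] = custom_fillnan(
        id=row_label, ins=in_cols, out=out_col
    )
```

Estimates are read from the original df and written to a copy, so a value filled earlier in the loop never feeds into a later estimate.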
