I have a 900 x 7 dataframe in which 3 fields contain some NaN values.
Instead of simply replacing these values with the feature average, I have written a function that uses an algorithm to estimate the likely value of each NaN from the other values in that row.
How can I iterate over each NaN and change its value using my custom function?
My function takes the row ID, the other feature names, and the feature containing the NaN as arguments.
E.g.:
custom_fillnan(id=0, ins=["val0", "val1", "val2"], out="valn")
Example dataframe:
ID  val0  val1  val2  ...  valn
0   1     2     3     ...  NaN
1   1     NaN   3     ...  4
2   0     0     NaN   ...  1
...
IIUC you could use apply with axis=1 and fillna with your custom function:
In [80]: df
Out[80]:
   ID  val0  val1  val2  valn
0   0     1     2     3   NaN
1   1     1   NaN     3     4
2   2     0     0   NaN     1
In [83]: df.apply(lambda x: x.fillna(x.iloc[1:].mean()), axis=1)
Out[83]:
   ID  val0      val1      val2  valn
0   0     1  2.000000  3.000000     2
1   1     1  2.666667  3.000000     4
2   2     0  0.000000  0.333333     1
Instead of the row mean you could use your own function. x.iloc[1:] is used because, as I understand it, you want your function to use only the val columns.
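To make the "use your function instead" step concrete, here is a minimal sketch. The names estimate and fill_row are hypothetical, and the median is only a stand-in for whatever algorithm your real function implements; the point is just the shape of the row-wise call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2],
    "val0": [1, 1, 0],
    "val1": [2, np.nan, 0],
    "val2": [3, 3, np.nan],
    "valn": [np.nan, 4, 1],
})

def estimate(known):
    # Hypothetical placeholder for the real algorithm:
    # here simply the median of the values that are present.
    return known.median()

def fill_row(row):
    vals = row.iloc[1:]                      # skip the ID column
    return row.fillna(estimate(vals.dropna()))

# axis=1 passes each row to fill_row as a Series
filled = df.apply(fill_row, axis=1)
print(filled)
```

Note that fillna with a scalar fills every NaN in the row with the same estimate, so this form only fits if one estimate per row is what you want.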
EDIT
If you want the column names of the missing values, you could apply a function like the one below and use its result for further processing:
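def func(x):
    x = x.astype(object)            # let the row hold strings next to numbers
    mask = x.isnull()
    x.loc[mask] = x.index[mask]     # put the column name where the value is missing
    return x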
In [209]: df.apply(func, axis=1)
Out[209]:
   ID  val0  val1  val2  valn
0   0     1     2     3  valn
1   1     1  val1     3     4
2   2     0     0  val2     1
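If your function really needs the (id, ins, out) arguments from the question, you could also loop over the NaN positions directly. This is only a sketch: the body of custom_fillnan below is a hypothetical placeholder (mean of the input features), and it assumes the row id is the DataFrame index label and that the function reads from a df in the enclosing scope.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2],
    "val0": [1, 1, 0],
    "val1": [2, np.nan, 0],
    "val2": [3, 3, np.nan],
    "valn": [np.nan, 4, 1],
})
value_cols = ["val0", "val1", "val2", "valn"]

def custom_fillnan(id, ins, out):
    # Hypothetical placeholder body; the signature follows the question.
    # Estimates the missing value as the mean of the other features in the row.
    return df.loc[id, ins].mean()

filled = df.copy()
# Positions (row, column) of every NaN among the value columns
rows, cols = np.where(df[value_cols].isna())
for r, c in zip(rows, cols):
    out = value_cols[c]
    ins = [col for col in value_cols if col != out]
    row_label = df.index[r]
    filled.loc[row_label, out] = custom_fillnan(id=row_label, ins=ins, out=out)

print(filled)
```

Writing into a copy keeps the estimates from feeding into each other; fill df in place instead if later estimates should see earlier ones.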