I have a dataframe that looks like this:
   sepal length  sepal width  petal length  petal width  target
0           4.9          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          NaN  setosa
...
I've created a LinearRegression() model using petal width and petal length. Now I want to use the linear regression model I've trained to fill the NaN values.
Here is what I've tried. It works, but I am curious to know whether there is a more efficient way.
def fillna_linear_reg(length, width):
    if pd.isna(length):
        pred_length = lin_reg.predict([[width]])
        return pred_length[0][0]
    else:
        return length
iris_df["petal length (cm)"] = iris_df.apply(lambda x: fillna_linear_reg(x["petal length (cm)"], x["petal width (cm)"]), axis=1)
Thanks in advance!
Yes, there is a more efficient way: call predict once and assign all missing values at once. Avoid df.apply with axis=1
whenever possible. It calls a Python function once per row, which hurts performance, especially when the function it wraps,
such as the predict method of (I assume) sklearn models, is already vectorized.
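def fillna_linear_reg(lin_reg, length, width):
    # Work on a copy so the result does not depend on view/copy semantics
    length = length.copy()
    # Boolean mask of the rows whose length is missing
    nan_mask = length.isna()
    # Predict every missing length in a single call
    pred_length = lin_reg.predict(width.loc[nan_mask])
    length.loc[nan_mask] = pred_length
    return length

iris_df["petal length (cm)"] = fillna_linear_reg(
    lin_reg, iris_df["petal length (cm)"], iris_df["petal width (cm)"]
)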
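For a self-contained illustration of the same idea, here is a minimal sketch. It assumes scikit-learn's LinearRegression, and the frame toy_df with its values is made up for the example (it is not the asker's data). Selecting the feature with a list of column names keeps it 2D, so no reshaping is needed in this variant.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up toy data (not real iris values): known rows follow length = 2 * width + 1
toy_df = pd.DataFrame(
    {
        "petal width (cm)": [0.2, 0.4, 1.0, 1.5, 2.0, 2.5],
        "petal length (cm)": [1.4, 1.8, np.nan, 4.0, np.nan, 6.0],
    }
)

x_col, y_col = "petal width (cm)", "petal length (cm)"
nan_mask = toy_df[y_col].isna()

# Fit only on rows where the target is known; [x_col] keeps X two-dimensional
lin_reg = LinearRegression()
lin_reg.fit(toy_df.loc[~nan_mask, [x_col]], toy_df.loc[~nan_mask, y_col])

# Predict all missing targets in one call and write them back through the mask.
# The guard skips the call when nothing is missing, since an input with
# zero rows is not expected to be accepted by predict.
if nan_mask.any():
    toy_df.loc[nan_mask, y_col] = lin_reg.predict(toy_df.loc[nan_mask, [x_col]])

print(toy_df)
```

Assigning through toy_df.loc[nan_mask, y_col] writes directly into the frame, so there is no intermediate Series whose changes might fail to propagate.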
Depending on the machine learning library you used for training, you may need to pass the x-data to the predict
method as a 2D array with one row per sample and flatten the result back to a 1D array. If so, you can replace the line containing the prediction with:
pred_length = np.ravel(lin_reg.predict(width.loc[nan_mask].to_numpy().reshape(-1, 1)))
The explicit reshape(-1, 1) gives the (n_samples, 1) layout; np.atleast_2d would instead produce a single row of shape (1, n_samples).
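To check which shape a given call produces before handing it to predict, a small NumPy-only sketch can help. The array widths is made up and merely stands in for width.loc[nan_mask].

```python
import numpy as np

# Made-up widths standing in for width.loc[nan_mask]
widths = np.array([0.2, 1.0, 2.0])

as_row = np.atleast_2d(widths)     # a single row holding all values
as_column = widths.reshape(-1, 1)  # one row per value

print(as_row.shape)     # (1, 3)
print(as_column.shape)  # (3, 1)

# Flatten a column-shaped array, such as a 2D prediction, back to 1D
flat = np.ravel(as_column)
print(flat.shape)       # (3,)
```

An estimator that expects one row per sample would read the (1, 3) array as one sample with three features, so the column layout is the one to pass.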