Imputation on the test set with fancyimpute

The Python package fancyimpute provides several methods for imputing missing values. The documentation gives examples such as:

# X is the complete data matrix
# X_incomplete has the same values as X except a subset have been replaced with NaN

# Model each feature with missing values as a function of other features, and
# use that estimate for imputation.
X_filled_ii = IterativeImputer().fit_transform(X_incomplete)

This works fine when the imputation method is applied to a single dataset X. But what if a training/test split is necessary? Once

X_train_filled = IterativeImputer().fit_transform(X_train_incomplete)

is called, how do I impute the test set and create X_test_filled? The test set needs to be imputed using only the information learned from the training set. I guess that IterativeImputer() should return an object that can then be applied to X_test_incomplete. Is that possible?

Please note that imputing the whole dataset first and then splitting it into training and test sets is not correct, because the imputed training values would then depend on the test data.

The package appears to mimic scikit-learn's API, and a look at the source code shows that the imputer does have a transform method.

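from fancyimpute import IterativeImputer

# Fit the imputer on the training set only
my_imputer = IterativeImputer()
X_train_filled = my_imputer.fit_transform(X_train_incomplete)

# Now impute the test set with the imputer fitted on the training data
X_test_filled = my_imputer.transform(X_test_incomplete)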

The imputer is not refitted by transform: it fills the missing test values using the per-feature models it learned from the training set.
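To make the fit-on-train, transform-on-test pattern concrete, here is a minimal self-contained sketch. It assumes scikit-learn 0.21 or later and uses scikit-learn's own IterativeImputer, which exposes the same fit_transform and transform methods. The synthetic data, the 10% missing rate, and the variable names are illustrative assumptions, not part of the original question.

```python
import numpy as np
# IterativeImputer is experimental in scikit-learn; this import enables it
# and must come before the import from sklearn.impute.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data (assumption): the third column is a noisy linear
# function of the first two, so it can be predicted from them.
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Replace roughly 10% of the entries with NaN.
X_incomplete = X.copy()
X_incomplete[rng.random(X.shape) < 0.1] = np.nan

# Split BEFORE imputing so the test rows never influence the fit.
X_train_incomplete, X_test_incomplete = train_test_split(
    X_incomplete, test_size=0.25, random_state=0
)

# Fit on the training set only, then reuse the fitted imputer on the test set.
imputer = IterativeImputer(random_state=0)
X_train_filled = imputer.fit_transform(X_train_incomplete)
X_test_filled = imputer.transform(X_test_incomplete)

print(np.isnan(X_test_filled).any())
```

Calling fit_transform a second time on the test set would refit the imputer on test data, which is exactly the leakage the question wants to avoid, so only transform is used there.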
