
Using predict for linear model with NA values in R

I have a dataset of ~32,000 observations, to which I have fitted a linear model. ~12,000 observations were dropped from the fit because of missing values.

I am trying to use the predict function to backtest the expected value for each of my 32,000 data points, but, as expected, this gives the error 'replacement has 20000 rows, data has 32000'.

  1. Is there any way I can use the model fitted on the 20,000 rows to get predictions for all 32,000? I am happy to have 'zero' for observations that don't have values in every column used in the model.
  2. If not, how can I at least subset the 32,000-row dataset so that it only includes the 20,000 complete observations? If my model was lm(a ~ x+y+z, data=data), for example, how would I filter data to only include observations with full data in x, y and z?

The best thing to do is to use na.action=na.exclude when you fit the model in the first place. From ?na.exclude:

when 'na.exclude' is used the residuals and predictions are padded to the correct length by inserting 'NA's for cases omitted by 'na.exclude'.
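As a rough illustration of that padding, here is a minimal sketch on made-up data. The column names a, x, y, z follow the question; the toy data frame and the positions of the NAs are assumptions for the example only.

```r
# Toy data standing in for the real dataset (values are invented).
set.seed(1)
data <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
data$a <- 1 + 2 * data$x - data$y + 0.5 * data$z + rnorm(10, sd = 0.1)
data$x[c(2, 5)] <- NA   # introduce some missing predictors
data$z[8] <- NA

# na.exclude still drops incomplete rows for fitting,
# but the fit remembers which rows were dropped.
fit <- lm(a ~ x + y + z, data = data, na.action = na.exclude)

# predict() and residuals() are padded with NA back to nrow(data),
# so they can be assigned straight into the original data frame.
data$pred <- predict(fit)
data$res  <- residuals(fit)
```

If a placeholder such as 0 is really wanted for the rows that could not be predicted, it can be filled in afterwards with data$pred[is.na(data$pred)] <- 0, although keeping NA is usually less misleading.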

Using

data[complete.cases(data),]

gives you only the observations without NAs. Perhaps that's what you are looking for. Note that this checks every column of data, not just the ones used in the model.
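To keep only the rows that are complete in the model's own variables (which is what the question asks for), complete.cases can be applied to just those columns. This is a sketch on an invented data frame; the extra notes column is hypothetical and is there only to show the difference.

```r
# Invented example data; 'notes' is a column the model does not use.
data <- data.frame(
  a = c(1.2, 2.3, NA, 4.1, 5.0),
  x = c(1, NA, 3, 4, 5),
  y = c(2, 3, 4, 5, 6),
  z = c(0.5, 0.1, 0.9, 0.3, 0.7),
  notes = c(NA, "ok", "ok", "ok", NA)
)

# Variables that appear in the model formula: "a", "x", "y", "z".
vars <- all.vars(a ~ x + y + z)

# TRUE for rows with no NA in any of the model variables.
model_rows <- complete.cases(data[, vars])
model_data <- data[model_rows, ]

# For comparison: requiring completeness in all columns drops more rows.
all_rows <- complete.cases(data)
```

Here model_data keeps rows 1, 4 and 5, whereas data[complete.cases(data), ] would keep only row 4 because of the NAs in notes.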

Another way is

na.omit(data)

which additionally records the indices of the removed observations, in the "na.action" attribute of the result.
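A small sketch of how those indices can be read back, again on an invented data frame:

```r
# Invented example data with NAs in rows 2 and 3.
data <- data.frame(x = c(1, NA, 3, 4), y = c(2, 3, NA, 5))

# Drop every row that contains at least one NA.
cleaned <- na.omit(data)

# The removed row positions are stored in the "na.action" attribute.
dropped <- as.integer(attr(cleaned, "na.action"))
```

With this data, dropped contains 2 and 3, and cleaned holds the two remaining rows.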

The problem with using a 0 in place of a missing value is that the linear model will treat the value as an observed 0 rather than as missing. For instance, if your variable x had a range of 10-100, the model would interpret your imputed 0s as observations below the training data's range and give you artificially low predictions (assuming a positive coefficient on x). If you want to make a prediction for the rows with missing values, you're going to have to impute them (e.g. replace the NAs with the mean or the median, or use k-nearest neighbors).
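To make the difference concrete, here is a minimal sketch comparing zero-filling with mean imputation. The data are simulated, the single predictor x in the range 10-100 mirrors the example above, and mean imputation is only one of the options mentioned; it is not necessarily the best choice for real data.

```r
# Simulated training data: x lies in [10, 100], a increases with x.
set.seed(42)
train <- data.frame(x = runif(50, 10, 100))
train$a <- 3 + 0.5 * train$x + rnorm(50)
fit <- lm(a ~ x, data = train)

# New rows to predict; the second one has a missing x.
newdata <- data.frame(x = c(20, NA, 80))

# Option 1: replace NA with 0 (far outside the training range).
zero_filled <- transform(newdata, x = ifelse(is.na(x), 0, x))

# Option 2: replace NA with the mean of the observed training values.
mean_filled <- transform(newdata, x = ifelse(is.na(x), mean(train$x), x))

pred_zero <- predict(fit, newdata = zero_filled)
pred_mean <- predict(fit, newdata = mean_filled)
```

For the row with the missing x, pred_zero is roughly the model's intercept, while pred_mean sits near the middle of the observed responses.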
