
How to use predict for linear regression using grouped independent variables in R?

I fitted a separate linear regression for each city, using a categorical variable that holds the city names as the grouping factor:

library(Rcpp)
library(lme4)

area_csv <- read.csv2('Data/Area.csv')
#area_csv$Value <- as.numeric(area_csv$Value)

py <- read.csv2('Data/Predict_data.csv')

obyem_per_capita_model <- lmList(Value ~ APP * Population | City, data = area_csv)
summary(obyem_per_capita_model)

r_squareds <- summary(obyem_per_capita_model)$r.squared
predictions <- predict(obyem_per_capita_model, newdata = py, asList = TRUE)

write.csv2(predictions, 'Vvodimoe_Predictions.csv')

But when I call predict on a new data set that contains all the necessary independent variables, I get this error:

Error in predict.lmList4(obyem_per_capita_model, newdata = py, asList = TRUE) : 
  nonexistent group in 'newdata'

Columns in area_csv look like this:
City | Year | Info | Value | Status | Population | APP

Columns in py look like this:
City | Year | Population | APP

I tried a check which was suggested by Roland:

all(py$City %in% area_csv$City)

It returned FALSE, so at least one city in py does not appear in area_csv. Thank you, Roland! :D

Then I used setdiff to find which values were the problem:

setdiff(py$City, area_csv$City)
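For readers who want to see what these two checks report, here is a minimal sketch on toy vectors. The city names are hypothetical stand-ins for area_csv$City and py$City, and the trailing space is only an assumed example of how a mismatch can creep in; the actual problem in my data was a typo.

```r
# Hypothetical stand-ins for area_csv$City and py$City
train_city <- c("Almaty", "Astana", "Shymkent")
new_city   <- c("Almaty", "Astana ", "Shymkent")  # note the trailing space

# FALSE means at least one group in the new data was never seen in training
all_known <- all(new_city %in% train_city)

# setdiff() returns the values present in new_city but absent from train_city
unknown <- setdiff(new_city, train_city)

# Assumed cause for this toy case: stray whitespace, removed with base R trimws()
cleaned <- trimws(new_city)
unknown_after_clean <- setdiff(cleaned, train_city)

print(all_known)
print(unknown)
print(unknown_after_clean)
```

Running setdiff in the other direction (training cities first) would instead list cities that have a fitted model but no rows to predict, which is harmless for predict.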

Then I corrected the typo (that is, removed the nonexistent level from the prediction data) and it worked. Hurray! :D Thank you, everyone! :D
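If the unknown groups cannot simply be corrected by hand, one possible approach is to drop them from the new data before predicting. The sketch below is only an illustration under assumptions: area_toy and py_toy are hypothetical toy data frames standing in for area_csv and py, and the predict call is left as a comment because it needs the fitted model from the question.

```r
# Hypothetical toy data standing in for area_csv and py
area_toy <- data.frame(City = c("A", "A", "B", "B"),
                       Value = c(1, 2, 3, 4),
                       stringsAsFactors = FALSE)
py_toy <- data.frame(City = c("A", "B", "C"),
                     Population = c(10, 20, 30),
                     stringsAsFactors = FALSE)

# TRUE for rows whose City was present when the models were fitted
known <- py_toy$City %in% unique(area_toy$City)

# Report what is being dropped instead of discarding it silently
if (!all(known)) {
  message("Dropping rows with unknown City: ",
          paste(unique(py_toy$City[!known]), collapse = ", "))
}

# Keep only rows that belong to a group with a fitted model
py_known <- py_toy[known, , drop = FALSE]

# py_known could then be passed as newdata, as in the question:
# predict(obyem_per_capita_model, newdata = py_known, asList = TRUE)
```

Dropping rows is a judgment call: if the unknown value is a misspelling of a real city, as it was for me, fixing the spelling keeps those rows in the predictions.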
