predict() function throws error using factors on linear model in R
I am using the "lung capacity" data set to try to set up a linear model:
library(tidyverse)
library(rvest)
h <- "https://docs.google.com/spreadsheets/d/0BxQfpNgXuWoIWUdZV1ZTc2ZscnM/edit?resourcekey=0-gqXT7Re2eUS2JGt_w1y4vA#gid=1055321634"
t <- rvest::read_html(h)
Nodes <- t %>% html_nodes("table")
table <- html_table(Nodes[[1]])
colnames(table) <- table[1,]
table <- table[-1,]
table <- table %>% select(LungCap, Age, Height, Smoke, Gender, Caesarean)
Lung_Capacity <- table
Lung_Capacity$LungCap <- as.numeric(Lung_Capacity$LungCap)
Lung_Capacity$Age <- as.numeric(Lung_Capacity$Age)
Lung_Capacity$Height <- as.numeric(Lung_Capacity$Height)
Lung_Capacity$Smoke <- as.numeric(Lung_Capacity$Smoke == "yes")
Lung_Capacity$Gender <- as.numeric(Lung_Capacity$Gender == "male")
Lung_Capacity$Caesarean <- as.numeric(Lung_Capacity$Caesarean == "yes")
colnames(Lung_Capacity)[4] <- "Smoker_YN"
colnames(Lung_Capacity)[5] <- "Male_YN"
colnames(Lung_Capacity)[6] <- "Caesarean_YN"
head(Lung_Capacity)
Capacity <- Lung_Capacity
I am splitting the data into a training set and a validation set:
library(caret)
set.seed(1)
y <- Capacity$LungCap
testIndex <- caret::createDataPartition(y, times = 1, p = 0.2, list = FALSE)
train <- Capacity[-testIndex,]
test <- Capacity[testIndex,]
Cross-validating to obtain my final model:
set.seed(3)
control <- trainControl(method="cv", number = 5)
LinearModel <- train(LungCap ~ ., data = train, method = "lm", trControl = control)
LM <- LinearModel$finalModel
summary(LM)
And trying to run a prediction on the held-out test set:
lmPredictions <- predict(LM, newdata = test)
However, an error is thrown that reads:
Error in eval(predvars, data, env) : object 'Smoker_YN1' not found
Looking through this site, I thought the column names of the test and train tables might be off, but that is not the case; they are identical. The issue seems to be that training the model has renamed the factor predictor to "Smoker_YN1" instead of the intended column name "Smoker_YN". I tried renaming the column headers in the test set, and I tried renaming the coefficient headers. Neither approach was successful.
I've run out of research and experimental approaches. Can anyone please help with this issue?
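The renaming described above can be reproduced without caret. This is only a sketch with made-up toy data, and it assumes the indicator columns are stored as factors, as the title suggests; the data frame df and its values are hypothetical. It shows that a formula expands a factor column into a dummy column named after the column plus the level, and that an lm fitted on such expanded columns (which, as far as I understand, is roughly how the finalModel of a formula-based train() call is fitted) cannot find those columns in the raw data:

```r
# Toy data: names mirror the question, values are invented for illustration.
df <- data.frame(
  LungCap   = c(6.1, 7.3, 5.2, 8.8, 9.4, 4.9),
  Age       = c(10, 12, 9, 15, 17, 8),
  Smoker_YN = factor(c(0, 1, 0, 1, 0, 1))
)

# A formula expands the factor into a dummy column named <column><level>.
mm <- model.matrix(LungCap ~ ., data = df)
print(colnames(mm))

# Fit lm on the expanded columns (intercept column dropped; lm adds its own).
expanded <- as.data.frame(mm[, -1, drop = FALSE])
expanded$.outcome <- df$LungCap
fit <- lm(.outcome ~ ., data = expanded)

# Predicting from the raw data frame fails: it has Smoker_YN, not Smoker_YN1.
raw_result <- tryCatch(
  predict(fit, newdata = df),
  error = function(e) conditionMessage(e)
)
print(raw_result)

# Predicting from data expanded the same way works.
ok_result <- predict(fit, newdata = expanded)
print(ok_result)
```

If this is what happens with the real data, renaming columns in the test set would not be enough on its own, because the test set would also need the same dummy coding as the training data.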
I am not sure, so please go through this and tell me whether it works. My guess (and I am not an expert) is that the character LungCap column and the numeric Lung column interfere in this code:
h <- "https://docs.google.com/spreadsheets/d/0BxQfpNgXuWoIWUdZV1ZTc2ZscnM/edit?resourcekey=0-gqXT7Re2eUS2JGt_w1y4vA#gid=1055321634"
#install.packages("textreadr")
library(textreadr)
library(rvest)
library(dplyr)  # select() below comes from dplyr
t <- read_html(h)
t
Nodes <- t %>% html_nodes("table")
table <- html_table(Nodes[[1]])
colnames(table) <- table[1,]
table <- table[-1,]
table <- table %>% select(LungCap, Age, Height, Smoke, Gender, Caesarean)
Lung_Capacity <- table
# I changed Lung_Capacity$LungCap <- as.numeric(Lung_Capacity$LungCap) to
Lung_Capacity$Lung <- as.numeric(Lung_Capacity$LungCap)
Lung_Capacity$Age <- as.numeric(Lung_Capacity$Age)
Lung_Capacity$Height <- as.numeric(Lung_Capacity$Height)
Lung_Capacity$Smoke <- as.numeric(Lung_Capacity$Smoke == "yes")
Lung_Capacity$Gender <- as.numeric(Lung_Capacity$Gender == "male")
Lung_Capacity$Caesarean <- as.numeric(Lung_Capacity$Caesarean == "yes")
colnames(Lung_Capacity)[4] <- "Smoker_YN"
colnames(Lung_Capacity)[5] <- "Male_YN"
colnames(Lung_Capacity)[6] <- "Caesarean_YN"
head(Lung_Capacity)
# This line is the same as in the question
Capacity <- Lung_Capacity
Capacity
library(caret)
set.seed(1)
# I changed y <- Capacity$LungCap to
y <- Capacity$Lung
testIndex <- caret::createDataPartition(y, times = 1, p = 0.2, list = FALSE)
train <- Capacity[-testIndex,]
test <- Capacity[testIndex,]
# I removed the original character LungCap column from both sets
train$LungCap <- NULL
test$LungCap <- NULL
set.seed(3)
control <- trainControl(method="cv", number = 5)
# I changed LungCap to Lung
LinearModel <- train(Lung ~ ., data = train, method = "lm", trControl = control)
LM <- LinearModel$finalModel
summary(LM)
lmPredictions <- predict(LM, newdata = test)
lmPredictions
Output:
        1         2         3         4         5         6         7
 6.344355 10.231586  4.902900  7.500179  5.295711  9.434454  8.879997
        8         9        10        11        12        13        14
12.227635 11.097691  7.775063  8.085810  6.399364  7.852107  9.480219
       15        16        17        18        19        20
 8.982051 10.115840  7.917863 12.089960  7.838881  9.653292
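A related option, offered as a sketch and not as a verified fix for the data above: call predict() on the object returned by train() instead of on its finalModel, so that caret applies the same factor expansion to the new data. The toy data frame below is simulated and hypothetical (names such as toy, toy_train and toy_test are mine), and the caret part only runs if the package is installed:

```r
# Simulated toy data with a factor predictor; values are not from the real data set.
set.seed(1)
n <- 60
toy <- data.frame(
  Age       = round(runif(n, 5, 18)),
  Smoker_YN = factor(rbinom(n, 1, 0.3))
)
toy$LungCap <- 1 + 0.5 * toy$Age - 0.6 * (toy$Smoker_YN == "1") + rnorm(n, sd = 0.5)

# Simple split by row position, only for illustration.
toy_train <- toy[1:45, ]
toy_test  <- toy[46:60, ]

preds <- NULL
if (requireNamespace("caret", quietly = TRUE)) {
  fit <- caret::train(
    LungCap ~ ., data = toy_train, method = "lm",
    trControl = caret::trainControl(method = "cv", number = 5)
  )
  # predict() on the train object, not on fit$finalModel:
  # caret rebuilds the dummy columns for toy_test before predicting.
  preds <- predict(fit, newdata = toy_test)
  print(head(preds))
}
```

With the objects from the question, the equivalent call would be predict(LinearModel, newdata = test).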