xgboost error predictions R

I am using xgboost to predict Airbnb destinations (similar to the Kaggle competition, but for a class project). However, when I run the prediction command I receive this error message:

Error in predict.xgb.Booster(bst, dval) : Feature names stored in `object` and `newdata` are different!

How can I fix this problem?

Here is my code:

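library(xgboost)  # xgb.DMatrix(), xgboost(), predict()
library(Matrix)   # sparse.model.matrix()
library(car)      # recode() with the string syntax used below

setwd("~/Documents/Big Data/Datasets-20180304")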
airbnb <- read.csv("airbnb_train.csv", header = T, stringsAsFactors = F)
airbnb_test <- read.csv("airbnb_test.csv", header = T, stringsAsFactors = F)
airbnb <- na.omit(airbnb)
airbnb_test <- na.omit(airbnb_test)
airbnb$country_destination <- as.factor(airbnb$country_destination)

airbnb$country_destination[airbnb$country_destination==0] <- NA
airbnb$country_destination <- recode(airbnb$country_destination, "c('1') = '0'; c('2') = '1'")
airbnb <- na.omit(airbnb)
airbnb_test <- na.omit(airbnb_test)

set.seed(1234)
train_index <- sample(1:nrow(airbnb),size = 0.7*nrow(airbnb),replace = F)
train <- airbnb[train_index,]
validation <- airbnb[-train_index,]

options(na.action='na.pass')
new_tr = sparse.model.matrix(country_destination~.-1,data = train, with = F)
train_label <- train$country_destination
train_label <- as.numeric(train_label)-1
dtrain <- xgb.DMatrix(data = new_tr, label=train_label)


new_val = sparse.model.matrix(country_destination~.-1,data = validation, with = F)
val_label <- validation$country_destination
val_label <- as.numeric(val_label)-1
dval <- xgb.DMatrix(data = new_val, label=val_label)

#default parameters
params <- list(
  booster = "gbtree",
  objective = "binary:logistic",
  eta=0.3,
  gamma=0,
  max_depth=6,
  min_child_weight=1,
  subsample=1,
  colsample_bytree=1
)

bst <- xgboost(data = dtrain, label = train_label, max_depth = 2, eta = 1, nthread = 2, nrounds = 8, objective = "binary:logistic")

xgbpred <- predict(bst,dval)

What am I doing wrong? How can I ensure that both 'bst' and 'dval' have the same feature_names?

Can you share your names(bst) and names(dval)? After fitting the boosting model:

bst <- xgboost(data = dtrain, label = train_label, max_depth = 2, eta = 1, nthread = 2, nrounds = 8, objective = "binary:logistic")

As a workaround, you could simply do:

names(bst) <- names(dval)

and then try your prediction:

xgbpred <- predict(bst,dval)

I was stuck with a similar problem and this worked for me.

Try removing the variable you are predicting, i.e. train$country_destination in your case, from 'dtrain' and 'dtest' (even if you have blank values filled in there). Then train the model again after making that change.
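A minimal sketch of that suggestion in base R, assuming the target column is called target and using made-up feature columns (age and signup_flow are placeholders, not columns confirmed by the question). The feature list is derived once from the training data and reused for the test data, so both matrices end up with the same columns in the same order:

```r
# Toy data frames; all column names and values are hypothetical.
train <- data.frame(target = c(0, 1, 0),
                    age = c(23, 35, 41),
                    signup_flow = c(0, 3, 0))
test <- data.frame(target = NA,          # blank target filled in on the test side
                   signup_flow = c(3, 0),
                   age = c(29, 52))

# Take every column except the target, and reuse the same list for both sets.
feature_cols <- setdiff(names(train), "target")
x_train <- as.matrix(train[, feature_cols, drop = FALSE])
x_test  <- as.matrix(test[, feature_cols, drop = FALSE])

print(colnames(x_train))
print(colnames(x_test))
```

The resulting x_train and x_test could then be passed to xgb.DMatrix(), with the label supplied separately for the training set.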

If you look at this page ( https://rdrr.io/cran/xgboost/src/R/xgb.Booster.R ), you will see why some R users are likely to get the following error message: "Feature names stored in object and newdata are different!".

Here is the code from this page related to the error message:

predict.xgb.Booster <- function(object, newdata, missing = NA, outputmargin = FALSE, ntreelimit = NULL,
                                predleaf = FALSE, predcontrib = FALSE, approxcontrib = FALSE,
                                predinteraction = FALSE, reshape = FALSE, ...) {
  object <- xgb.Booster.complete(object, saveraw = FALSE)
  if (!inherits(newdata, "xgb.DMatrix"))
    newdata <- xgb.DMatrix(newdata, missing = missing)
  if (!is.null(object[["feature_names"]]) &&
      !is.null(colnames(newdata)) &&
      !identical(object[["feature_names"]], colnames(newdata)))
    stop("Feature names stored in `object` and `newdata` are different!")
  # (rest of the function not shown in this excerpt)
}

The relevant check is identical(object[["feature_names"]], colnames(newdata)). If the feature names stored in object (i.e. the model fitted on your training set) are not identical to the column names of newdata (i.e. your test set), you get the error message. Note that identical() compares the names position by position, so the same names in a different order also fail the check.
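To illustrate how strict that check is, here is a small base R sketch with hypothetical feature names (they are placeholders, not taken from the question's data). Two name vectors that contain the same names in a different order are equal as sets but not identical:

```r
# Hypothetical feature names, for illustration only.
train_names <- c("age", "gender", "signup_flow")
test_names  <- c("gender", "age", "signup_flow")

# Same set of names, different order: setequal() is TRUE, identical() is FALSE.
# A FALSE result from identical() is the condition that leads to the stop() call.
same_set   <- setequal(train_names, test_names)
same_order <- identical(train_names, test_names)

print(c(same_set = same_set, same_order = same_order))
```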

For more details:

train_matrix <- xgb.DMatrix(as.matrix(training %>% select(-target)), label = training$target, missing = NaN)
object <- xgb.train(data=train_matrix, params=..., nthread=2, nrounds=..., prediction = T)
newdata <- xgb.DMatrix(as.matrix(test %>% select(-target)), missing = NaN)

Once you have built object and newdata from your own data with the code above, you can probably fix this issue by looking at the differences between object[["feature_names"]] and colnames(newdata). Most likely some columns are missing on one side or do not appear in the same order.
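As a rough sketch of that comparison, the base R example below uses toy data (the columns y, browser and age are invented for illustration). It also shows one way such a mismatch can arise, which is an assumption about the question's setup rather than something confirmed in this thread: encoding the training and validation splits separately derives the dummy columns from whichever category values happen to be present in each split. Plain model.matrix() is used here in place of sparse.model.matrix() to keep the example dependency-free.

```r
# Toy splits; a character column has different values in each split.
train <- data.frame(y = c(0, 1, 0, 1),
                    browser = c("Chrome", "Safari", "Firefox", "Chrome"),
                    age = c(25, 31, 40, 22),
                    stringsAsFactors = FALSE)
valid <- data.frame(y = c(1, 0),
                    browser = c("Chrome", "IE"),
                    age = c(29, 35),
                    stringsAsFactors = FALSE)

# Encoding each split on its own gives different dummy columns.
m_train <- model.matrix(y ~ . - 1, data = train)
m_valid <- model.matrix(y ~ . - 1, data = valid)

# Diagnose the mismatch in both directions.
missing_in_valid <- setdiff(colnames(m_train), colnames(m_valid))
extra_in_valid   <- setdiff(colnames(m_valid), colnames(m_train))
print(missing_in_valid)
print(extra_in_valid)

# One possible fix: give the column the same factor levels in both splits
# before encoding, so both matrices get the same columns in the same order.
all_levels <- union(train$browser, valid$browser)
train$browser <- factor(train$browser, levels = all_levels)
valid$browser <- factor(valid$browser, levels = all_levels)

m_train2 <- model.matrix(y ~ . - 1, data = train)
m_valid2 <- model.matrix(y ~ . - 1, data = valid)
print(identical(colnames(m_train2), colnames(m_valid2)))
```

An alternative with the same effect is to encode the full data set once and split the resulting matrix by row index afterwards.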

Following guiotan's answer, using

library(dplyr)

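You should be able to write the following (select works on data frames, so this assumes dval is still a data frame or tibble rather than an xgb.DMatrix):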

xgbpred <- predict(bst, dval %>% select(bst$feature_names))

If you trained xgboost using caret, a solution would be to write the following.

xgbpred <- predict(bst, dval %>% select(bst$finalModel$feature_names))

At least this worked for me.
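For a plain numeric matrix, the same reordering can be sketched in base R without dplyr. In the example below, feature_names is a hypothetical stand-in for bst$feature_names, and the matrix values are made up:

```r
# Stand-in for bst$feature_names (hypothetical names).
feature_names <- c("age", "gender", "signup_flow")

# New data whose columns are in a different order than the model expects.
new_x <- matrix(c(1, 0,    # gender
                  0, 3,    # signup_flow
                  30, 41), # age
                nrow = 2,
                dimnames = list(NULL, c("gender", "signup_flow", "age")))

# Fail early if a feature the model was trained on is absent.
stopifnot(all(feature_names %in% colnames(new_x)))

# Index by name to put the columns in the trained order.
aligned <- new_x[, feature_names, drop = FALSE]
print(aligned)
```

The aligned matrix could then be wrapped in xgb.DMatrix() before calling predict(). Indexing by name also drops any extra columns that the model never saw.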
