Random forest error: NA not permitted in predictors
Hi, I am using the following R script to build a random forest:
# load the necessary libraries
library(randomForest)
testPP<-numeric()
# load the dataset
QdataTrain <- read.csv('train.csv',header = FALSE)
QdataTest <- read.csv('test.csv',header = FALSE)
QdataTrainX <- subset(QdataTrain,select=-V1)
QdataTrainY<-as.factor(QdataTrain$V1)
QdataTestX <- subset(QdataTest,select=-V1)
QdataTestY<-as.factor(QdataTest$V1)
mdl <- randomForest(QdataTrainX, QdataTrainY)
I am getting the following error:
Error in randomForest.default(QdataTrainX, QdataTrainY) :
NA not permitted in predictors
However, I see no occurrence of NA in my data.
For reference, here is my data:
https://docs.google.com/file/d/0B0iDswLYaZ0zUFFsT01BYlRZU0E/edit
Does anyone know why this error is being thrown? I'll keep looking in the meantime. Thanks in advance for any help!
The given data does contain some missing values (seven in total):
sapply(QdataTrainX, function(x) sum(is.na(x)))
##  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16
##   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
## V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29
##   0   0   0   0   0   0   1   1   1   1   1   1   1
Therefore, columns V23 to V29 have one missing value each.
which(is.na(QdataTrainX$V23))
## 318
This gives the number of the row where V23 is missing.
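One way to get past the error, as a rough sketch: drop the incomplete rows from the predictors and the labels together, so the two stay aligned. The toy data below is made up to stand in for train.csv (the names `csv_text`, `toy`, `toyX`, `toyY` and `keep` are only for illustration). Its third line is deliberately short, because `read.csv` uses `fill = TRUE` by default and pads short lines with NA. Whether row 318 of the real file is such a truncated line is an assumption worth checking by looking at that line in the file.

```r
# Toy stand-in for train.csv (hypothetical values). The third line has only
# two fields, so read.csv pads the remaining columns with NA.
csv_text <- "1,0.5,0.1,0.9
0,0.3,0.7,0.2
1,0.8
0,0.4,0.6,0.1"
toy <- read.csv(text = csv_text, header = FALSE)

# Same split as in the question: V1 is the label, the rest are predictors.
toyX <- subset(toy, select = -V1)
toyY <- as.factor(toy$V1)

# TRUE for rows that have no NA in any predictor column.
keep <- complete.cases(toyX)
which(!keep)  # rows that would trigger "NA not permitted in predictors"

# Drop the incomplete rows from predictors and labels with the same mask.
toyX_clean <- toyX[keep, , drop = FALSE]
toyY_clean <- toyY[keep]
```

With the real data this would presumably become `keep <- complete.cases(QdataTrainX)` followed by `randomForest(QdataTrainX[keep, ], QdataTrainY[keep])`. Dropping a single row costs very little here; if rows were too valuable to lose, imputing instead (for example with `na.roughfix` from the randomForest package) would be the alternative.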