Error in `$<-.data.frame`(`*tmp*`, Predict, value = c(`1` = 1L, `2` = 1L, : replacement has 3500 rows, data has 1500

Good Day

I have run a Random Forest with tuning and added the predictions to the Train data, which worked perfectly well with no issues. However, when I tried running the random forest model on the Test dataset, I got the above error. Any idea what could be causing this? My code is below; I'd appreciate any help with this. The Train dataset has 3500 rows and the Test dataset has 1500 rows, since the full dataset has 5000 rows.

R CODE:

####Clearing the global environment
rm(list = ls())

##Setting the working directory
setwd("D:/Great Learning/Module 3 -Machine Learning/Project")


##Packages required to be loaded
install.packages("DataExplorer")
install.packages("xlsx")
##install.packages("magrittr")
install.packages("dplyr")
install.packages("tidyverse")
install.packages("mice")
install.packages("NbClust")

##Reading in the dataset
library(xlsx)
LoanModelRaw = read.xlsx("Thera Bank_Personal_Loan_Modelling-dataset- 1.xlsx",sheetName = "Bank_Personal_Loan_Modelling",header = T)
##LoanModelRaw = read.csv("Thera Bank_Personal_Loan_Modelling-dataset-1.csv", sep = ";",header = T)

##Viewing the dataset in R
View(LoanModelRaw)
dim(LoanModelRaw)
colnames(LoanModelRaw)
str(LoanModelRaw)
summary(LoanModelRaw)
nrow(LoanModelRaw)
attach(LoanModelRaw)

#Correcting column names
names(LoanModelRaw)[2] = "AgeInYears" 
names(LoanModelRaw)[3] = "ExperienceInYears"
names(LoanModelRaw)[4] = "IncomeInKMonth"
names(LoanModelRaw)[5] = "ZIPCode"
names(LoanModelRaw)[6] = "FamilyMembers"
names(LoanModelRaw)[10] = "PersonalLoan"
names(LoanModelRaw)[11] = "SecuritiesAccount"
names(LoanModelRaw)[12] = "CDAccount" 

colnames(LoanModelRaw)

#############################################################1 EDA of the data#######################################################

library(DataExplorer)
##introduce(LoanModelRaw)
plot_intro(LoanModelRaw)
plot_missing(LoanModelRaw)
##plot_bar(LoanModelRaw)
plot_histogram(LoanModelRaw)
create_report(LoanModelRaw)

?plot_boxplot

#Missing Value Treatment
library(mice)
sum(is.na(LoanModelRaw))
md.pattern(LoanModelRaw)
LoanModelRawImpute = mice(LoanModelRaw, m =5, method = 'pmm', seed = 1000)
LoanModelRawNoNa = complete(LoanModelRawImpute, 3)
md.pattern(LoanModelRawNoNa)

#Correcting negative experience
LoanModel = abs(LoanModelRawNoNa[2:14])
attach(LoanModel)
#View(LoanModel)
#summary(LoanModel)
#nrow(LoanModel)
#
AgeInYears ExperienceInYears IncomeInKMonth ZIPCode FamilyMembers CCAvg Education
        25                 1             49   91107             4   1.6         1
        45                19             34   90089             3   1.5         1
        39                15             11   94720             1   1.0         1
        35                 9            100   94112             1   2.7         2
        35                 8             45   91330             4   1.0         2
        37                13             29   92121             4   0.4         2

Mortgage PersonalLoan SecuritiesAccount CDAccount Online CreditCard Split
       0            0                 1         0      0          0 FALSE
       0            0                 1         0      0          0 FALSE
       0            0                 0         0      0          0  TRUE
       0            0                 0         0      0          0  TRUE
       0            0                 0         0      0          1  TRUE
     155            0                 0         0      1          0  TRUE

LoanModelTest$Predict = predict(LoanModelTrainRefinedRF, data = LoanModelTest, type = "class")
LoanModelTest$Score = predict(LoanModelTrainRefinedRF, data = LoanModelTest, type = "prob")

I got the same error trying to predict a single outcome from a simple glm model. In the model formula I specified the outcome and predictors using the format "dataset$outcome", etc. In the "test" set (really just one row of observations), I named the columns "outcome", etc. If I remove the $s from the formula and instead specify "data = dataset", the error disappears. So perhaps the issue is how the variables are referenced in the model formula.
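The behaviour described above can be reproduced with simulated data. Below is a minimal sketch in base R. The names `dataset`, `outcome`, `x` and `newobs` are illustrative and do not come from the poster's actual model. When the formula contains `dataset$...`, `predict()` cannot substitute the new rows. It therefore returns one prediction per original row, which is exactly the kind of vector that triggers the "replacement has ... rows" error.

```r
set.seed(1)

# Simulated data: 100 rows with one predictor and one outcome (illustrative)
dataset <- data.frame(x = rnorm(100))
dataset$outcome <- 2 * dataset$x + rnorm(100)

# A one-row "test set" whose column is named like the predictor
newobs <- data.frame(x = 0.5)

# Formula written with dataset$...: predict() re-evaluates `dataset$x`;
# newobs has no column called `dataset`, so R finds the original data frame
# and returns the 100 in-sample predictions
# (R also warns that 'newdata' had 1 row but variables found have 100 rows)
fit_dollar <- glm(dataset$outcome ~ dataset$x)
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = newobs))

# Formula written with bare column names plus data =: newobs is actually used
fit_data <- glm(outcome ~ x, data = dataset)
p_data <- predict(fit_data, newdata = newobs)

length(p_dollar)  # 100
length(p_data)    # 1
```

Writing the formula with bare column names and passing the data frame through `data =` is what allows `newdata` to take effect.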

This error means that you are trying to add a column vector of length 3500 to a data frame that has 1500 rows. Of course, this does not work, because R will not silently drop the extra values or pad the data frame with NA rows (and that is a good thing).
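The error message itself is easy to reproduce with a toy data frame. Below is a minimal sketch in base R. The object names are illustrative stand-ins for the question's test set and predictions.

```r
# A data frame with 1500 rows, standing in for the test set
test_df <- data.frame(id = seq_len(1500))

# A vector with 3500 values, standing in for predictions made on the training set
too_long <- rep(1L, 3500)

# Capture the error message instead of stopping the script
msg <- tryCatch({
  test_df$Predict <- too_long
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
# [1] "replacement has 3500 rows, data has 1500"

# A vector whose length matches nrow(test_df) is accepted
test_df$Predict <- rep(1L, nrow(test_df))
```

The check is asymmetric. A shorter vector whose length divides the number of rows evenly (for example `c(0L, 1L)` here) is recycled silently. A longer vector, however, always raises this error.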

Check the dimensions (number of rows and number of columns) of LoanModelTest and LoanModelTrain. Also, check the length of what the two predict() calls return.
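Checking those lengths usually points straight at the cause: the `predict()` call returns 3500 values because it never receives the test set. `predict()` methods take new observations through the argument `newdata`. An argument spelled `data` is absorbed by `...` and silently ignored, so the model falls back to its training-set predictions.

Below is a minimal sketch with simulated data and `glm()`. All column names and the data-generating formula are made up for illustration, and the question's random forest is not reproduced here.

```r
set.seed(1000)

# Simulated stand-in for the 5000-row loan data (hypothetical columns)
n <- 5000
loans <- data.frame(
  Income = runif(n, 10, 200),
  Family = sample(1:4, n, replace = TRUE)
)
loans$PersonalLoan <- factor(rbinom(n, 1, plogis(-5 + 0.03 * loans$Income)))

# 3500 / 1500 split, as in the question
train_idx <- sample(seq_len(n), 3500)
train <- loans[train_idx, ]
test <- loans[-train_idx, ]

fit <- glm(PersonalLoan ~ Income + Family, data = train, family = binomial)

# Check the dimensions first
dim(train)  # 3500 rows, 3 columns
dim(test)   # 1500 rows, 3 columns

# predict.glm() has no argument called `data`: it goes into `...` and is
# ignored, so this returns the 3500 fitted values of the training set
wrong <- predict(fit, data = test, type = "response")
length(wrong)  # 3500

# `newdata` is the argument that tells predict() which rows to score
test$Score <- predict(fit, newdata = test, type = "response")
test$Predict <- ifelse(test$Score > 0.5, 1L, 0L)
length(test$Score)  # 1500
```

The same reasoning very likely applies to the question's code, assuming LoanModelTrainRefinedRF was fitted with `randomForest()`. `predict()` for `randomForest` objects also takes `newdata`, and when it is missing it returns the out-of-bag predictions (or vote fractions) for the 3500 training rows. The two lines would then become:

- `predict(LoanModelTrainRefinedRF, newdata = LoanModelTest, type = "class")`
- `predict(LoanModelTrainRefinedRF, newdata = LoanModelTest, type = "prob")[, "1"]` for the score.

The `"prob"` type returns one column per class. Selecting `"1"` assumes PersonalLoan is a factor with levels `0` and `1`.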

I hope this helps!
