'factors with the same levels' in Confusion Matrix
I'm trying to make a decision tree, but this error comes up when I make a confusion matrix in the last line:
Error : `data` and `reference` should be factors with the same levels
Here's my code:
library(rpart)
library(caret)
library(dplyr)
library(rpart.plot)
library(xlsx)
library(caTools)
library(data.tree)
library(e1071)
#Loading the Excel File
library(readxl)
FINALDATA <- read_excel("Desktop/FINALDATA.xlsm")
View(FINALDATA)
df <- FINALDATA
View(df)
#Selecting the meaningful columns for prediction
#df <- select(df, City, df$`Customer type`, Gender, Quantity, Total, Date, Time, Payment, Rating)
df <- select(df, City, `Customer type`, Gender, Quantity, Total, Date, Time, Payment, Rating)
#making sure the data is in the right format
df <- mutate(df, City= as.character(City), `Customer type`= as.character(`Customer type`), Gender= as.character(Gender), Quantity= as.numeric(Quantity), Total= as.numeric(Total), Time= as.numeric(Time), Payment = as.character(Payment), Rating= as.numeric(Rating))
#Splitting into training and testing data
set.seed(123)
sample = sample.split('Customer type', SplitRatio = .70)
train = subset(df, sample==TRUE)
test = subset(df, sample == FALSE)
#Training the Decision Tree Classifier
tree <- rpart(df$`Customer type` ~., data = train)
#Predictions
tree.customertype.predicted <- predict(tree, test, type= 'class')
#confusion Matrix for evaluating the model
confusionMatrix(tree.customertype.predicted, test$`Customer type`)
So I tried this, as suggested in another topic:
confusionMatrix(table(tree.customertype.predicted, test$`Customer type`))
But I still get an error:
Error in !all.equal(nrow(data), ncol(data)) : argument type is invalid
Try to keep the factor levels of train and test the same as those of df.
train$`Customer type` <- factor(train$`Customer type`, unique(df$`Customer type`))
test$`Customer type` <- factor(test$`Customer type`, unique(df$`Customer type`))
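As a rough illustration of why this helps, here is a sketch on made-up labels rather than the asker's data (the vectors `all_labels`, `train_labels`, and `test_labels`, and their values, are hypothetical). A factor built from a subset only knows the values it has seen, whereas passing the full label set keeps the levels aligned:

```r
# Hypothetical labels standing in for df$`Customer type`
all_labels   <- c("Member", "Normal", "Member", "Normal", "Member")
train_labels <- all_labels[1:3]
test_labels  <- all_labels[5]   # happens to contain only "Member"

# Without explicit levels, the factor only knows the values it has seen
naive_test <- factor(test_labels)
levels(naive_test)              # "Member"

# With shared levels, both factors agree even if a value is absent
shared      <- unique(all_labels)
train_fixed <- factor(train_labels, levels = shared)
test_fixed  <- factor(test_labels,  levels = shared)
identical(levels(train_fixed), levels(test_fixed))   # TRUE

# A cross-tabulation of two such factors is square (2 x 2 here)
dim(table(test_fixed, test_fixed))
```

Since both the predictions and the reference are then factors over the same level set, the check that confusionMatrix() complains about should pass.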
I made a toy data set and examined your code. There were a couple of issues:
1. Rename the variable so that its name has no space: names(df) <- gsub("Customer type", "Customer_type", names(df)).
2. Code it as a factor: df$Customer_type <- factor(df$Customer_type).
3. The documentation for sample.split() says the first argument 'Y' should be a vector of labels. But in your code you gave the variable name. You can see the labels with levels(df$Customer_type). Input these to sample.split() as a character vector.
4. Adjust the rpart() call as shown below.
With these adjustments, your code might be OK.
# toy data
df <- data.frame(City = factor(sample(c("Paris", "Tokyo", "Miami"), 100, replace = TRUE)),
                 Customer_type = factor(sample(c("High", "Med", "Low"), 100, replace = TRUE)),
                 Gender = factor(sample(c("Female", "Male"), 100, replace = TRUE)),
                 Quantity = sample(1:10, 100, replace = TRUE),
                 Total = sample(1:10, 100, replace = TRUE),
                 Date = sample(seq(as.Date('2020/01/01'), as.Date('2020/12/31'), by = "day"), 100),
                 Rating = factor(sample(1:5, 100, replace = TRUE)))
library(rpart)
library(caret)
library(dplyr)
library(caTools)
library(data.tree)
library(e1071)
#Splitting into training and testing data
set.seed(123)
sample = sample.split(levels(df$Customer_type), SplitRatio = .70) # ADJUST YOUR CODE TO MATCH YOUR FACTOR LABEL NAMES
train = subset(df, sample==TRUE)
test = subset(df, sample == FALSE)
#Training the Decision Tree Classifier
tree <- rpart(Customer_type ~., data = train) # ADJUST YOUR CODE SO IT'S LIKE THIS
#Predictions
tree.customertype.predicted <- predict(tree, test, type= 'class')
#confusion Matrix for evaluating the model
confusionMatrix(tree.customertype.predicted, test$Customer_type)
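If the split looks off, one thing worth checking is that the logical mask has one entry per row, since a shorter mask is silently recycled when subsetting. The caTools documentation describes the return value of sample.split() as a logical vector of the same length as Y, so passing the column itself, e.g. sample.split(df$Customer_type, SplitRatio = .70), should give a per-row mask. The sketch below illustrates the same idea in base R only. It is an assumption-laden stand-in, not the caTools implementation: `stratified_mask` is a hypothetical helper written for this example, and `toy` is made-up data.

```r
set.seed(123)
toy <- data.frame(
  Customer_type = factor(sample(c("High", "Med", "Low"), 100, replace = TRUE)),
  Quantity      = sample(1:10, 100, replace = TRUE)
)

# Hypothetical helper: marks roughly `ratio` of the rows of each label as TRUE
stratified_mask <- function(labels, ratio) {
  labels <- as.character(labels)
  mask <- logical(length(labels))
  for (lv in unique(labels)) {
    idx <- which(labels == lv)                 # rows carrying this label
    n_train <- round(ratio * length(idx))      # how many of them to keep
    mask[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  mask
}

mask  <- stratified_mask(toy$Customer_type, 0.70)
train <- toy[mask, ]
test  <- toy[!mask, ]

length(mask) == nrow(toy)               # one entry per row
nrow(train) + nrow(test) == nrow(toy)   # every row lands in exactly one set
identical(levels(train$Customer_type), levels(test$Customer_type))
```

Because the column is already a factor, subsetting keeps its levels, so train and test stay aligned for confusionMatrix().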