'factors with the same levels' in Confusion Matrix

I'm trying to build a decision tree, but when I create the confusion matrix in the last line I get this error:

Error : `data` and `reference` should be factors with the same levels

Here is my code:

library(rpart)
library(caret)
library(dplyr)
library(rpart.plot)
library(xlsx)
library(caTools)
library(data.tree)
library(e1071)

#Loading the Excel File
library(readxl)
FINALDATA <- read_excel("Desktop/FINALDATA.xlsm")
View(FINALDATA)
df <- FINALDATA
View(df)

#Selecting the meaningful columns for prediction
#df <- select(df, City, df$`Customer type`, Gender, Quantity, Total, Date, Time, Payment, Rating)
df <- select(df, City, `Customer type`, Gender, Quantity, Total, Date, Time, Payment, Rating)

#making sure the data is in the right format 
df <- mutate(df, City= as.character(City), `Customer type`= as.character(`Customer type`), Gender= as.character(Gender), Quantity= as.numeric(Quantity), Total= as.numeric(Total), Time= as.numeric(Time), Payment = as.character(Payment), Rating= as.numeric(Rating))

#Splitting into training and testing data
set.seed(123)
sample = sample.split('Customer type', SplitRatio = .70)
train = subset(df, sample==TRUE)
test = subset(df, sample == FALSE)

#Training the Decision Tree Classifier
tree <- rpart(df$`Customer type` ~., data = train)

#Predictions
tree.customertype.predicted <- predict(tree, test, type= 'class')

#confusion Matrix for evaluating the model
confusionMatrix(tree.customertype.predicted, test$`Customer type`)
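The first error can be reproduced without fitting a model. This is a minimal base-R sketch. It assumes that the predictions are a factor, because predict(..., type = 'class') on an rpart classification tree returns one. It also assumes that test$`Customer type` is still the character vector produced by as.character() in the mutate() call. The labels and the helper same_levels() are hypothetical, and the helper only approximates the requirement that caret's confusionMatrix() states in its error message.

```r
# Hypothetical stand-ins: predictions come back as a factor, while the
# reference column is still character (as after mutate(..., as.character(...))).
predicted <- factor(c("High", "Low", "High"), levels = c("High", "Low"))
reference <- c("High", "High", "Low")

# Hypothetical helper approximating caret's requirement:
# both inputs must be factors and their levels must match exactly.
same_levels <- function(data, reference) {
  is.factor(data) && is.factor(reference) &&
    identical(levels(data), levels(reference))
}

before <- same_levels(predicted, reference)       # FALSE: reference is character

# Fix: rebuild the reference as a factor using the predictions' levels
reference_fixed <- factor(reference, levels = levels(predicted))
after <- same_levels(predicted, reference_fixed)  # TRUE

print(c(before = before, after = after))
```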

So I tried the following, as suggested in another thread:

confusionMatrix(table(tree.customertype.predicted, test$`Customer type`))

But I still get an error:

Error in !all.equal(nrow(data), ncol(data)) : argument type is invalid
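The second error comes from a different check. The error message shows that, for a table, confusionMatrix() evaluates `!all.equal(nrow(data), ncol(data))` to make sure the table is square. When the two counts differ, all.equal() returns a character string describing the difference instead of FALSE, and applying `!` to a string fails. The following base-R sketch uses hypothetical labels. A character reference only contributes the values that actually occur, so the table ends up non-square.

```r
# Hypothetical predictions: a factor that also carries an unused level "Med"
pred <- factor(c("High", "Low", "High"), levels = c("High", "Low", "Med"))
# Reference left as character: table() only uses the values that occur
ref <- c("High", "High", "Low")

tab <- table(pred, ref)
print(dim(tab))                      # 3 rows, 2 columns: not square

# all.equal() reports a mismatch as a character string, not FALSE
res <- all.equal(nrow(tab), ncol(tab))
print(res)

# Negating that string reproduces the error from the question
msg <- tryCatch(!res, error = function(e) conditionMessage(e))
print(msg)                           # "invalid argument type"

# isTRUE() is the safe way to test an all.equal() result
print(isTRUE(res))                   # FALSE

# Giving the reference the same levels makes the table square again
ref_f <- factor(ref, levels = levels(pred))
print(dim(table(pred, ref_f)))       # 3 rows, 3 columns
```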

Try to keep the factor levels of train and test the same as in df:

train$`Customer type` <- factor(train$`Customer type`, unique(df$`Customer type`))
test$`Customer type` <- factor(test$`Customer type`, unique(df$`Customer type`))
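This helps because a subset of the data may not contain every class. When factor() is called without a levels argument, it only keeps the values it actually sees. Passing the full column's values as levels gives every split the same set of levels in the same order. Here is a minimal sketch with hypothetical values:

```r
# Hypothetical full column and a split that happens to lack two classes
full_col  <- c("High", "Low", "Med", "High", "Low")
split_col <- c("High", "High")

# Without explicit levels, only the values present in the split survive
lv_auto <- levels(factor(split_col))
# With the full column's values as levels, all classes are kept
lv_fixed <- levels(factor(split_col, levels = unique(full_col)))

print(lv_auto)    # "High"
print(lv_fixed)   # "High" "Low" "Med"
```

unique() keeps the order of first appearance, not alphabetical order. That is fine as long as train and test are both built from the same vector, because their levels then match exactly.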

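I made a toy dataset and checked your code. There are a few problems:

  1. R is easier to work with when variable names follow a consistent style. Your "Customer type" variable has a space in its name, and code is generally simpler when you avoid spaces, so I renamed it to "Customer_type". For your data.frame, you can simply rename the column in the source file, or use names(df) <- gsub("Customer type", "Customer_type", names(df)).
  2. I encoded "Customer_type" as a factor, so that the test labels are a factor just like the predictions returned by predict(..., type = 'class'). For you, this would look like df$Customer_type <- factor(df$Customer_type).
  3. The documentation of sample.split() says that the first argument, Y, should be a vector of data labels. In your code you passed the variable name as a string ('Customer type'). That is a single value, so the result does not give one TRUE/FALSE flag per row of df. Pass the label column itself instead, e.g. sample.split(df$Customer_type, SplitRatio = .70). sample.split() then keeps the relative proportion of each label (in my example High, Med and Low) the same in both subsets. You can list the labels with levels(df$Customer_type).
  4. Adjust the rpart() call as shown below. Name the response column directly (Customer_type ~ .) instead of using df$`Customer type`, which takes the response from the full df rather than from train.

With these adjustments, your code should work.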
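Steps 1 to 3 can be sketched in base R. The data frame below is a hypothetical stand-in for the Excel file; only the column name containing a space matters. So that the sketch runs without extra packages, step 3 uses a hand-written stratified split. The helper stratified_flags() is hypothetical and not part of caTools. It follows the same idea as passing the label column to sample.split(): one TRUE/FALSE flag per row, with roughly the same share of each class in both subsets.

```r
set.seed(123)

# Hypothetical stand-in for FINALDATA: a column name containing a space
df <- data.frame(`Customer type` = c("High", "Low", "Med", "Low", "High", "Med"),
                 Total = c(10, 3, 7, 2, 9, 6),
                 check.names = FALSE)

# Step 1: replace the space so the column can be used without backticks
names(df) <- gsub("Customer type", "Customer_type", names(df))

# Step 2: store the response as a factor
df$Customer_type <- factor(df$Customer_type)
print(levels(df$Customer_type))   # "High" "Low" "Med"

# Step 3 (hypothetical helper): flag about `ratio` of the rows of each class
stratified_flags <- function(y, ratio = 0.7) {
  flags <- logical(length(y))
  for (lvl in unique(as.character(y))) {
    idx <- which(y == lvl)                  # rows belonging to this class
    n_train <- round(length(idx) * ratio)   # how many of them go to train
    flags[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  flags
}

in_train <- stratified_flags(df$Customer_type, ratio = 0.7)
train <- df[in_train, ]
test  <- df[!in_train, ]
print(table(train$Customer_type))
```

Subsetting a factor keeps all of its levels, so train$Customer_type and test$Customer_type share the levels of df$Customer_type even if a class is missing from one subset.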

# toy data
df <- data.frame(City = factor(sample(c("Paris", "Tokyo", "Miami"), 100, replace = TRUE)),
                 Customer_type = factor(sample(c("High", "Med", "Low"), 100, replace = TRUE)),
                 Gender = factor(sample(c("Female", "Male"), 100, replace = TRUE)),
                 Quantity = sample(1:10, 100, replace = TRUE),
                 Total = sample(1:10, 100, replace = TRUE),
                 Date = sample(seq(as.Date('2020/01/01'), as.Date('2020/12/31'), by="day"), 100),
                 Rating = factor(sample(1:5, 100, replace = TRUE)))

library(rpart)
library(caret)
library(dplyr)
library(caTools)
library(data.tree)
library(e1071)

#Splitting into training and testing data
set.seed(123)
sample = sample.split(df$Customer_type, SplitRatio = .70) # pass the label column itself: one TRUE/FALSE flag per row, class proportions preserved
train = subset(df, sample == TRUE)
test = subset(df, sample == FALSE)

#Training the Decision Tree Classifier
tree <- rpart(Customer_type ~., data = train) # ADJUST YOUR CODE SO IT'S LIKE THIS

#Predictions
tree.customertype.predicted <- predict(tree, test, type= 'class')

#confusion Matrix for evaluating the model
confusionMatrix(tree.customertype.predicted, test$Customer_type)
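Before calling confusionMatrix(), it can help to check the two inputs directly. The sketch below uses base R. check_and_tabulate() is a hypothetical helper, and the predictions are made up. The helper stops unless both arguments are factors with identical levels. It then builds the cross-tabulation that a confusion matrix is based on and computes plain accuracy from its diagonal. caret's confusionMatrix() reports accuracy together with many other statistics, so this is only a sanity check.

```r
# Hypothetical helper: fail early with a clear message, then cross-tabulate
check_and_tabulate <- function(predicted, actual) {
  stopifnot(is.factor(predicted), is.factor(actual),
            identical(levels(predicted), levels(actual)))
  tab <- table(Predicted = predicted, Actual = actual)
  # With shared levels the table is square and the diagonal holds correct predictions
  list(table = tab, accuracy = sum(diag(tab)) / sum(tab))
}

lv   <- c("High", "Low", "Med")
pred <- factor(c("High", "Low", "Low", "Med"), levels = lv)   # made-up predictions
act  <- factor(c("High", "Low", "Med", "Med"), levels = lv)   # made-up true labels

res <- check_and_tabulate(pred, act)
print(res$table)
print(res$accuracy)   # 0.75: 3 of 4 predictions match
```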


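Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost one, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.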

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM