'factors with the same levels' in Confusion Matrix

I'm trying to build a decision tree, but the last line, which computes a confusion matrix, raises this error:

Error : `data` and `reference` should be factors with the same levels

Here's my code:

library(rpart)
library(caret)
library(dplyr)
library(rpart.plot)
library(xlsx)
library(caTools)
library(data.tree)
library(e1071)

#Loading the Excel File
library(readxl)
FINALDATA <- read_excel("Desktop/FINALDATA.xlsm")
View(FINALDATA)
df <- FINALDATA
View(df)

#Selecting the meaningful columns for prediction
#df <- select(df, City, df$`Customer type`, Gender, Quantity, Total, Date, Time, Payment, Rating)
df <- select(df, City, `Customer type`, Gender, Quantity, Total, Date, Time, Payment, Rating)

#making sure the data is in the right format 
df <- mutate(df, City= as.character(City), `Customer type`= as.character(`Customer type`), Gender= as.character(Gender), Quantity= as.numeric(Quantity), Total= as.numeric(Total), Time= as.numeric(Time), Payment = as.character(Payment), Rating= as.numeric(Rating))

#Splitting into training and testing data
set.seed(123)
sample = sample.split('Customer type', SplitRatio = .70)
train = subset(df, sample==TRUE)
test = subset(df, sample == FALSE)

#Training the Decision Tree Classifier
tree <- rpart(df$`Customer type` ~., data = train)

#Predictions
tree.customertype.predicted <- predict(tree, test, type= 'class')

#confusion Matrix for evaluating the model
confusionMatrix(tree.customertype.predicted, test$`Customer type`)

So I tried this, as suggested in another thread:

confusionMatrix(table(tree.customertype.predicted, test$`Customer type`))

But I still get an error:

Error in !all.equal(nrow(data), ncol(data)) : argument type is invalid

Try to keep the factor levels of train and test the same as in df.

train$`Customer type` <- factor(train$`Customer type`, unique(df$`Customer type`))
test$`Customer type` <- factor(test$`Customer type`, unique(df$`Customer type`))
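A minimal base-R sketch of why the explicit level set matters. The vectors and the labels "Member" and "Normal" are hypothetical stand-ins, not the asker's data. It shows a character reference and a prediction factor with only one observed level being brought onto the same levels:

```r
# Hypothetical label set and toy vectors (assumptions for illustration only)
all_types <- c("Member", "Normal")
reference <- c("Member", "Normal", "Member", "Member")          # character, like the test column
predicted <- factor(c("Member", "Member", "Member", "Member"))  # factor with one observed level

# Give both vectors the same, explicit level set
reference_f <- factor(reference, levels = all_types)
predicted_f <- factor(predicted, levels = all_types)

# Both are now factors with identical levels
identical(levels(reference_f), levels(predicted_f))

# The cross-table is square even though "Normal" was never predicted
table(predicted_f, reference_f)
```

The second error in the question comes from a check that compares nrow and ncol of the table, so a non-square table is the likely trigger there. Aligned levels should avoid that as well.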

I made a toy data set and examined your code. There were a couple of issues:

  1. R has an easier time with variable names that follow a certain style. Your 'Customer type' variable has a space in it, and in general coding is easier when you avoid spaces, so I renamed it 'Customer_type'. For your data.frame you could simply rename the column in the source file, or use names(df) <- gsub("Customer type", "Customer_type", names(df)).
  2. I coded 'Customer_type' as a factor. For you this will look like df$Customer_type <- factor(df$Customer_type). confusionMatrix() needs both of its arguments to be factors, and your mutate() call turned this column into character.
  3. The documentation for sample.split() says the first argument 'Y' should be a vector of labels, but in your code you gave the variable name as a string. The labels are the per-row values of the outcome column (in my example High, Med and Low; levels(df$Customer_type) lists the distinct values for your data). Pass the column itself, df$Customer_type, so that sample.split() returns one TRUE/FALSE per row.
  4. Adjust the rpart() call as shown below: use the bare column name in the formula rather than df$..., so the response is taken from train.

With these adjustments, your code might be OK.

# toy data
df <- data.frame(City = factor(sample(c("Paris", "Tokyo", "Miami"), 100, replace = TRUE)),
                 Customer_type = factor(sample(c("High", "Med", "Low"), 100, replace = TRUE)),
                 Gender = factor(sample(c("Female", "Male"), 100, replace = TRUE)),
                 Quantity = sample(1:10, 100, replace = TRUE),
                 Total = sample(1:10, 100, replace = TRUE),
                 Date = sample(seq(as.Date('2020/01/01'), as.Date('2020/12/31'), by="day"), 100),
                 Rating = factor(sample(1:5, 100, replace = TRUE)))

library(rpart)
library(caret)
library(dplyr)
library(caTools)
library(data.tree)
library(e1071)

#Splitting into training and testing data
set.seed(123)
sample = sample.split(df$Customer_type, SplitRatio = .70) # PASS THE LABEL COLUMN ITSELF, ONE VALUE PER ROW
train = subset(df, sample==TRUE)
test = subset(df, sample == FALSE)

#Training the Decision Tree Classifier
tree <- rpart(Customer_type ~., data = train) # ADJUST YOUR CODE SO IT'S LIKE THIS

#Predictions
tree.customertype.predicted <- predict(tree, test, type= 'class')

#confusion Matrix for evaluating the model
confusionMatrix(tree.customertype.predicted, test$Customer_type)
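The renaming and factor conversion from points 1 and 2 can be tried in isolation. This is a sketch on a small hypothetical frame: the values and the "Member"/"Normal" labels are assumptions, and only the column name mirrors the question.

```r
# Hypothetical toy frame; check.names = FALSE keeps the space in the column name
df <- data.frame(`Customer type` = c("Member", "Normal", "Member", "Normal"),
                 Total = c(10.5, 20, 7.25, 13),
                 check.names = FALSE,
                 stringsAsFactors = FALSE)

# Point 1: replace the spaced name with a syntactic one
names(df) <- gsub("Customer type", "Customer_type", names(df))

# Point 2: convert the outcome from character to factor
df$Customer_type <- factor(df$Customer_type)

names(df)
is.factor(df$Customer_type)
levels(df$Customer_type)
```

After this, the column can be referred to as Customer_type without backticks in formulas and in select().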
