
Error: data and reference should be factors with the same levels

I have the code below and need to use caret to split the dataset so that 40% of all rows go into the training set and the rest into the test set. The `payment` variable should be distributed equally across the split. However, the `confusionMatrix` line gives an error:

"Error: data and reference should be factors with the same levels."

EDIT: `payment` is a binary variable: 0 (no) and 1 (yes). `gdp` is just numeric.

Sample dataset (I don't know how to make a table here yet):

payment   gdp
0           838493
1          9303032
0            72738
1         38300022
1           283283
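For experimenting, the sample table can be rebuilt as an R data frame. This is a minimal sketch assuming the column names shown above; it also checks how `payment` is stored, which turns out to be what matters for the error. A numeric outcome makes caret's `train()` fit a regression, while a factor makes it fit a classifier.

```r
# The sample data from the question, with payment stored as numeric 0/1
dataset <- data.frame(
  payment = c(0, 1, 0, 1, 1),
  gdp     = c(838493, 9303032, 72738, 38300022, 283283)
)
is.factor(dataset$payment)   # FALSE: train() would treat this as a regression target

# Store payment as a factor (0 = no, 1 = yes) so it is treated as a class label
dataset$payment <- factor(dataset$payment, levels = c(0, 1))
levels(dataset$payment)      # "0" "1"
table(dataset$payment)       # two 0s, three 1s
```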

How can I fix this?

My code:

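```r
library(caret)

# Put 40% of the rows in the training set, balancing payment across the split
index <- createDataPartition(y = dataset$payment, p = 0.40, list = FALSE)
trainset <- dataset[index, ]
testset <- dataset[-index, ]

# payment is still numeric (0/1) here, so train() fits a kNN regression
payment_knn <- train(payment ~ gdp, method = "knn", data = trainset,
                     trControl = trainControl(method = "cv", number = 5))
# ...which means these predictions are numbers, not a factor
predicted_outcomes <- predict(payment_knn, testset)
# This is the line that raises the error
conMX_pay <- confusionMatrix(predicted_outcomes, testset$payment)
conMX_pay
```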
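The condition named in the error message can be approximated in base R. The helper `same_level_factors()` below is hypothetical and is not caret's actual implementation, and the prediction values are made up; the sketch only shows why numeric regression output fails while factor class labels pass.

```r
# Hypothetical helper (not part of caret): TRUE only if both inputs are
# factors with identical levels, i.e. the condition the error message names
same_level_factors <- function(data, reference) {
  is.factor(data) && is.factor(reference) &&
    identical(levels(data), levels(reference))
}

# Made-up numeric predictions, like those a kNN regression returns for 0/1 data
predicted_numeric <- c(0.33, 0.67, 1.00)
reference_numeric <- c(0, 1, 1)
same_level_factors(predicted_numeric, reference_numeric)  # FALSE

# With payment stored as a factor, predictions are class labels;
# here they are simulated directly with a shared level set
levs <- c("0", "1")
predicted_factor <- factor(c("0", "1", "1"), levels = levs)
reference_factor <- factor(c(0, 1, 1), levels = levs)
same_level_factors(predicted_factor, reference_factor)    # TRUE
```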

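This is purely for illustration purposes. Make sure the test data is in the same format as the training data. The key change is converting `payment` to a factor: as a numeric 0/1 column, `train()` fits a kNN regression, `predict()` returns numbers, and `confusionMatrix()` rejects them.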

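```r
library(dplyr)
library(caret)

# payment must be a factor so that train() fits a classifier, not a regression
df <- df %>%
  mutate(payment = as.factor(payment), gdp = as.numeric(gdp))

metric <- "Accuracy"
control <- trainControl(method = "cv", number = 10)

set.seed(233)  # set the seed before partitioning so the split is reproducible too
train_set <- createDataPartition(df$payment, p = 0.8, list = FALSE)
valid_me <- df[-train_set, ]
train_me <- df[train_set, ]

# Training
fit.knn <- train(payment ~ ., method = "knn", data = train_me,
                 metric = metric, trControl = control)

# Predictions are now a factor with the same levels as valid_me$payment
validated <- predict(fit.knn, valid_me)
confusionMatrix(validated, valid_me$payment)
```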

This works with the data in your question. You will get warnings because the dataset is too small; again, this is purely for illustration. Data used:

payment      gdp
1       0   838493
2       1  9303032
3       0    72738
4       1 38300022
5       1   283283
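With a validation set this small, the predictions may contain only one class. A minimal base-R sketch, using made-up predictions, of why fixing the level set up front helps: two factors that share levels always cross-tabulate into a full 2 x 2 table, which is the kind of table a confusion matrix is built on. Computing accuracy from the diagonal is shown only as an illustration, not as caret's output format.

```r
# Shared level set, fixed explicitly rather than inferred from the data
levs <- c("0", "1")

# Made-up predictions that contain only class "1"; without levels = levs,
# this factor would have the single level "1" and no longer match the reference
predicted <- factor(c("1", "1", "1"), levels = levs)
reference <- factor(c("0", "1", "1"), levels = levs)

# Cross-tabulating two factors keeps empty levels, so the table stays 2 x 2
cm <- table(Predicted = predicted, Reference = reference)
print(cm)

# Overall accuracy: correct predictions lie on the diagonal (2 of 3 here)
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```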

Cheers!


