
One-class classification in R. What am I doing wrong when generating the confusion matrix?

I am trying to understand and implement one-class classifiers in R, based on several UCI datasets, one of them being the Chronic Kidney Disease dataset (http://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease).

When I try to print the confusion matrix, I get the error "all arguments must have the same length".

What am I doing wrong?

library(caret)
library(dplyr)
library(e1071)
library(NLP)
library(tm)

ds = read.csv('kidney_disease.csv', 
              header = TRUE)

# Remove unusable columns
ds <- subset(ds, select = -c(age), classification =='ckd' )

x <- subset(ds, select = -classification) #make x variables
y <- ds$classification #make y variable(dependent)

# test on the whole set
#pred <- predict(model, subset(ds, select=-classification))


trainPositive<-x
testnegative<-y

inTrain<-createDataPartition(1:nrow(trainPositive),p=0.6,list=FALSE)

trainpredictors<-trainPositive[inTrain,1:4]
trainLabels<-trainPositive[inTrain,6]

testPositive<-trainPositive[-inTrain,]
testPosNeg<-rbind(testPositive,testnegative)

testpredictors<-testPosNeg[,1:4]
testLabels<-testPosNeg[,6]

svm.model<-svm(trainpredictors,y=NULL,
               type='one-classification',
               nu=0.10,
               scale=TRUE,
               kernel="radial")

svm.predtrain<-predict(svm.model,trainpredictors)
svm.predtest<-predict(svm.model,testpredictors)

# confusionMatrixTable<-table(Predicted=svm.pred,Reference=testLabels)
# confusionMatrix(confusionMatrixTable,positive='TRUE')

confTrain <- table(Predicted=svm.predtrain,Reference=trainLabels)
confTest <- table(Predicted=svm.predtest,Reference=testLabels)

confusionMatrix(confTest,positive='TRUE')


print(confTrain)
print(confTest)

#grid

Here are the first few lines of the dataset I'm using:

 id bp    sg al su    rbc       pc        pcc         ba bgr bu  sc sod pot hemo pcv   wc
1  0 80 1.020  1  0          normal notpresent notpresent 121 36 1.2  NA  NA 15.4  44 7800
2  1 50 1.020  4  0          normal notpresent notpresent  NA 18 0.8  NA  NA 11.3  38 6000
3  2 80 1.010  2  3 normal   normal notpresent notpresent 423 53 1.8  NA  NA  9.6  31 7500
4  3 70 1.005  4  0 normal abnormal    present notpresent 117 56 3.8 111 2.5 11.2  32 6700
5  4 80 1.010  2  0 normal   normal notpresent notpresent 106 26 1.4  NA  NA 11.6  35 7300
6  5 90 1.015  3  0                 notpresent notpresent  74 25 1.1 142 3.2 12.2  39 7800
   rc htn  dm cad appet  pe ane classification
1 5.2 yes yes  no  good  no  no            ckd
2      no  no  no  good  no  no            ckd
3      no yes  no  poor  no yes            ckd
4 3.9 yes  no  no  poor yes yes            ckd
5 4.6  no  no  no  good  no  no            ckd
6 4.4 yes yes  no  good yes  no            ckd

The error log:

> confTrain <- table(Predicted = svm.predtrain, Reference = trainLabels)
Error in table(Predicted = svm.predtrain, Reference = trainLabels) : 
  all arguments must have the same length
> confTest <- table(Predicted = svm.predtest, Reference = testLabels)
Error in table(Predicted = svm.predtest, Reference = testLabels) : 
  all arguments must have the same length
>
> confusionMatrix(confTest, positive = 'TRUE')
Error in confusionMatrix(confTest, positive = "TRUE") : 
  object 'confTest' not found
>
>
> print(confTrain)
Error in print(confTrain) : object 'confTrain' not found
> print(confTest)
Error in print(confTest) : object 'confTest' not found


I see a number of issues. First, much of your data appears to be stored as character rather than numeric, and the classifier needs numeric input. Let's pick some columns and convert them to numeric. I will use data.table because fread is very convenient.

library(caret)
library(e1071)
library(data.table)
setDT(ds)
#Choose columns
mycols <- c("id","bp","sg","al","su")
#Convert to numeric
ds[,(mycols) := lapply(.SD, as.numeric),.SDcols = mycols]

#Convert classification to logical (TRUE = ckd)
data <- ds[,.(bp,sg,al,su,classification = classification == "ckd")]
data
     bp    sg al su classification
  1: 80 1.020  1  0           TRUE
  2: 50 1.020  4  0           TRUE
  3: 80 1.010  2  3           TRUE
  4: 70 1.005  4  0           TRUE
  5: 80 1.010  2  0           TRUE
 ---                              
396: 80 1.020  0  0          FALSE
397: 70 1.025  0  0          FALSE
398: 80 1.020  0  0          FALSE
399: 60 1.025  0  0          FALSE
400: 80 1.025  0  0          FALSE
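If you would rather not depend on data.table, the same cleaning steps (turn the "?" placeholder into NA, convert predictors to numeric, turn the label into a logical) can be sketched in base R. The three rows below are hypothetical, not taken from the CKD file.

```r
# Hypothetical raw rows as read from the file; "?" marks a missing value
raw <- data.frame(bp = c("80", "?", "70"),
                  sg = c("1.020", "1.010", "?"),
                  classification = c("ckd", "ckd", "notckd"),
                  stringsAsFactors = FALSE)

# Replace the "?" placeholder with a real NA
raw[raw == "?"] <- NA

# Convert the chosen predictor columns from character to numeric
num_cols <- c("bp", "sg")
raw[num_cols] <- lapply(raw[num_cols], as.numeric)

# Logical label: TRUE for the positive (ckd) class
raw$classification <- raw$classification == "ckd"
str(raw)
```

Converting "?" to NA first means as.numeric() never sees the placeholder, so it produces no "NAs introduced by coercion" warning.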

Once the data is cleaned up, you can sample a training and test set with createDataPartition, as in your original code.

#Sample data for training and test set
inTrain<-createDataPartition(1:nrow(data),p=0.6,list=FALSE)
train<- data[inTrain,]
test <- data[-inTrain,]
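For a numeric y, createDataPartition() samples within percentile groups of y rather than drawing rows completely at random. When that stratification is not needed, a plain 60/40 split can be sketched in base R; the 400-row frame below is a hypothetical stand-in for the cleaned data, and its column values are made up.

```r
set.seed(42)  # make the random split reproducible

# Hypothetical stand-in for the cleaned 400-row data set
toy <- data.frame(bp = round(rnorm(400, mean = 75, sd = 10)),
                  classification = rep(c(TRUE, FALSE), times = c(250, 150)))

# Draw 60% of the row indices for training; the remaining rows form the test set
in_train  <- sample(seq_len(nrow(toy)), size = floor(0.6 * nrow(toy)))
toy_train <- toy[in_train, ]
toy_test  <- toy[-in_train, ]
```

Note that subsetting a data.frame this way keeps the original row names (for example "17", "203"), whereas a data.table subset is always numbered 1..n.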

Then we can create the model and make the predictions.

svm.model<-svm(classification ~ bp + sg + al + su, data = train,
               type='one-classification',
               nu=0.10,
               scale=TRUE,
               kernel="radial")

#Perform predictions 
svm.predtrain<-predict(svm.model,train)
svm.predtest<-predict(svm.model,test)
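Before tabulating, each prediction has to be paired with its own label. Assuming, as the lookup below relies on, that predict() returns one prediction per complete row, named after that row, a base-R sketch can look the labels up by row name with match(). This variant also works for a plain data.frame subset whose row names are not 1..n. The toy test set and the threshold rule standing in for predict() are hypothetical.

```r
# Hypothetical test set whose row names are not 1..n, as after subsetting a data.frame
toy_test <- data.frame(bp = c(80, NA, 70, 60),
                       classification = c(TRUE, TRUE, FALSE, FALSE),
                       row.names = c("5", "9", "12", "20"))

# Hypothetical stand-in for predict(): one named prediction per complete row
keep <- complete.cases(toy_test)
pred <- setNames(toy_test$bp[keep] > 65, rownames(toy_test)[keep])

# Look each label up by row name, not by position, so lengths always agree
ref  <- toy_test$classification[match(names(pred), rownames(toy_test))]
conf <- table(Predicted = pred, Reference = ref)
conf
```

Here as.integer(names(pred)) would give 5, 12 and 20, which are not valid positions in a 4-row frame; match() avoids that pitfall.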

Your main issue with the cross table was that the model can only predict for cases that don't have any NAs, so the prediction vector ends up shorter than the label vector. You have to subset the classification labels to the rows that actually have predictions. Then you can evaluate confusionMatrix:

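# predict() skips rows containing NA; the names of its output identify the rows it
# kept (data.table row names are always 1..n, so they double as row positions)
confTrain <- table(Predicted=svm.predtrain,
                   Reference=train$classification[as.integer(names(svm.predtrain))])
confTest <- table(Predicted=svm.predtest,
                  Reference=test$classification[as.integer(names(svm.predtest))])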

confusionMatrix(confTest,positive='TRUE')

Confusion Matrix and Statistics

         Reference
Predicted FALSE TRUE
    FALSE     0   17
    TRUE     55   64

               Accuracy : 0.4706         
                 95% CI : (0.3845, 0.558)
    No Information Rate : 0.5956         
    P-Value [Acc > NIR] : 0.9988         

                  Kappa : -0.2361        

 Mcnemar's Test P-Value : 1.298e-05      

            Sensitivity : 0.7901         
            Specificity : 0.0000         
         Pos Pred Value : 0.5378         
         Neg Pred Value : 0.0000         
             Prevalence : 0.5956         
         Detection Rate : 0.4706         
   Detection Prevalence : 0.8750         
      Balanced Accuracy : 0.3951         

       'Positive' Class : TRUE           
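Each figure in the caret output follows from the four counts in the table. As a check, the base-R sketch below recomputes the main statistics from those counts, copied from the output above. Zero specificity means none of the 55 negative (notckd) test cases was rejected; combined with the 17 missed positives, that puts accuracy below the No Information Rate, which is the accuracy of always predicting the majority class TRUE.

```r
# Counts copied from the confusion matrix above (rows = Predicted, cols = Reference)
tab <- matrix(c(0, 55, 17, 64), nrow = 2,
              dimnames = list(Predicted = c("FALSE", "TRUE"),
                              Reference = c("FALSE", "TRUE")))
n  <- sum(tab)
tp <- tab["TRUE", "TRUE"]
fn <- tab["FALSE", "TRUE"]
fp <- tab["TRUE", "FALSE"]
tn <- tab["FALSE", "FALSE"]

accuracy    <- (tp + tn) / n       # 64 / 136
sensitivity <- tp / (tp + fn)      # 64 / 81: positives that were accepted
specificity <- tn / (tn + fp)      # 0 / 55: no negative was rejected
ppv         <- tp / (tp + fp)      # 64 / 119
prevalence  <- (tp + fn) / n       # 81 / 136, also the No Information Rate here
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement compared with agreement expected by chance
p_e <- (sum(tab["TRUE", ]) * sum(tab[, "TRUE"]) +
        sum(tab["FALSE", ]) * sum(tab[, "FALSE"])) / n^2
kappa_stat <- (accuracy - p_e) / (1 - p_e)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, prevalence = prevalence,
        balanced = balanced, kappa = kappa_stat), 4)
```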

Data

library(archive)
library(data.table)
tf1 <- tempfile(fileext = ".rar")
#Download data file
download.file("http://archive.ics.uci.edu/ml/machine-learning-databases/00336/Chronic_Kidney_Disease.rar", tf1)
tf2 <- tempfile()
#Un-rar file
archive_extract(tf1, tf2)
#Read in data; skip = "48" starts reading at the first line containing "48", i.e. the first data row
ds <- fread(paste0(tf2,"/Chronic_Kidney_Disease/chronic_kidney_disease.arff"), fill = TRUE, skip = "48")
#Remove erroneous last column
ds[,V26:= NULL]
#Set column names (from header)
setnames(ds,c("id","bp","sg","al","su","rbc","pc","pcc","ba","bgr","bu","sc","sod","pot","hemo","pcv","wc","rc","htn","dm","cad","appet","pe","ane","classification"))
#Replace "?" with NA
ds[ds == "?"] <- NA
