简体   繁体   中英

caret::confusionMatrix on iteration which calculates contingency table, but fails to create confusionMatrix object

Scope:

[Code UPDATE.] To calculate confusionMatrix() sensitivity, specificity, accuracy from a loop, or sequence, of values ranging from seq(0.1,0.9, by=0.1).

Problem on confusionMatrix()

[Code UPDATE.] Using the caret::confusionMatrix function, I have variables to build the confusionMatrix inside the function (compute_seq_accuracy.func). The Try/Catch does not show any errors; BUT, this function does NOT create the confusionMatrix when the caret::confusionMatrix(csa.func.p, csa.func.confusion_table).

This is not the major problem to solve in this code, the other errors have been fixed by me when I change dataset.

Goal:

To iterate values: 0.1 to 0.9, by 0.1, calculating sensitivity, specificity, accuracy from custom coded confusionMatrix function that handles level errors when the caret::confusionMatrix error'd when levels were different.

Null records have been removed.

R Code WIP Solution

This is the R code work in process, function compute_seq_accuracy.func() executes without error, however, now the contingency table that is created inside function compute_confusion_matrix.func(), does not get create and returned from the function return. the following data trace is from an internal print statement which shows the contingency table created for each threshold evaluation:

## function
compute_seq_accuracy.func <- function(value) {
        tryCatch({
                csa.func.p <- factor(ifelse(loans_predict < value, 0, 1))
                csa.func.confusion_table <- compute_confusion_matrix.func(loans_train_data$statusRank, csa.func.p)
                tryCatch({
                        csa.cmt <- compute_matrix.func(csa.func.p, csa.func.confusion_table)
                }, 
                error = function(e) return(e)
                )
                return(csa.cmt$overall['Accuracy']) 
        }, 
        error = function(e) return(e)
        )
}

compute_matrix.func <- function(p, t) {
        tryCatch({
                cm.func.confusion_matrix <- caret::confusionMatrix(p, t)
                return(cm.func.confusion_matrix)   ### $overall['Accuracy'])
        }, 
        error = function(e) return(e)
        )
}

## function
compute_confusion_matrix.func <- function(y, p) {
        ccm.func.confusion_table <- table(y, p)
        if(nrow(ccm.func.confusion_table)!=ncol(ccm.func.confusion_table)){
                missings <- setdiff(colnames(ccm.func.confusion_table),rownames(ccm.func.confusion_table))
                missing_mat <- mat.or.vec(nr = length(missings), nc = ncol(ccm.func.confusion_table))
                ccm.func.confusion_table  <- as.table(rbind(as.matrix(ccm.func.confusion_table), missing_mat))
                rownames(ccm.func.confusion_table) <- colnames(ccm.func.confusion_table)
        }
        return(ccm.func.confusion_table)
}

## process run
compute_for_values = seq(0.1,0.9, by=0.1)
csa_computed_accuracies <- sapply(compute_for_values, compute_seq_accuracy.func, simplify = FALSE)

function returned variable: csa_computed_accuracies, matrix created, error message inside matrix, reads the follow:

> csa_computed_accuracies
[[1]]
<simpleError in dimnames(x) <- dn: length of 'dimnames' [1] not equal to array extent>

Data Trace

Try...Catch is set, no warning messages. However, when line csa.func.confusion_matrix <- caret::confusionMatrix() is invoked, no confusionMatrix object is created. And no Try...Catch error or warning is issued:

> csa_computed_accuracies <- sapply(compute_for_values, compute_seq_accuracy.func, simplify = FALSE)
      p
y          0     1
  Bad      4  6009
  Good     0 21411
      p
y          0     1
  Bad     38  5975
  Good    15 21396
      p
y          0     1
  Bad    225  5788
  Good   133 21278
      p
y          0     1
  Bad    702  5311
  Good   533 20878
      p
y          0     1
  Bad   1575  4438
  Good  1614 19797
      p
y          0     1
  Bad   2836  3177
  Good  4002 17409
      p
y          0     1
  Bad   4382  1631
  Good  8646 12765
      p
y          0     1
  Bad   5627   386
  Good 15856  5555
> 

> csa_computed_accuracies
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
[[7]]
NULL
[[8]]
NULL
[[9]]
NULL

> 

Partial Correction

Corrected Data set:

 head(loans_predict,50)
    11413      2561     25337      1643     14264     24191     33989     28193     21129      7895     29007     26622      3065 
0.8375821 0.7516343 0.8375704 0.7671279 0.7201578 0.7917037 0.8980501 0.8259884 0.8604232 0.8664207 0.7609676 0.7753622 0.9321958 
    11423      3953      5789     30150      6070      1486     13195     30344     26721       716     24609     22196     10770 
0.8325967 0.9459098 0.5903160 0.5997290 0.9045176 0.6782181 0.7546154 0.8381577 0.7943421 0.7198638 0.4522069 0.7129170 0.8632025 
    18042      3710     21750     23492     10680      5088     10434      3228      8696     29688     33847      2997     24772 
0.8941667 0.6445716 0.7659989 0.2616490 0.7402274 0.7115220 0.8985310 0.7300686 0.8737217 0.6712457 0.7037675 0.6868837 0.7534947 
    28396      6825     27619     26433     25542     33853     32926     33585     20362      6895     20634 
0.7516796 0.7261610 0.8437550 0.8662871 0.8620579 0.9355447 0.6786310 0.6017286 0.9340776 0.9022817 0.7832571 
> 
> compute_for_values
[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Consider wrapping your method in tryCatch to catch exceptions and return NULL on error which you can investigate further which 0.1 causes the error and such NULL elements can be removed with Filter at the end. Below also uses sapply (wrapper to lapply ) which returns a named list if character vector is used as input.

compute_seq_accuracy.func <- function(value) {
     tryCatch({
        p <- factor(ifelse(loans_predict_fcm < as.numeric(value), 'Bad', 'Good')) 
        confusion_table <- compute_confusion_matrix(loans_train_data$statusRank, p) 
        c_matrix <- confusionMatrix(confusion_table) 
        return(c_matrix$overall['Accuracy']) 
     }, 
        # RETURN ERROR MESSAGE
        error = function(e) return(e)
     )
}

compute_for_values <- as.character(seq(0.1, 0.9, by=0.1))

## WIP error in !all.equal(nrow(data, ncol(data))) 
computed_accuracies <- sapply(compute_for_values, compute_seq_accuracy.func, simplify = FALSE)

# REMOVE NULLs FROM LIST
computed_accuracies <- Filter(LENGTH, computed_accuracies)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM