How to calculate the imbalance accuracy metric in multi-class classification

Question

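Sorry to bother you, but I came across an interesting article, "Mortaz, E. (2020). Imbalance accuracy metric for model selection in multi-class imbalance classification problems. Knowledge-Based Systems, 210, 106490" (https://www.sciencedirect.com/science/article/pii/S0950705120306195), in which the authors calculate this metric (IAM). The formula is in the paper and I understand it, but I would like to ask: how can I replicate it in R?

I apologize in advance for the silly question. Thank you for your attention!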

Answer 1

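The IAM formula given in the paper is (written out here to match the code below):

\[
\mathrm{IAM} = \frac{1}{k} \sum_{i=1}^{k} \frac{c_{ii} - \max\left(\sum_{j \ne i} c_{ij},\ \sum_{j \ne i} c_{ji}\right)}{\max\left(\sum_{j=1}^{k} c_{ij},\ \sum_{j=1}^{k} c_{ji}\right)}
\]

where c_ij is element (i, j) of the classifier's confusion matrix c, and k is the number of classes (k >= 2). The paper shows that this metric can be used as a standalone metric for multi-class model selection.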
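As a quick hand check, consider a made-up 2-class confusion matrix (the numbers are illustrative only and do not come from the paper). Each class contributes its diagonal count minus the larger of its off-diagonal row and column sums, divided by the larger of its row and column totals:

```latex
\[
\begin{aligned}
c &= \begin{pmatrix} 8 & 2 \\ 1 & 9 \end{pmatrix} \\
\text{class 1:}\quad & \frac{8 - \max(2,\ 1)}{\max(10,\ 9)} = \frac{6}{10} = 0.6 \\
\text{class 2:}\quad & \frac{9 - \max(1,\ 2)}{\max(10,\ 11)} = \frac{7}{11} \approx 0.636 \\
\mathrm{IAM} &= \frac{1}{2}\left(0.6 + \frac{7}{11}\right) \approx 0.618
\end{aligned}
\]
```

Plain accuracy for the same matrix is 17/20 = 0.85, so IAM is the stricter of the two numbers here.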

The following is Python code implementing IAM (imbalance accuracy metric):

def IAM(c):
    '''
    c is a nested list representing the confusion matrix of the classifier (len(c) >= 2)
    '''
    l = len(c)
    iam = 0

    for i in range(l):
        sum_row = 0        # total of row i
        sum_col = 0        # total of column i
        sum_row_no_i = 0   # row i without the diagonal entry
        sum_col_no_i = 0   # column i without the diagonal entry
        for j in range(l):
            sum_row += c[i][j]
            sum_col += c[j][i]
            if j != i:  # compare values; "is not" tests object identity
                sum_row_no_i += c[i][j]
                sum_col_no_i += c[j][i]
        # Per-class term of the IAM sum
        iam += (c[i][i] - max(sum_row_no_i, sum_col_no_i)) / max(sum_row, sum_col)
    return iam / l

c = [[2129,   52,    0,    1],
     [499,   70,    0,    2],
     [46,   16,    0,   1],
     [85,   18,    0,   7]]

print(IAM(c))  # -0.5210576475801445
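To see which classes pull the score down, the same computation can return the per-class terms instead of only their mean. This is a minimal sketch, not code from the paper: the name `iam_terms` is hypothetical, and raising an error for a class with an empty row and column is an assumption, since the formula would otherwise divide by zero.

```python
def iam_terms(c):
    """Return the per-class terms whose mean is IAM.

    c is a square nested list (confusion matrix) with at least 2 classes.
    """
    k = len(c)
    terms = []
    for i in range(k):
        row_total = sum(c[i])                       # all entries in row i
        col_total = sum(c[j][i] for j in range(k))  # all entries in column i
        denom = max(row_total, col_total)
        if denom == 0:
            # Assumption (not from the paper): refuse to score a class that
            # has neither row nor column entries, instead of dividing by zero.
            raise ValueError(f"class {i} has no samples and no predictions")
        # Larger of the off-diagonal row sum and off-diagonal column sum
        off_diag = max(row_total - c[i][i], col_total - c[i][i])
        terms.append((c[i][i] - off_diag) / denom)
    return terms


c = [[2129, 52, 0, 1],
     [499, 70, 0, 2],
     [46, 16, 0, 1],
     [85, 18, 0, 7]]

terms = iam_terms(c)
for i, t in enumerate(terms):
    print(f"class {i}: {t:.4f}")
print(f"IAM: {sum(terms) / len(terms):.4f}")
```

For this matrix the terms come out to roughly 0.5433, -0.7548, -1.0000 and -0.8727. Class 2 has a zero diagonal entry and therefore contributes exactly -1, which is a large part of why the overall value is negative.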

The following is R code implementing IAM (imbalance accuracy metric):

IAM <- function(cm) {
  # cm is a square matrix holding the confusion matrix of the classifier.
  l <- nrow(cm)
  result <- 0

  for (i in seq_len(l)) {
    sum_row <- 0
    sum_col <- 0
    sum_row_no_i <- 0
    sum_col_no_i <- 0

    for (j in seq_len(l)) {
      sum_row <- sum_row + cm[i, j]
      sum_col <- sum_col + cm[j, i]
      if (i != j) {
        sum_row_no_i <- sum_row_no_i + cm[i, j]
        sum_col_no_i <- sum_col_no_i + cm[j, i]
      }
    }
    # Per-class term of the IAM sum
    result <- result + (cm[i, i] - max(sum_row_no_i, sum_col_no_i)) / max(sum_row, sum_col)
  }
  return(result / l)
}

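# byrow = TRUE fills the matrix row by row, so it matches the Python example above.
# (IAM gives the same value for a matrix and its transpose, so the result is unchanged.)
cm <- matrix(c(2129, 52, 0, 1, 499, 70, 0, 2, 46, 16, 0, 1, 85, 18, 0, 7), nrow = 4, ncol = 4, byrow = TRUE)

IAM(cm)  # approximately -0.5210576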

Another example, using the iris dataset (a 3-class problem) and sklearn:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter = 1000).fit(X, y)
pred = clf.predict(X)
c = confusion_matrix(y, pred)
print('confusion matrix:')
print(c)
print(f'accuracy : {clf.score(X, y):.2f}')
print(f'IAM : {IAM(c):.2f}')

confusion matrix:
[[50  0  0]
 [ 0 47  3]
 [ 0  1 49]]
accuracy : 0.97
IAM : 0.92
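The iris classes are balanced, so accuracy and IAM stay close there. The gap between the two is easier to see on an imbalanced problem. The sketch below compares two hypothetical 2-class confusion matrices with identical accuracy; the matrices and the helper names `iam` and `accuracy` are made up for illustration and are not taken from the paper.

```python
def iam(c):
    """Imbalance accuracy metric for a square nested-list confusion matrix."""
    k = len(c)
    total = 0.0
    for i in range(k):
        row_total = sum(c[i])
        col_total = sum(c[j][i] for j in range(k))
        # Larger of the off-diagonal row sum and off-diagonal column sum
        off_diag = max(row_total - c[i][i], col_total - c[i][i])
        total += (c[i][i] - off_diag) / max(row_total, col_total)
    return total / k


def accuracy(c):
    """Fraction of samples on the diagonal."""
    return sum(c[i][i] for i in range(len(c))) / sum(map(sum, c))


# Hypothetical matrices (rows = true class, columns = predicted class).
model_a = [[90, 0],
           [10, 0]]   # always predicts the majority class
model_b = [[85, 5],
           [5, 5]]    # recovers half of the minority class

for name, cm in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: accuracy={accuracy(cm):.2f}, IAM={iam(cm):.2f}")
```

Both models have an accuracy of 0.90, but IAM is about -0.10 for model A and about 0.44 for model B, so a selection based on IAM would prefer the model that actually identifies some minority-class samples.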

Solution 1
0 Accepted 2021-09-28 18:47:28
