Difference between the weighted accuracy metrics of Keras and Scikit-learn

Question

Intro

Hej everyone,

I am working on my diploma thesis and face a binary classification problem with an imbalanced class distribution. I have around 10 times more negative ("0") labels than positive ("1") labels. For that reason I considered observing not only accuracy and ROC-AUC, but also weighted/balanced accuracy and Precision-Recall-AUC.

I already asked the question on GitHub ( https://github.com/keras-team/keras/issues/12991 ) but the issue has not been answered yet, so I thought this platform might be the better place!

Issue description

During some calculations on the validation set in a custom callback I noticed, more or less by coincidence, that the weighted accuracy reported by Keras is always different from my results using sklearn.metrics.accuracy_score().

Using Keras, weighted accuracy has to be declared in model.compile(). It is then a key in the logs{} dictionary after every epoch (and is also written to the log file by the CSVLogger callback or to the history object), or it is returned as a value in a list by model.evaluate():

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'], 
              weighted_metrics=['accuracy'])

I calculate the val_sample_weights vector based on the class distribution of the training set with the sklearn.utils function class_weight.compute_sample_weight() and with the help of class_weight.compute_class_weight().

# Keyword arguments and the public .values attribute keep this working across sklearn/pandas versions
cls_weights = class_weight.compute_class_weight('balanced', classes=np.unique(y_train.values),
                                                y=y_train.values)
cls_weight_dict = {0: cls_weights[0], 1: cls_weights[1]}
val_sample_weights = class_weight.compute_sample_weight(cls_weight_dict, y_test.values)
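To make the weighting step above concrete, here is a minimal pure-Python sketch of what the 'balanced' heuristic works out to: each class gets n_samples / (n_classes * class_count), and every sample then inherits the weight of its class. The helper names and the toy labels are hypothetical and only serve as illustration; they are not part of the original code.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} for the given labels."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def sample_weights_from_classes(class_weights, labels):
    """Map every label to the weight of its class."""
    return [class_weights[label] for label in labels]


# Hypothetical toy "training" labels with the 10:1 imbalance described in the question
y_train_toy = [0] * 10 + [1]
toy_class_weights = balanced_class_weights(y_train_toy)

# Hypothetical toy "validation" labels, weighted with the training-set class weights
y_val_toy = [1, 0, 0]
toy_val_weights = sample_weights_from_classes(toy_class_weights, y_val_toy)

print(toy_class_weights)
print(toy_val_weights)
```

Note that the weights are derived from the training labels but applied to the validation labels, so their sum generally does not equal the number of validation samples.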

In model.fit() I pass this vector together with the validation data, and in sklearn.metrics.accuracy_score() I pass it to the sample_weight parameter to compare the results on the same basis.

model_output = model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=1,
                         validation_data=(x_test, y_test, val_sample_weights))

Furthermore, I derived how Scikit-learn computes the weighted accuracy from several easy examples, and it seems to be computed by the following equation (which seems quite reasonable to me):

$$\text{weighted accuracy} = \frac{TP \cdot w_p + TN \cdot w_n}{(TP + FN) \cdot w_p + (TN + FP) \cdot w_n}$$

TP, TN, FP and FN are the values reported in the confusion matrix, and w_p and w_n are the class weights of the positive and negative class, respectively.
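As a quick sanity check of that relationship, the following pure-Python sketch computes a sample-weighted accuracy in two ways: as the summed weight of correctly classified samples divided by the total weight, and from confusion-matrix counts with class weights w_p and w_n. The function names and toy data are hypothetical, and the first function only imitates the documented behaviour of accuracy_score(..., sample_weight=...) rather than calling it.

```python
def weighted_accuracy_from_samples(y_true, y_pred, sample_weight):
    """Summed weight of correctly classified samples divided by the total weight."""
    correct_weight = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct_weight / sum(sample_weight)


def weighted_accuracy_from_counts(tp, tn, fp, fn, w_p, w_n):
    """Same quantity expressed through confusion-matrix counts and class weights."""
    return (tp * w_p + tn * w_n) / ((tp + fn) * w_p + (tn + fp) * w_n)


# Hypothetical toy data: 4 negatives, 2 positives
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
w_n, w_p = 0.75, 1.5
weights = [w_p if t == 1 else w_n for t in y_true]

# Confusion-matrix counts for the toy data
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

acc_samples = weighted_accuracy_from_samples(y_true, y_pred, weights)
acc_counts = weighted_accuracy_from_counts(tp, tn, fp, fn, w_p, w_n)
print(acc_samples, acc_counts)
```

Both routes agree whenever every sample of a class carries the same weight, which is the case for weights produced from a class-weight dictionary.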

An easy example to test it can be found here:

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html

Just for the sake of completeness, sklearn.metrics.accuracy_score(..., sample_weight=) returns the same result as sklearn.metrics.balanced_accuracy_score().
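The sketch below illustrates when that equivalence can be expected, under the assumption that balanced accuracy is the mean of the per-class recalls. In this pure-Python toy example (hypothetical names and data, no sklearn calls), the weighted accuracy matches the balanced accuracy when the 'balanced' weights are derived from the very labels being scored, and it drifts away when the weights come from a differently distributed label set, such as a training set.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall."""
    classes = sorted(set(y_true))
    recalls = []
    for cls in classes:
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / sum(1 for t in y_true if t == cls))
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true, y_pred, class_weights):
    """Weight of correct samples over total weight, using per-class weights."""
    weights = [class_weights[t] for t in y_true]
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / sum(weights)


def balanced_class_weights(labels):
    """n_samples / (n_classes * class_count) for each class."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}


# Hypothetical toy evaluation data
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

bal_acc = balanced_accuracy(y_true, y_pred)

# Weights derived from the evaluated labels themselves
acc_same_dist = weighted_accuracy(y_true, y_pred, balanced_class_weights(y_true))

# Weights derived from a hypothetical, more imbalanced training set (10:1)
acc_other_dist = weighted_accuracy(y_true, y_pred, balanced_class_weights([0] * 10 + [1]))

print(bal_acc, acc_same_dist, acc_other_dist)
```

In the setup of this question the weights come from y_train and are applied to y_test, so small deviations from the balanced accuracy are plausible whenever the two class ratios differ.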

System Information

GeForce RTX 2080 Ti
Keras 2.2.4
Tensorflow-gpu 1.13.1
Sklearn 0.19.2
Python 3.6.8
CUDA Version 10.0.130

Code example

I searched for an easy example to make the issue easy to reproduce, even if the class imbalance here is weaker (1:2 rather than 1:10). It's based on the introductory tutorial to Keras which can be found here:

https://towardsdatascience.com/k-as-in-keras-simple-classification-model-a9d2d23d5b5a

The Pima Indians onset of diabetes dataset will be downloaded, as done in the link above, from the repository of Jason Brownlee, the maker of the Machine Learning Mastery homepage. But I guess it can also be downloaded from various other sites.

So finally here's the code:

from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.regularizers import l2
import pandas as pd
import numpy as np
from sklearn.utils import class_weight
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

file = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/' \
       'pima-indians-diabetes.data.csv'

# Load csv data from file to data using pandas
data = pd.read_csv(file, names=['pregnancies', 'glucose', 'diastolic', 'triceps', 'insulin',
                                'bmi', 'dpf', 'age', 'diabetes'])

# Process data
data.head()
x = data.drop(columns=['diabetes'])
y = data['diabetes']

# Split into train and test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)

# define a sequential model
model = Sequential()
# 1st hidden layer
model.add(Dense(100, activation='relu', input_dim=8, kernel_regularizer=l2(0.01)))
model.add(Dropout(0.3))
# 2nd hidden layer
model.add(Dense(100, activation='relu', kernel_regularizer=l2(0.01)))
model.add(Dropout(0.3))
# Output layer
model.add(Dense(1, activation='sigmoid'))
# Compilation with weighted metrics
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'], 
                         weighted_metrics=['accuracy'])

# Calculate validation sample weights based on the class distribution of the train labels
# and apply them to the test labels using Sklearn
cls_weights = class_weight.compute_class_weight('balanced', classes=np.unique(y_train.values),
                                                y=y_train.values)
cls_weight_dict = {0: cls_weights[0], 1: cls_weights[1]}
val_sample_weights = class_weight.compute_sample_weight(cls_weight_dict, y_test.values)

# Train model
model_output = model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=1,
                         validation_data=(x_test, y_test, val_sample_weights))

# Predict model
y_pred = model.predict(x_test, batch_size=32, verbose=1)

# Classify predictions based on threshold at 0.5
y_pred_binary = (y_pred > 0.5) * 1

# Sklearn metrics
sklearn_accuracy = accuracy_score(y_test, y_pred_binary)
sklearn_weighted_accuracy = accuracy_score(y_test, y_pred_binary, 
                                           sample_weight=val_sample_weights)

# metric_list has 3 entries: [0] val_loss weighted by val_sample_weights, [1] val_accuracy 
# [2] val_weighted_accuracy
metric_list = model.evaluate(x_test, y_test, batch_size=32, verbose=1, 
                             sample_weight=val_sample_weights)

print('sklearn_accuracy=%.3f' %sklearn_accuracy)
print('sklearn_weighted_accuracy=%.3f' %sklearn_weighted_accuracy)
print('keras_evaluate_accuracy=%.3f' %metric_list[1])
print('keras_evaluate_weighted_accuracy=%.3f' %metric_list[2])

Results and summary

For example I get:

sklearn_accuracy=0.792

sklearn_weighted_accuracy=0.718

keras_evaluate_accuracy=0.792

keras_evaluate_weighted_accuracy=0.712

The "unweighted" accuracy value is the same for both Sklearn and Keras. The difference in the weighted value isn't really big, but it grows as the dataset becomes more imbalanced. For example, for my task the two values always differ by around 5%!

Maybe I'm missing something and it's supposed to be like that, but it's confusing that Keras and Sklearn provide different values, especially since the whole class_weights and sample_weights topic is hard to get into. Unfortunately I'm not deep enough into Keras to search the Keras code on my own.

I would really appreciate receiving any answers!

Answer 1

I repeated your exact toy example and actually found that sklearn and keras do give the same results. I repeated the experiment 5 times to ensure it wasn't by chance, and indeed the results were identical each time. For one of the runs, for example:

sklearn_accuracy=0.831
sklearn_weighted_accuracy=0.800
keras_evaluate_accuracy=0.831
keras_evaluate_weighted_accuracy=0.800

FYI I'm using sklearn and keras versions:

0.20.3
2.3.1

respectively. See this google colab example: https://colab.research.google.com/drive/1b5pqbp9TXfKiY0ucEIngvz6_Tc4mo_QX
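Since the results match with newer library versions but not with the ones listed in the question, one possible explanation is a different normalisation of the weighted metric. The following pure-Python sketch is only an illustration of that idea: it assumes (this is not confirmed anywhere in this thread) that one implementation divides the weighted sum of correct predictions by the number of samples while the other divides by the sum of the weights. All names and numbers are hypothetical.

```python
def weighted_mean_by_weight_sum(correct, weights):
    """sum(w_i * correct_i) / sum(w_i): a weight-normalised accuracy."""
    return sum(c * w for c, w in zip(correct, weights)) / sum(weights)


def weighted_mean_by_sample_count(correct, weights):
    """sum(w_i * correct_i) / N: ASSUMED alternative normalisation, for illustration only."""
    return sum(c * w for c, w in zip(correct, weights)) / len(correct)


# Case 1: weights sum exactly to the number of samples (4 negatives, 2 positives)
correct_a = [1, 1, 1, 0, 1, 0]
weights_a = [0.75, 0.75, 0.75, 0.75, 1.5, 1.5]
a_by_sum = weighted_mean_by_weight_sum(correct_a, weights_a)
a_by_count = weighted_mean_by_sample_count(correct_a, weights_a)

# Case 2: same class weights applied to a set with a different class ratio
# (5 negatives, 1 positive), so the weights no longer sum to the sample count
correct_b = [1, 1, 1, 1, 0, 1]
weights_b = [0.75, 0.75, 0.75, 0.75, 0.75, 1.5]
b_by_sum = weighted_mean_by_weight_sum(correct_b, weights_b)
b_by_count = weighted_mean_by_sample_count(correct_b, weights_b)

print(a_by_sum, a_by_count)
print(b_by_sum, b_by_count)
```

Under this assumption the two numbers coincide only when the validation weights happen to sum to the number of validation samples, which is generally not the case when the weights are derived from the training labels.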
