
How do I find the false positive and false negative rates for a neural network?

I deleted the text because I haven't found a solution yet, and I realized I don't want other people to take the first part that works.

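If you have already imported confusion_matrix from scikit-learn, you can use the following code:

import numpy as np
from sklearn.metrics import confusion_matrix

cutoff = 0.5                                   # probability threshold
y_pred = model.predict(x_test)                 # predicted probabilities
y_pred_classes = np.zeros_like(y_pred)         # initialise a matrix full with zeros
y_pred_classes[y_pred > cutoff] = 1            # 1 wherever the probability exceeds the cutoff

y_test_classes = np.zeros_like(y_test)
y_test_classes[y_test > cutoff] = 1
print(confusion_matrix(y_test_classes, y_pred_classes))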
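To make the thresholding and the cell layout concrete, here is a small NumPy sketch that does by hand what the code above hands to confusion_matrix. binary_confusion is a hypothetical helper (not part of scikit-learn), the labels and probabilities are made up, and it assumes 0/1 labels; it returns the cells in scikit-learn's order, rows = truth and columns = prediction.

```python
import numpy as np


def binary_confusion(y_true, y_prob, cutoff=0.5):
    """Hypothetical helper: return [[TN, FP], [FN, TP]] for 0/1 labels and probabilities."""
    y_true = np.asarray(y_true).ravel().astype(int)
    y_pred = (np.asarray(y_prob).ravel() > cutoff).astype(int)  # threshold the probabilities
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    return np.array([[tn, fp], [fn, tp]])  # rows = truth, cols = prediction


# Made-up labels and sigmoid outputs, shaped (n, 1) like a Keras predict() result
y_true = [1, 0, 0, 1, 1, 0]
y_prob = np.array([[0.9], [0.2], [0.7], [0.4], [0.8], [0.1]])
print(binary_confusion(y_true, y_prob))
```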

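In scikit-learn, the confusion matrix for binary 0/1 labels is always arranged like this (rows are the true classes, columns are the predicted classes, both in sorted order 0, 1):

True negatives    False positives
False negatives   True positives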

To get tn, fp, fn and tp individually, you can run:

tn, fp, fn, tp = confusion_matrix(y_test_classes, y_pred_classes).ravel()
(tn, fp, fn, tp)
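Since the question asks for rates rather than counts: the false positive rate is FP / (FP + TN) and the false negative rate is FN / (FN + TP). A minimal sketch, where error_rates is a hypothetical helper and the counts are invented rather than taken from any real model:

```python
import math


def error_rates(tn, fp, fn, tp):
    """Hypothetical helper: return (false positive rate, false negative rate)."""
    # FPR = FP / (FP + TN): share of actual negatives wrongly predicted as positive
    fpr = fp / (fp + tn) if (fp + tn) else math.nan
    # FNR = FN / (FN + TP): share of actual positives wrongly predicted as negative
    fnr = fn / (fn + tp) if (fn + tp) else math.nan
    return fpr, fnr


# Invented counts, e.g. as unpacked from confusion_matrix(...).ravel()
tn, fp, fn, tp = 50, 10, 5, 35
fpr, fnr = error_rates(tn, fp, fn, tp)
print(f"FPR = {fpr:.3f}, FNR = {fnr:.3f}")
```

Returning NaN when a class never occurs in the test set is a design choice in this sketch; raising an error instead would be equally reasonable.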

The inputs to confusion_matrix must be integer label arrays, not one-hot encodings.

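# Predicting the Test set results
from sklearn import metrics
y_pred = model.predict(X_test)   # predicted probabilities
# Multi-class (softmax) output with one-hot y_test: take the argmax of each row
matrix = metrics.confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))
# Single sigmoid output with 0/1 y_test: threshold at 0.5 instead,
# because argmax over a single column is always 0
# matrix = metrics.confusion_matrix(y_test, (y_pred > 0.5).astype(int).ravel())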
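The argmax conversion only makes sense when there is one column per class. This sketch, with made-up values, shows the softmax case turning into integer class indices, and why a single sigmoid column needs a threshold instead:

```python
import numpy as np

# Multi-class case (made-up values): one-hot true labels and softmax probabilities
y_test_onehot = np.array([[1, 0], [0, 1], [0, 1]])
y_prob_softmax = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
true_idx = y_test_onehot.argmax(axis=1)    # integer class index per row
pred_idx = y_prob_softmax.argmax(axis=1)
print(true_idx, pred_idx)

# Single sigmoid column of shape (n, 1): argmax over axis 1 can only return 0
y_prob_sigmoid = np.array([[0.9], [0.2], [0.7]])
print(y_prob_sigmoid.argmax(axis=1))                 # always zeros
print((y_prob_sigmoid > 0.5).astype(int).ravel())    # thresholding gives the real labels
```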

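The model's predictions come out as probabilities, as in the output below, so applying a probability threshold of 0.5 converts them to binary labels.

Output (y_pred):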

[0.87812372 0.77490434 0.30319547 0.84999743]
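Applied to the four probabilities shown above, the 0.5 threshold gives the following labels (a sketch assuming that output is a 1-D NumPy array):

```python
import numpy as np

# The four sigmoid outputs shown above
y_pred = np.array([0.87812372, 0.77490434, 0.30319547, 0.84999743])
y_labels = (y_pred > 0.5).astype(int)   # True/False -> 1/0
print(y_labels)
```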

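The sklearn.metrics.accuracy_score(y_true, y_pred) method defines y_pred as:

y_pred: 1d array-like, or label indicator array / sparse matrix. Predicted labels, as returned by a classifier.

This means y_pred must be an array of 1s and 0s (predicted labels). It should not contain probabilities.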

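The root cause of the error is a theoretical rather than a computational issue: you are trying to use a classification metric (accuracy) on the raw output of what is effectively a regression (i.e. numeric prediction) model (a neural logistic model), where that metric is meaningless.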

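Like most performance metrics, accuracy compares apples to apples (i.e. true labels of 0/1 with predictions that are again 0/1). So when you ask the function to compare binary true labels (apples) with continuous predictions (oranges), you get an expected error, and the message tells you exactly what the problem is from a computational point of view: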

Classification metrics can't handle a mix of binary and continuous targets

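Although the message doesn't directly tell you that you are trying to compute a metric that is invalid for your problem (and we shouldn't really expect it to go that far), it is certainly a good thing that scikit-learn at least gives you a direct and explicit warning that you are doing something wrong. This is not necessarily the case with other frameworks: see, for example, the behavior of Keras in a very similar situation, where you get no warning at all and just end up complaining about low "accuracy" in a regression setting...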
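A small reproduction of that error, using the four probabilities shown earlier together with made-up true labels, followed by the fix of thresholding before calling accuracy_score (assumes scikit-learn and NumPy are installed):

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 0, 1])  # made-up true labels
y_prob = np.array([0.87812372, 0.77490434, 0.30319547, 0.84999743])  # probabilities, not labels

err_msg = None
try:
    accuracy_score(y_true, y_prob)  # binary labels vs continuous scores
except ValueError as err:
    err_msg = str(err)
    print("ValueError:", err_msg)

# Thresholding first gives 0/1 labels, which accuracy_score accepts
acc = accuracy_score(y_true, (y_prob > 0.5).astype(int))
print(acc)
```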

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from keras.models import Sequential
from keras.layers import Dense, Dropout
# sklearn.cross_validation no longer exists; train_test_split comes from sklearn.model_selection below
from sklearn.preprocessing import StandardScaler


# read the csv file and convert into arrays for the machine to process
df = pd.read_csv('dataset_ori.csv')
dataset = df.values

# split the dataset into input features and the feature to predict
X = dataset[:,0:7]
Y = dataset[:,7]

# Splitting into Train and Test Set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    Y,
                                                    test_size = 0.2,
                                                    random_state = 0)

# Initialising the ANN
classifier = Sequential()

# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu', input_dim = 7))
classifier.add(Dropout(0.5))
# Adding the second hidden layer
classifier.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu'))
classifier.add(Dropout(0.5))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN (binary_crossentropy matches a single sigmoid output with 0/1 labels)
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Scale the features: fit the scaler on the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Fitting the ANN to the scaled Training set (train once, on scaled data)
classifier.fit(X_train_scaled, y_train, batch_size = 10, epochs = 20)

# Summary of neural network
classifier.summary()

# Predicting the Test set results & giving a threshold probability
# (predict_classes was removed from recent Keras versions, so threshold predict() output)
y_prob = classifier.predict(scaler.transform(X_test)).ravel()
y_prediction = (y_prob > 0.5).astype(int)   # 0/1 labels instead of probabilities
print("\n\naccuracy", np.sum(y_prediction == y_test) / float(len(y_test)))

## EXTRA: Confusion Matrix Visualize
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_prediction) # rows = truth, cols = prediction
df_cm = pd.DataFrame(cm, index = (0, 1), columns = (0, 1))
plt.figure(figsize = (10,7))
sn.set(font_scale=1.4)
sn.heatmap(df_cm, annot=True, fmt='g')
plt.show()
print("Test Data Accuracy: %0.4f" % accuracy_score(y_test, y_prediction))

#Let's see how our model performed
from sklearn.metrics import classification_report
print(classification_report(y_test, y_prediction))
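classification_report already contains what the question asks for: the recall of class 1 is the true positive rate, so FNR = 1 - recall(class 1), and the recall of class 0 is the true negative rate (specificity), so FPR = 1 - recall(class 0). A sketch with made-up labels, using recall_score and its pos_label parameter:

```python
from sklearn.metrics import recall_score

# Made-up 0/1 test labels and thresholded predictions
y_test = [0, 0, 0, 1, 1, 1, 1, 0]
y_prediction = [0, 1, 0, 1, 0, 1, 1, 1]

tpr = recall_score(y_test, y_prediction, pos_label=1)  # recall of class 1 (sensitivity)
tnr = recall_score(y_test, y_prediction, pos_label=0)  # recall of class 0 (specificity)
fnr = 1 - tpr
fpr = 1 - tnr
print(f"FPR = {fpr:.2f}, FNR = {fnr:.2f}")
```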
