How do I find the false positive and false negative rates for a neural network?
I deleted the text because I haven't found a solution yet, and I realized I don't want others stealing the first part that works.
As you have already loaded confusion_matrix from scikit-learn, you can use this:
import numpy as np
from sklearn.metrics import confusion_matrix

cutoff = 0.5
y_pred = model.predict(x_test).ravel()  # predicted probabilities, flattened to 1-D
y_pred_classes = np.zeros_like(y_pred)  # initialise an array full of zeros
y_pred_classes[y_pred > cutoff] = 1
y_test_classes = np.zeros_like(y_test)
y_test_classes[y_test > cutoff] = 1
print(confusion_matrix(y_test_classes, y_pred_classes))
In scikit-learn the confusion matrix has the true labels as rows and the predicted labels as columns, so for binary 0/1 labels it is always ordered like this:
True Negatives   False Positives
False Negatives  True Positives
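A quick way to check that layout is to build the matrix on a few toy labels. The arrays y_true and y_hat below are made-up values for illustration only, not data from the question:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels: three actual negatives (0) and two actual positives (1)
y_true = np.array([0, 0, 0, 1, 1])
y_hat = np.array([0, 1, 1, 0, 1])

cm = confusion_matrix(y_true, y_hat)
print(cm)
# [[1 2]   row 0 (true 0): 1 TN, 2 FP
#  [1 1]]  row 1 (true 1): 1 FN, 1 TP
```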
To get tn and the other counts as separate values, you can run this:
tn, fp, fn, tp = confusion_matrix(y_test_classes, y_pred_classes).ravel()
(tn, fp, fn, tp)
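From those four counts the rates in the question follow from the standard definitions: false positive rate = FP / (FP + TN) and false negative rate = FN / (FN + TP). A minimal sketch, assuming 0/1 integer labels; the helper name error_rates and the toy arrays are hypothetical:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for 0/1 labels."""
    # labels=[0, 1] keeps the 2x2 layout even if one class is missing
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    fpr = fp / (fp + tn)  # share of actual negatives predicted positive
    fnr = fn / (fn + tp)  # share of actual positives predicted negative
    return fpr, fnr

# Made-up example labels
fpr, fnr = error_rates(np.array([0, 0, 0, 1, 1]), np.array([0, 1, 1, 0, 1]))
print(fpr, fnr)
```

If the test set contains no actual negatives (or no actual positives), the corresponding denominator is zero and NumPy returns nan with a warning.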
Your input to confusion_matrix must be an array of integer class labels, not one-hot encodings.
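from sklearn import metrics

# Predicting the Test set results
y_pred = model.predict(X_test)
y_pred = (y_pred > 0.5)  # threshold the per-class probabilities
# argmax converts one-hot rows (and the thresholded predictions) to integer labels
matrix = metrics.confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))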
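With more than two classes there is no single FP/FN pair; instead each class gets one-vs-rest counts read off the matrix (column sum minus diagonal for FP, row sum minus diagonal for FN). A sketch with made-up one-hot truths and softmax-style probabilities for three classes; here argmax is applied to the probabilities directly rather than to thresholded values:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up one-hot truths and per-class probabilities (3 samples, 3 classes)
y_test = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_prob = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]])

# argmax turns each row into an integer class label
cm = confusion_matrix(y_test.argmax(axis=1), y_prob.argmax(axis=1))

# One-vs-rest counts per class
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp  # predicted as this class, actually another
fn = cm.sum(axis=1) - tp  # actually this class, predicted as another
tn = cm.sum() - (tp + fp + fn)

fpr = fp / (fp + tn)  # per-class false positive rate
fnr = fn / (fn + tp)  # per-class false negative rate
print(fpr, fnr)
```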
model.predict returns probabilities like the output below, so applying a probability threshold of 0.5 transforms them into binary labels.
output(y_pred):
[0.87812372 0.77490434 0.30319547 0.84999743]
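Applying the 0.5 threshold to exactly those four values gives the binary labels; casting the booleans with astype(int) yields the 0/1 integers that scikit-learn metrics expect:

```python
import numpy as np

# The probabilities shown above (as returned by model.predict for a sigmoid output)
y_prob = np.array([0.87812372, 0.77490434, 0.30319547, 0.84999743])

# Threshold at 0.5, then cast the booleans to integer labels
y_pred = (y_prob > 0.5).astype(int)
print(y_pred)  # [1 1 0 1]
```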
The sklearn.metrics.accuracy_score(y_true, y_pred) method defines y_pred as:
y_pred : 1d array-like, or label indicator array / sparse matrix. Predicted labels, as returned by a classifier.
Which means y_pred has to be an array of 1's or 0's (predicted labels). They should not be probabilities.
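The difference is easy to reproduce: passing raw probabilities raises a ValueError, while passing thresholded labels works. The true labels y_true here are made up for illustration; the probabilities are the ones shown earlier:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 1, 0, 0])  # made-up true labels
y_prob = np.array([0.87812372, 0.77490434, 0.30319547, 0.84999743])

err_msg = None
try:
    accuracy_score(y_true, y_prob)  # probabilities: rejected
except ValueError as err:
    err_msg = str(err)
    print("ValueError:", err_msg)

acc = accuracy_score(y_true, (y_prob > 0.5).astype(int))  # labels: accepted
print(acc)
```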
The root cause of your error is a theoretical, not a computational, issue: you are trying to use a classification metric (accuracy) in a regression (i.e. numeric prediction) model (Neural Logistic Model), which is meaningless.
Just like the majority of performance metrics, accuracy compares apples to apples (i.e. true labels of 0/1 with predictions that are again 0/1); so, when you ask the function to compare binary true labels (apples) with continuous predictions (oranges), you get an expected error, where the message tells you exactly what the problem is from a computational point of view:
Classification metrics can't handle a mix of binary and continuous targets
Although the message doesn't tell you directly that you are trying to compute a metric that is invalid for your problem (and we shouldn't really expect it to go that far), it is certainly a good thing that scikit-learn at least gives you a direct and explicit warning that you are attempting something wrong; this is not necessarily the case with other frameworks. See, for example, the behavior of Keras in a very similar situation, where you get no warning at all, and one just ends up complaining about low "accuracy" in a regression setting...
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from keras.models import Sequential
from keras.layers import Dense, Dropout
from sklearn.preprocessing import StandardScaler
# read the csv file and convert into arrays for the machine to process
df = pd.read_csv('dataset_ori.csv')
dataset = df.values
# split the dataset into input features (first 7 columns) and the feature to predict (8th column)
X = dataset[:,0:7]
Y = dataset[:,7]
# Splitting into Train and Test Set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu', input_dim = 7))
classifier.add(Dropout(0.5))
# Adding the second hidden layer
classifier.add(Dense(units = 10, kernel_initializer = 'uniform', activation = 'relu'))
classifier.add(Dropout(0.5))
# Adding the output layer: a single sigmoid unit gives P(class 1)
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN (binary cross-entropy matches a single sigmoid output)
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Standardise the features (fit the scaler on the training set only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Fitting the ANN to the Training set
classifier.fit(X_train_scaled, y_train, batch_size = 10, epochs = 20)
# Summary of neural network
classifier.summary()
# Predicting the Test set results & giving a threshold probability
# (predict_classes is deprecated and removed in recent Keras versions)
y_prob = classifier.predict(X_test_scaled).ravel()
y_pred = (y_prob > 0.5).astype(int)
print ("\n\naccuracy" , np.sum(y_pred == y_test) / float(len(y_test)))
## EXTRA: Confusion Matrix Visualize
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred) # rows = truth, cols = prediction
df_cm = pd.DataFrame(cm, index = (0, 1), columns = (0, 1))
plt.figure(figsize = (10,7))
sn.set(font_scale=1.4)
sn.heatmap(df_cm, annot=True, fmt='g')
plt.show()
print("Test Data Accuracy: %0.4f" % accuracy_score(y_test, y_pred))
#Let's see how our model performed
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
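The recall column of classification_report already contains the rates asked about: recall of class 1 is TP / (TP + FN) = 1 - FNR, and recall of class 0 is TN / (TN + FP) = 1 - FPR. A small sketch with made-up labels, using recall_score with pos_label to pick the class:

```python
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([0, 0, 0, 1, 1])  # made-up labels
y_pred = np.array([0, 1, 1, 0, 1])

# Recall of the positive class (1) is TP / (TP + FN), i.e. 1 - FNR
fnr = 1 - recall_score(y_true, y_pred, pos_label=1)
# Recall of the negative class (0) is TN / (TN + FP), i.e. 1 - FPR
fpr = 1 - recall_score(y_true, y_pred, pos_label=0)
print(fpr, fnr)
```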