
Prediction of dependent variable with Keras shows the same result for the entire array & Confusion Matrix throws error

I am building an ANN with Keras. The shape of my df is (120000, 18). My goal is to predict, based on 17 independent variables (X's), what my dependent variable (Y) will be. I have two questions, which I added below. Here is my code:

Creating ANN

Question 1: How can all the values of y_pred_train for my training set be the same? Also, the predictions should be binary, 0 or 1, indicating whether the prediction is true or false. Why am I getting 0.41542563?

Data Preprocessing
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

#ANN
import keras
from keras.models import Sequential
from keras.layers import Dense

# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'relu', input_dim = 17))
# Adding the second hidden layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)

Snippet of epoch at point of convergence: 7210/80030 [=>............................] - ETA: 8s - loss: 0.6822 - acc: 0.7046

# Classifying the Train set results
y_pred_train = classifier.predict(X_train)
y_pred_train

Out[50]: 
array([[0.41542563],
       [0.41542563],
       [0.41542563],
       ...,
       [0.41542563],
       [0.41542563],
       [0.41542563]], dtype=float32)

Creating the Confusion Matrix

Question 2: When I try to compute a confusion matrix, I get an error.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

ValueError-Traceback (most recent call last)
<command-207679> in <module>()
      1 # Making the Confusion Matrix
      2 from sklearn.metrics import confusion_matrix
----> 3 cm = confusion_matrix(y_test, y_pred)

/databricks/python/lib/python3.5/site-packages/sklearn/metrics/classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight)
    238 
    239     """
--> 240     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    241     if y_type not in ("binary", "multiclass"):
    242         raise ValueError("%s is not supported" % y_type)

/databricks/python/lib/python3.5/site-packages/sklearn/metrics/classification.py in _check_targets(y_true, y_pred)
     70     y_pred : array or indicator matrix
     71     """
---> 72     check_consistent_length(y_true, y_pred)
     73     type_true = type_of_target(y_true)
     74     type_pred = type_of_target(y_pred)

/databricks/python/lib/python3.5/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    179     if len(uniques) > 1:
    180         raise ValueError("Found input variables with inconsistent numbers of"
--> 181                          " samples: %r" % [int(l) for l in lengths])
    182 
    183 

ValueError: Found input variables with inconsistent numbers of samples: [34299, 22866]

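Your dense layers have only 1 unit each. Apart from the first layer, the remaining layers just pass a single value along. Hence, all the outputs are the same. If that single ReLU value is zero for every sample, the sigmoid only ever sees its bias, which would explain a constant such as 0.41542563.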
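To make the bottleneck concrete, here is a minimal sketch in plain Python of a 17 -> 1 -> 1 -> 1 forward pass with the same activations as the question's model. All weights and inputs are hypothetical values chosen for illustration, not the ones Keras actually learned; they are picked so that the second hidden unit is "dead" (its ReLU output is zero for every input), which is one likely way to end up with identical predictions.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, b1, w2, b2, w3, b3):
    """Forward pass of a 17 -> 1 -> 1 -> 1 network (ReLU, ReLU, sigmoid)."""
    h1 = relu(sum(wi * xi for wi, xi in zip(w1, x)) + b1)
    h2 = relu(w2 * h1 + b2)
    return sigmoid(w3 * h2 + b3)


random.seed(0)

# Hypothetical parameters (assumptions, not taken from the trained model).
w1 = [random.uniform(-0.05, 0.05) for _ in range(17)]
b1 = 0.0
# h1 is never negative, so w2 * h1 + b2 <= -0.1 and the second ReLU always outputs 0.
w2, b2 = -0.5, -0.1
w3, b3 = 0.8, -0.34

# Five random "standardized" samples with 17 features each.
samples = [[random.gauss(0.0, 1.0) for _ in range(17)] for _ in range(5)]
outputs = [forward(x, w1, b1, w2, b2, w3, b3) for x in samples]

# Every prediction collapses to sigmoid(b3), regardless of the input.
print(outputs)
```

Giving the hidden layers more than one unit (the exact number is a tuning choice) makes it far less likely that the whole layer goes silent at once.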

As for your second question, the activation function on the last layer is sigmoid, which gives an output in the range 0 to 1. You will have to convert the outputs to class labels based on a threshold. For example,

y_pred = [1 if x>0.5 else 0 for x in y_pred] , in which 0.5 is the threshold.
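The thresholding step can be sketched end to end in plain Python, together with the length check that produced the ValueError in the question. The traceback reports lengths [34299, 22866], which suggests that y_pred was not computed from the same X_test that y_test belongs to; that is an assumption, since the code defining y_pred is not shown. The helper names to_labels and confusion_counts and the demo data are hypothetical.

```python
def to_labels(probabilities, threshold=0.5):
    """Turn sigmoid outputs (flat values or one-element rows) into 0/1 labels."""
    labels = []
    for p in probabilities:
        value = p[0] if isinstance(p, (list, tuple)) else p
        labels.append(1 if value > threshold else 0)
    return labels


def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are true labels, columns are predictions."""
    if len(y_true) != len(y_pred):
        # Same condition that sklearn's confusion_matrix rejects.
        raise ValueError(
            "inconsistent numbers of samples: %r" % [len(y_true), len(y_pred)]
        )
    counts = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts


# Made-up demo data: true labels and one probability per row, shaped like predict() output.
y_test_demo = [0, 1, 1, 0, 1]
y_prob_demo = [[0.10], [0.80], [0.40], [0.70], [0.95]]

y_pred_demo = to_labels(y_prob_demo)
cm_demo = confusion_counts(y_test_demo, y_pred_demo)
print(y_pred_demo)
print(cm_demo)
```

With the actual model, the equivalent NumPy form would be along the lines of y_pred = (classifier.predict(X_test) > 0.5).astype(int).ravel(), so that y_pred has exactly one label per row of X_test before it is passed to confusion_matrix(y_test, y_pred).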
