
How to improve a neural network through dropout layers?

I am working on a neural network that predicts heart disease. The data comes from Kaggle and has been pre-processed. I have used various models, such as logistic regression, random forests, and SVM, which all produce solid results. I'm trying to use the same data with a neural network, to see whether a NN can outperform the other ML models (the data set is rather small, which may explain the poor results). Below is my code for the network; it produces 50% accuracy, which is obviously too low to be useful. From what you can tell, does anything look off that would undermine the accuracy of the model?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.layers import Dense, Dropout
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping

df = pd.read_csv(r"C:\Users\***\Desktop\heart.csv")

X = df[['age','sex','cp','trestbps','chol','fbs','restecg','thalach']].values
y = df['target'].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

scaler.fit_transform(X_train)
scaler.transform(X_test)


nn = tf.keras.Sequential()

nn.add(Dense(30, activation='relu'))

nn.add(Dropout(0.2))

nn.add(Dense(15, activation='relu'))

nn.add(Dropout(0.2))


nn.add(Dense(1, activation='sigmoid'))


nn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=25)

nn.fit(X_train, y_train, epochs=1000, validation_data=(X_test, y_test),
       callbacks=[early_stop])

model_loss = pd.DataFrame(nn.history.history)
model_loss.plot()

predictions = nn.predict_classes(X_test)

from sklearn.metrics import classification_report,confusion_matrix

print(classification_report(y_test,predictions))
print(confusion_matrix(y_test,predictions))

After running your model with EarlyStopping, I get:

Epoch 324/1000
23/23 [==============================] - 0s 3ms/step - loss: 0.5051 - accuracy: 0.7364 - val_loss: 0.4402 - val_accuracy: 0.8182
Epoch 325/1000
23/23 [==============================] - 0s 3ms/step - loss: 0.4716 - accuracy: 0.7643 - val_loss: 0.4366 - val_accuracy: 0.7922
Epoch 00325: early stopping
WARNING:tensorflow:From <ipython-input-54-2ee8517852a8>:54: Sequential.predict_classes (from tensorflow.python.keras.engine.sequential) is deprecated and will be removed after 2021-01-01.
Instructions for updating:
Please use instead:
* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).
* `(model.predict(x) > 0.5).astype("int32")`, if your model does binary classification (e.g. if it uses a `sigmoid` last-layer activation).
              precision    recall  f1-score   support

           0       0.90      0.66      0.76       154
           1       0.73      0.93      0.82       154

    accuracy                           0.79       308
   macro avg       0.82      0.79      0.79       308
weighted avg       0.82      0.79      0.79       308

It suggests a reasonable accuracy and f1-score for such a simple MLP.


I used this dataset: https://www.kaggle.com/abdulhakimrony/heartcsv/data

  1. Train for all the epochs; the initial accuracy may be low, but the model will converge after a few epochs.

  2. Set seeds for random, tensorflow, and numpy to get reproducible results on each run (see the sketch after this list).

  3. If simple models show good accuracy, chances are a NN will outperform them, but you have to make sure the NN is not overfitting.

  4. Check whether your data is imbalanced; if it is, try using class_weight (also covered in the sketch below).

  5. You can try a tuner with cross-validation to find the best-performing model.
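
A minimal sketch of points 2 and 4, reusing the X_train/y_train and early_stop from the question (the seed value 42 and the 'balanced' weighting scheme are arbitrary choices of mine, not something your code requires):

import random
import numpy as np
import tensorflow as tf
from sklearn.utils.class_weight import compute_class_weight

# Point 2: seed all three random sources so runs are reproducible
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)

# Point 4: if the target is imbalanced, weight each class inversely to its frequency
weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
class_weights = dict(enumerate(weights))  # e.g. {0: w0, 1: w1}

nn.fit(X_train, y_train, epochs=1000,
       validation_data=(X_test, y_test),
       callbacks=[early_stop],
       class_weight=class_weights)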

The scaler does not work in place; `fit_transform` and `transform` return the scaled arrays, which your code discards. Assign the results:

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

You'll then get results more in line with what you were expecting.

              precision    recall  f1-score   support

           0       0.93      0.98      0.95       144
           1       0.98      0.93      0.96       164

    accuracy                           0.95       308
   macro avg       0.95      0.96      0.95       308
weighted avg       0.96      0.95      0.95       308
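
As a side note, the deprecation warning in the log above already names the replacement for `predict_classes`; for a sigmoid output it is a one-liner (reusing the nn and X_test from the question):

# Replaces the deprecated nn.predict_classes(X_test) for binary classification
# with a sigmoid output, as the warning suggests
predictions = (nn.predict(X_test) > 0.5).astype("int32")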
