
Choose layers on Keras neural network

I am trying to build a neural network that can detect fraudulent transactions. I am using a dataset from Kaggle. I am a beginner with neural networks and am trying to work out how best to define the model. Currently the model does not detect any fraud at all: every prediction is very close to 0. My code is included at the end. My questions are:

  1. How should I choose the layers to optimize performance?

  2. How should I compile the model and choose parameters such as "epoch" for optimal performance?

     import tensorflow as tf
     from tensorflow.keras import Sequential  # Sequential was not imported in the original snippet
     from tensorflow.keras.layers import Dense, BatchNormalization, Dropout, Conv1D, Activation, Flatten

     # Three identical hidden blocks followed by a single sigmoid output unit.
     model = Sequential([
         Dense(256, activation='relu', input_shape=(X_train.shape[1],)),
         BatchNormalization(),
         Dropout(0.3),
         Dense(256, activation='relu'),
         BatchNormalization(),
         Dropout(0.3),
         Dense(256, activation='relu'),
         BatchNormalization(),
         Dropout(0.3),
         Dense(1, activation='sigmoid'),
     ])

I have implemented a version that reaches nearly 100% accuracy on the same dataset without overfitting. Please compare it with yours and see where the changes are, especially in the model creation.

Kaggle Link: https://www.kaggle.com/gautamchettiar/credit-card-fraud

import pandas as pd

data = pd.read_csv("../input/creditcardfraud/creditcard.csv")
input_features = data.loc[:, data.columns != 'Class']
labels = data['Class']

Then I check the class split, which is very uneven in this particular case.

from collections import Counter
Counter(data['Class'])
Counter({0: 284315, 1: 492})
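
With a split this uneven, accuracy alone can look excellent even for a model that never flags a fraud. A minimal sketch of that baseline, using only the counts printed above (the variable names are illustrative and not from the original post):

```python
# Class counts copied from the Counter output above.
class_counts = {0: 284315, 1: 492}

total = sum(class_counts.values())
fraud_rate = class_counts[1] / total

# Accuracy of a trivial classifier that labels every transaction as non-fraud.
baseline_accuracy = class_counts[0] / total

print(f"Total transactions: {total}")
print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Always-predict-0 accuracy: {baseline_accuracy:.4%}")
```

Any accuracy figure for this dataset is best read against that baseline.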

Now that the data is ready, it is time to create a train/test split for verification later on.

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(input_features, labels, test_size=0.2)
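
The split above is purely random, so with only 492 fraud rows the fraud share of the test set can drift from run to run. scikit-learn's `train_test_split` accepts a `stratify` argument (for example `stratify=labels`) that keeps the class ratio the same in both parts. The idea behind it can be sketched in plain Python; `stratified_split` and the toy labels below are hypothetical helpers for illustration, not part of the original code:

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly the same class ratio in both parts."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within each class, then cut off the test share of that class.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels (made-up data): 1000 normal rows and 10 fraud rows.
toy_labels = [0] * 1000 + [1] * 10
train_idx, test_idx = stratified_split(toy_labels, test_size=0.2)

train_counts = Counter(toy_labels[i] for i in train_idx)
test_counts = Counter(toy_labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```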

A few imports before creating the model.

from tensorflow.keras import Sequential, layers
import tensorflow as tf

I have not changed much in the code. However, I suspect it is the Batch Normalization that is causing issues at your end (just my opinion, I may be wrong). Another thing you might want to check is how you compiled your model.

model = Sequential(
    [
        layers.Dense(100, activation="relu", input_shape=(x_train.shape[-1],)),
        layers.Dropout(0.1),
        layers.Dense(100, activation="relu"),
        layers.Dropout(0.1),
        layers.Dense(50, activation="relu"),
        layers.Dropout(0.1),
        layers.Dense(50, activation="relu"),
        layers.Dropout(0.1),
        layers.Dense(1, activation="sigmoid"),
    ]
)
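
Since the question describes predictions that all sit close to 0, one optional adjustment to the output layer may be worth trying. A common heuristic for imbalanced binary problems (not something the model above does) is to start the final sigmoid layer's bias at log(pos/neg), so that the untrained network already predicts the base fraud rate. The arithmetic, using the class counts from the Counter output:

```python
import math

# Class counts from the Counter output: non-fraud (neg) and fraud (pos).
neg, pos = 284315, 492

# Bias value at which a sigmoid output equals the fraud base rate.
initial_bias = math.log(pos / neg)

# The prediction a sigmoid unit would give with this bias and zero input.
initial_prediction = 1.0 / (1.0 + math.exp(-initial_bias))

print(f"Initial bias: {initial_bias:.4f}")
print(f"Initial predicted fraud probability: {initial_prediction:.6f}")
```

In Keras the value could be passed as `bias_initializer=tf.keras.initializers.Constant(initial_bias)` on the last `Dense` layer. Whether it helps on this dataset is untested here.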

Then I chose Adam, because it is a pretty good default, I guess.

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy(),
                       tf.keras.metrics.FalseNegatives()])

Model training begins here.

model.fit(x_train, y_train)
7121/7121 [==============================] - 24s 3ms/step - loss: 0.9938 - binary_accuracy: 0.9970 - false_negatives_9: 398.0000
<keras.callbacks.History at 0x7ff131da4090>
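
One option this call does not use is class weighting. Keras's `model.fit` accepts a `class_weight` dictionary that maps each class index to a loss weight, so that errors on the rare class count for more. A minimal sketch of the common "balanced" heuristic, total / (n_classes * class_count), is below. It uses the full-dataset counts rather than the counts of the training split, which is an approximation:

```python
# Class counts from the Counter output (full dataset, used as an approximation
# for the training split).
class_counts = {0: 284315, 1: 492}

total = sum(class_counts.values())
n_classes = len(class_counts)

# "Balanced" heuristic: rarer classes receive proportionally larger weights.
class_weight = {
    label: total / (n_classes * count)
    for label, count in class_counts.items()
}

for label, weight in class_weight.items():
    print(f"class {label}: weight {weight:.4f}")

# Hypothetical usage, not run here:
# model.fit(x_train, y_train, class_weight=class_weight)
```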

Then I evaluated it on the test set, to see whether the score was due to overfitting or not.

scores = model.evaluate(x_test, y_test)

print(f"Accuracy on test set: {scores[1]}")
print(f"False Negatives on test set: {scores[2]}")

And for this, the final output is as shown below.

Accuracy on test set: 0.9983673095703125
False Negatives on test set: 93.0
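
As a sanity check on these numbers, the accuracy can be converted back into a count of misclassified test rows and compared with the false-negative count. The sketch assumes the test set holds 20% of the 284,807 rows, rounded up. That size is inferred from `test_size=0.2` and is not stated in the post.

```python
import math

# Dataset size from the Counter output.
total_rows = 284315 + 492

# Assumption: test set size is 20% of the rows, rounded up.
n_test = math.ceil(0.2 * total_rows)

# Values reported by model.evaluate above.
test_accuracy = 0.9983673095703125
false_negatives = 93

# Number of wrong predictions implied by the accuracy.
misclassified = round((1 - test_accuracy) * n_test)

# Errors that are not false negatives must be false positives.
false_positives = misclassified - false_negatives

print(f"Test rows (assumed): {n_test}")
print(f"Misclassified rows: {misclassified}")
print(f"Implied false positives: {false_positives}")
```

If the implied error count is close to the false-negative count, then almost every mistake is a missed fraud. In that case the false-negative count (or recall) says more about the model than accuracy does.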

Hope this helps!
