
Training an LSTM model

I'm trying to train my LSTM model, but I get 0 for accuracy, precision, recall and F1 score. I downloaded the heart disease dataset from Kaggle. Here's the code:

import tensorflow as tf
import pandas as pd
import numpy as np
from tensorflow.contrib import rnn
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, accuracy_score, recall_score, precision_score

heartt = pd.read_csv('heart.csv')

cols_to_norm = ['sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']

heartt[cols_to_norm] = heartt[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))

sex_people = tf.feature_column.numeric_column('sex')
c_p = tf.feature_column.numeric_column('cp')
trest_bps = tf.feature_column.numeric_column('trestbps')
cholestrol = tf.feature_column.numeric_column('chol')
fb_s = tf.feature_column.numeric_column('fbs')
rest_ecg = tf.feature_column.numeric_column('restecg')
thala_ch = tf.feature_column.numeric_column('thalach')
ex_ang = tf.feature_column.numeric_column('exang')
old_peak = tf.feature_column.numeric_column('oldpeak')
slo_pe = tf.feature_column.numeric_column('slope')
c_a = tf.feature_column.numeric_column('ca')
tha_l = tf.feature_column.numeric_column('thal')
ag_e = tf.feature_column.numeric_column('age')

age_buckets = tf.feature_column.bucketized_column(ag_e, boundaries=[20,30,40,50,60,70,80])

feat_cols = [sex_people, c_p, trest_bps, cholestrol, fb_s, rest_ecg, thala_ch, ex_ang, old_peak, slo_pe, c_a, tha_l, age_buckets]

x_data = heartt.drop('target',axis=1)

x_data.info()

labels = heartt['target']

X_train,X_test,y_train,y_test = train_test_split(x_data, labels, test_size=0.2, shuffle=False, random_state=42)

epochs = 8
n_classes = 1
n_units = 200
n_features = 13
batch_size = 35

xplaceholder = tf.placeholder('float', [None, n_features])
yplaceholder = tf.placeholder('float')

def recurrent_neural_network_model():
    layer = {'weights': tf.Variable(tf.random_normal([n_units, n_classes])), 'bias': tf.Variable(tf.random_normal([n_classes]))}

    x = tf.split(xplaceholder, n_features, 1)
    print(x)

    lstm_cell = rnn.BasicLSTMCell(n_units)

    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    output = tf.matmul(outputs[-1], layer['weights']) + layer['bias']

    return output

def train_neural_network():
    logit = recurrent_neural_network_model()
    logit = tf.reshape(logit, [-1])

    cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logit, labels=yplaceholder))
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    with tf.Session() as sess:

        tf.global_variables_initializer().run()
        tf.local_variables_initializer().run()

        for epoch in range(epochs):
            epoch_loss = 0

            i = 0
            for i in range(int(len(X_train) / batch_size)):

                start = i
                end = i + batch_size

                batch_x = np.array(X_train[start:end])
                batch_y = np.array(y_train[start:end])

                _, c = sess.run([optimizer, cost], feed_dict={xplaceholder: batch_x, yplaceholder: batch_y})
                epoch_loss += c
                i += batch_size

            print('Epoch', epoch, 'completed out of', epochs, 'loss:', epoch_loss)

        pred = tf.round(tf.nn.sigmoid(logit)).eval({xplaceholder: np.array(X_test), yplaceholder: np.array(y_test)})
        f1 = f1_score(np.array(y_test), pred, average='macro')
        accuracy=accuracy_score(np.array(y_test), pred)
        recall = recall_score(y_true=np.array(y_test), y_pred= pred)
        precision = precision_score(y_true=np.array(y_test), y_pred=pred)
        print("F1 Score:", f1)
        print("Accuracy Score:",accuracy)
        print("Recall:", recall)
        print("Precision:", precision)


train_neural_network()

This is the output that I get:

[<tf.Tensor 'split:0' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:1' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:2' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:3' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:4' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:5' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:6' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:7' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:8' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:9' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:10' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:11' shape=(?, 1) dtype=float32>, <tf.Tensor 'split:12' shape=(?, 1) dtype=float32>]
WARNING:tensorflow:From <ipython-input-15-5bc4f8465e4c>:8: BasicLSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is deprecated, please use tf.nn.rnn_cell.LSTMCell, which supports all the feature this cell currently has. Please replace the existing code with tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell').
Epoch 0 completed out of 8 loss: 1.00952459083328
Epoch 1 completed out of 8 loss: 3.3137323707244093e-06
Epoch 2 completed out of 8 loss: 1.6476146610239217e-09
Epoch 3 completed out of 8 loss: 2.08133817797794e-11
Epoch 4 completed out of 8 loss: 1.8306998712108724e-12
Epoch 5 completed out of 8 loss: 4.752560310897734e-13
Epoch 6 completed out of 8 loss: 2.238625324474169e-13
Epoch 7 completed out of 8 loss: 1.4679879558579696e-13
F1 Score: 0.0
Accuracy Score: 0.0
Recall: 0.0
Precision: 0.0
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\classification.py:1137: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true samples.
  'recall', 'true', average, warn_for)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\classification.py:1137: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 due to no true samples.
  'recall', 'true', average, warn_for)

I'm confused about where I might be going wrong here. Why am I not getting proper accuracy, precision, F1 score and recall?

Taking a look at the dataset, it doesn't look like a problem for LSTM models. LSTMs (and RNNs in general) are meant for sequential data - they are the neural-network equivalent of time series regression. I know there are cases (such as sentiment analysis in NLP) in which you can apply LSTMs to classification problems, but that doesn't seem to be the case here. These data are "atemporal", i.e. each row of the dataset represents a patient, and the order of the rows doesn't carry any information.
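
For illustration only, here is a minimal shape comparison (all shapes made up) between the sequential input an LSTM expects and the tabular rows in heart.csv:

import numpy as np

# What an LSTM expects: a batch of ordered sequences, shape (batch, time_steps, features).
# The time axis is what the LSTM's memory operates over.
sequential_batch = np.random.rand(35, 10, 13)   # hypothetical: 10 time steps per sample

# What heart.csv provides: independent patient rows, shape (samples, features).
# There is no time axis; tf.split in the question turns the 13 columns into 13 "time steps",
# so the LSTM just walks over an arbitrary ordering of features.
tabular_batch = np.random.rand(35, 13)

print(sequential_batch.shape)  # (35, 10, 13)
print(tabular_batch.shape)     # (35, 13)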

LSTMs are used when you need a model with "memory" of previous states of the data, such as a time series. If you want to apply LSTMs, I suggest you change the dataset (you can take a look at this huge list of ML datasets). Otherwise, if that is the dataset you're interested in, switch to a feed-forward neural network for classification. For this, you can check my personal TensorFlow tutorial on how to do it.
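
If you do switch to a feed-forward network, here is a minimal sketch of what that could look like in the same TF 1.x style as the question; the layer sizes, the full-batch training loop and the variable names are my own assumptions, not the answerer's tutorial code:

import tensorflow as tf
import numpy as np

ff_x = tf.placeholder(tf.float32, [None, n_features])  # reuses n_features = 13 from the question
ff_y = tf.placeholder(tf.float32, [None])

# Two dense hidden layers and a single output logit for binary classification.
hidden1 = tf.layers.dense(ff_x, 64, activation=tf.nn.relu)
hidden2 = tf.layers.dense(hidden1, 32, activation=tf.nn.relu)
ff_logits = tf.reshape(tf.layers.dense(hidden2, 1), [-1])

ff_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=ff_logits, labels=ff_y))
ff_train_op = tf.train.AdamOptimizer().minimize(ff_loss)
ff_pred = tf.round(tf.nn.sigmoid(ff_logits))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(epochs):  # epochs = 8 as in the question
        _, c = sess.run([ff_train_op, ff_loss],
                        feed_dict={ff_x: np.array(X_train), ff_y: np.array(y_train)})
        print('Epoch', epoch, 'loss:', c)
    predictions = sess.run(ff_pred, feed_dict={ff_x: np.array(X_test)})

The resulting predictions array can then be scored with the same sklearn metric calls used in the question.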
