Getting constant Prediction values using LSTM Keras syntax

Question

I am trying to predict a user's growth rate using an LSTM and the Adam optimizer, but the predictions I get from my code are far from the actual values. I am new to ML and still learning how things are measured. For example, what do the units in an LSTM layer actually do? I am reading values from a CSV and trying to find a user's growth rate based on the amount he collected over 2 years, but the predictions are inaccurate. Can anyone tell me how to get correct predictions so that I can compute the user's growth rate?

Here is my code:

import pymysql
import pandas as pd
import numpy as np
import csv
from datetime import datetime
import time
import json
import matplotlib.pyplot as plt
import seaborn as sns
import pprint
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.utils.np_utils import to_categorical
from keras.layers import Input
import os
os.environ['KERAS_BACKEND']='tensorflow'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
from keras.layers.recurrent import LSTM
from matplotlib import style
from keras.layers import Activation, Dense, Dropout


df = pd.read_csv("trakop.csv")
print("="*50)
print("First Five Rows ","\n")
print(df.head(2),"\n")

dataset = df
dataset["Month"] = pd.to_datetime(df["timestamp"]).dt.month
dataset["Year"] = pd.to_datetime(df["timestamp"]).dt.year
dataset["Date"] = pd.to_datetime(df["timestamp"]).dt.date
dataset["Time"] = pd.to_datetime(df["timestamp"]).dt.time
dataset["Week"] = pd.to_datetime(df["timestamp"]).dt.week
dataset["Day"] = pd.to_datetime(df["timestamp"]).dt.day_name()
dataset["Hour"] = pd.to_datetime(df["timestamp"]).dt.hour
dataset = df.set_index("timestamp")
dataset.index = pd.to_datetime(dataset.index)
dataset.head(1)

print(df.Year.unique(),"\n")
print("Total Number of Unique Year", df.Year.nunique(), "\n")


NewDataSet = dataset.resample('D').mean()
# print(NewDataSet)

print("Old Dataset ",dataset.shape )
print("New  Dataset ",NewDataSet.shape )
excludedValue = 5
TestData = NewDataSet.tail(10)
Training_Set = NewDataSet.iloc[:,0:1]
Training_Set = Training_Set[:-excludedValue]
print("Training Set Shape ", Training_Set.shape)
print("Test Set Shape ", TestData.shape)

Training_Set = Training_Set.values
sc = MinMaxScaler(feature_range=(0, 1))
Train = sc.fit_transform(Training_Set)

X_Train = []
Y_Train = []

# Build sliding windows: start at index excludedValue and run to the end
for i in range(excludedValue, Train.shape[0]):

    # X_Train: the previous excludedValue (5) values
    X_Train.append(Train[i- excludedValue:i])

    # Y_Train: the next value, predicted from the previous excludedValue values
    Y_Train.append(Train[i])

# Convert into Numpy Array
X_Train = np.array(X_Train)
Y_Train = np.array(Y_Train)

print(X_Train.shape)
print(Y_Train.shape)

X_Train = np.reshape(X_Train, newshape=(X_Train.shape[0], X_Train.shape[1], 1))
X_Train.shape

regressor = Sequential()

# Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 1, return_sequences = True, input_shape = (X_Train.shape[1], 1)))
regressor.add(Dropout(0.4))

# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units=1, return_sequences = True))
regressor.add(Dropout(0.4))

# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units=1, return_sequences = True))
regressor.add(Dropout(0.4))

# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units = 1))
regressor.add(Dropout(0.4))

# Adding the output layer
regressor.add(Dense(units = 1))

# Compiling the RNN
regressor.compile(optimizer = 'rmsprop', loss = 'mean_squared_error', metrics=['acc'])
regressor.fit(X_Train, Y_Train, epochs = 30, batch_size = 12,verbose=2)

Df_Total = pd.concat((NewDataSet[["amount"]], TestData[["amount"]]), axis=0)
Df_Total.shape

inputs = Df_Total[len(Df_Total) - len(TestData) - excludedValue:].values

# We need to Reshape
inputs = inputs.reshape(-1,1)

# Normalize the Dataset
inputs = sc.transform(inputs)

X_test = []
for i in range(excludedValue, inputs.shape[0]):
    X_test.append(inputs[i- excludedValue:i])

# Convert into Numpy Array
X_test = np.array(X_test)

# Reshape before Passing to Network
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

# Pass to Model
predicted_raise = regressor.predict(X_test)

# Do inverse Transformation to get Values
predicted_raise = sc.inverse_transform(predicted_raise)

Predicted_Amount  = predicted_raise
dates = TestData.index.to_list()

True_Amount = TestData["amount"].to_list()
Predicted_Amount  = predicted_raise
dates = TestData.index.to_list()
growth_rate= (True_Amount-Predicted_Amount)/True_Amount*100
Machine_Df = pd.DataFrame(data={
    "Date":dates,
    "TrueAmount": True_Amount,
    "PredictedAmount":[x[0] for x in Predicted_Amount ],
    "Growthrate": [x[0] for x in growth_rate]
})
print(Machine_Df)
fig = plt.figure()

ax1= fig.add_subplot(111)

x = dates
y = True_Amount

y1 = Predicted_Amount

plt.plot(x,y, color="green")
plt.plot(x,y1, color="red")
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.xlabel('Dates')
plt.ylabel("Amount")
plt.title("Machine Learned the Pattern Predicting Future Values ")
plt.legend()

Here is what I am getting in my output:

('First Five Rows ', '\n')
(             timestamp  amount
0  2019-09-08 06:30:23    38.0
1  2019-09-08 06:36:48    19.0, '\n')
(array([2019, 2020]), '\n')
('Total Number of Unique Year', 2, '\n')
('Old Dataset ', (12492, 8))
('New  Dataset ', (129, 5))
('Training Set Shape ', (124, 1))
('Test Set Shape ', (10, 5))
(119, 5, 1)
(119, 1)
Epoch 1/30
 - 15s - loss: 0.0177 - acc: 0.0084
Epoch 2/30
 - 1s - loss: 0.0165 - acc: 0.0084
Epoch 3/30
 - 1s - loss: 0.0153 - acc: 0.0084
Epoch 4/30
 - 1s - loss: 0.0167 - acc: 0.0084
Epoch 5/30
 - 1s - loss: 0.0157 - acc: 0.0084
Epoch 6/30
 - 1s - loss: 0.0158 - acc: 0.0084
Epoch 7/30
 - 1s - loss: 0.0151 - acc: 0.0084
Epoch 8/30
 - 1s - loss: 0.0153 - acc: 0.0084
Epoch 9/30
 - 1s - loss: 0.0150 - acc: 0.0084
Epoch 10/30
 - 1s - loss: 0.0160 - acc: 0.0084
Epoch 11/30
 - 1s - loss: 0.0158 - acc: 0.0084
Epoch 12/30
 - 1s - loss: 0.0155 - acc: 0.0084
Epoch 13/30
 - 1s - loss: 0.0157 - acc: 0.0084
Epoch 14/30
 - 1s - loss: 0.0155 - acc: 0.0084
Epoch 15/30
 - 1s - loss: 0.0152 - acc: 0.0084
Epoch 16/30
 - 1s - loss: 0.0153 - acc: 0.0084
Epoch 17/30
 - 1s - loss: 0.0150 - acc: 0.0084
Epoch 18/30
 - 1s - loss: 0.0151 - acc: 0.0084
Epoch 19/30
 - 1s - loss: 0.0150 - acc: 0.0084
Epoch 20/30
 - 1s - loss: 0.0151 - acc: 0.0084
Epoch 21/30
 - 1s - loss: 0.0153 - acc: 0.0084
Epoch 22/30
 - 1s - loss: 0.0151 - acc: 0.0084
Epoch 23/30
 - 1s - loss: 0.0150 - acc: 0.0084
Epoch 24/30
 - 1s - loss: 0.0153 - acc: 0.0084
Epoch 25/30
 - 1s - loss: 0.0152 - acc: 0.0084
Epoch 26/30
 - 1s - loss: 0.0151 - acc: 0.0084
Epoch 27/30
 - 1s - loss: 0.0152 - acc: 0.0084
Epoch 28/30
 - 1s - loss: 0.0151 - acc: 0.0084
Epoch 29/30
 - 1s - loss: 0.0151 - acc: 0.0084
Epoch 30/30
 - 1s - loss: 0.0151 - acc: 0.0084
        Date  Growthrate  PredictedAmount  TrueAmount
0 2020-01-05    1.695584       122.266731  124.375625
1 2020-01-06    1.691683       122.271584   98.166667
2 2020-01-07    1.682077       122.283531  120.892473
3 2020-01-08    1.690008       122.273666   84.863636
4 2020-01-09    1.694407       122.268196   94.673077
5 2020-01-10    1.706436       122.253235   99.140341
6 2020-01-11    1.700952       122.260056  124.580882
7 2020-01-12    1.701755       122.259056   56.390071
8 2020-01-13    1.696290       122.265854   78.746951
9 2020-01-14    1.698001       122.263725   49.423529

[100 rows x 3 columns]

Screenshot of graph: [image not available]

The CSV I am using: https://drive.google.com/file/d/1nKHNqh7fJJJVvb2Qy-DxAO7c7HwNpEI0/view?usp=sharing

Any help would be greatly appreciated!!!

Answer 1

I have worked on your code. First, please reduce the batch size, because the dataset is small, and change the optimizer from "adam" to "rmsprop". Adam uses a constant learning rate, which is why you are getting constant values in the predictions. I have also increased the dropout to 0.4. To calculate the growth rate, I used the formula growth rate = (True Amount - Predicted Amount) / True Amount * 100, which gives the percentage difference between the true amount and the predicted amount. For the full code, please follow the GitHub link:

https://github.com/rohitnarain24/Optimizing-LSTM-model/blob/master/optimized%20lstm.txt
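The growth-rate formula above can be sketched in NumPy as follows. This is only an illustration, not the code from the linked repository: the helper name `growth_rate_percent` is hypothetical, and the sample numbers are the first three rows of the output table in the question. One thing to watch is that `model.predict` returns an array of shape (n, 1), while the true amounts are a flat list of length n; subtracting the two directly broadcasts to an (n, n) matrix rather than n element-wise differences, so the sketch flattens both inputs first.

```python
import numpy as np


def growth_rate_percent(true_amount, predicted_amount):
    """Element-wise (true - predicted) / true * 100 (hypothetical helper)."""
    # Flatten both inputs so an (n, 1) prediction array lines up with a
    # length-n list of true values instead of broadcasting to (n, n).
    true_arr = np.asarray(true_amount, dtype=float).ravel()
    pred_arr = np.asarray(predicted_amount, dtype=float).ravel()
    if true_arr.shape != pred_arr.shape:
        raise ValueError("true and predicted amounts must have the same length")
    return (true_arr - pred_arr) / true_arr * 100.0


# First three rows of the output table in the question.
true_amount = [124.375625, 98.166667, 120.892473]
# Shape (3, 1), the same layout that model.predict returns.
predicted = np.array([[122.266731], [122.271584], [122.283531]])

# Without flattening, (3,) minus (3, 1) broadcasts to a 3x3 matrix.
naive = np.asarray(true_amount) - predicted
print(naive.shape)  # (3, 3)

rates = growth_rate_percent(true_amount, predicted)
print(rates.shape)  # (3,)
print(rates)
```

With the inputs flattened, each row is compared against its own true amount, so the resulting column can be passed to the DataFrame directly, without the `[x[0] for x in growth_rate]` step.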
