![](/img/trans.png)
[英]How can I implement the input of multiple regression in LSTM using keras?
[英]How can I create Multiple Input One Output LSTM Model with Keras?
我将使用 LSTM 创建一个多输入单输出模型。实际上我创建了它。但是我的模型有问题。它工作得非常糟糕。我按月使用酒店的每日房价数据集。我试图创建一个模型,根据这些信息估计总价格。但我创建的模型效果不佳。下面我分享了模型的代码和数据集的链接。
data = pd.read_csv("/content/drive/My Drive/hotels.csv")
data
new_data = data.loc[:,['date','days','price','total']]
new_data.info()
date = new_data.date.values
dates = []
for i in date:
dates.append(i.split('/')[0])
new_data['date'] = dates
new_data
new_data = new_data.astype('float32')
new_data.info()
进口泡菜
filehandler = open(b"Hotels.obj","wb")
pickle.dump(new_data,filehandler)
file = open("/content/Hotels.obj",'rb')
object_file = pickle.load(file)
object_file
from math import sqrt
from matplotlib import pyplot
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Concatenate
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Add
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop,Adam
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
import datetime
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from packaging import version
print("TensorFlow version: ", tf.__version__)
assert version.parse(tf.__version__).release[0] >= 2, \
"This notebook requires TensorFlow 2.0 or above."
file = open('/content/Hotels.obj', 'rb')
scaler = MinMaxScaler(feature_range=(0, 1))
train_size = int(len(object_file) * 0.76)
test_size = len(object_file) - train_size
days = object_file["days"].values.reshape(-1,1)
price = object_file["price"].values.reshape(-1,1)
total = object_file["total"].values.reshape(-1,1)
date = object_file["date"].values.reshape(-1,1)
days_ = scaler.fit_transform(days)
total_ = scaler.fit_transform(total)
price_ = scaler.fit_transform(price)
date_ = scaler.fit_transform(date)
days_train = days_[0:train_size].reshape(train_size,1,1)
days_test = days_[train_size:len(days_)].reshape(test_size,1,1)
date_train = date_[0:train_size].reshape(train_size,1,1)
date_test = date_[train_size:len(days_)].reshape(test_size,1,1)
price_train = price_[0:train_size].reshape(train_size,1,1)
price_test = price_[train_size:len(price_)].reshape(test_size,1,1)
total_train = total_[0:train_size].reshape(train_size,1)
total_test = total_[train_size:len(total_)].reshape(test_size,1)
def buildModel(dataLength,labelLength):
date = tf.keras.Input(shape=(1,1),name='date')
days = tf.keras.Input(shape=(1,1),name='days')
price = tf.keras.Input(shape=(1,1),name='price')
dateLayers = LSTM(100,return_sequences=False)(date)
daysLayers = LSTM(100,return_sequences=False)(days)
priceLayers = LSTM(100,return_sequences=False)(price)
output = tf.keras.layers.concatenate(inputs=[dateLayers,daysLayers, priceLayers],axis=1)
output = Dense(labelLength,activation='relu',name='weightedAverage_output_3')(output)
model = Model(inputs=[date,days,price],outputs=[output])
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999)
model.compile(optimizer=optimizer,loss='mse',metrics=['accuracy'])
return model
object_file = pickle.load(file)
logdir="logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
rnn = buildModel(train_size,1)
rnn.fit([date_train,days_train,price_train],
[total_train],
validation_data = ([date_test,days_test,price_test],[total_test]),
epochs = 1,
batch_size = 10,
callbacks=[tensorboard_callback]
)
result = rnn.predict([date_test,days_test,price_test])
scaler.inverse_transform(result)
当我增加epoch数时,模型过度拟合。我无法得到我想要的结果。我该怎么做?
数据集链接: https : //www.kaggle.com/leomauro/argodatathon2019#hotels.csv
你的结果很差,因为你的指标是accuracy
。 如果我理解正确,那么您是在预测一个连续变量——您不是在分类。 所以,看准确性是没有意义的。
指标应该是mae
的平均绝对误差。 我想你会对你的模型表现感到满意。
在这里重新调整目标没有任何意义。 神经网络的内部工作原理更喜欢 0 到 1 之间的输入。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.