Significantly higher outcomes than expected in a stock-price-predicting LSTM model
I have built an LSTM model to estimate the next day's stock price, using TensorFlow and Keras.
However, I do not understand why my model's predicted price is almost always a factor of 2 or 3 higher than the current stock price. Does anybody know what I am doing wrong?
The code is shown below:
# Imports inferred from the names used below (they were not shown in the original post)
import math

import numpy as np
import pandas_datareader as web
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential


def StockPredictor(stock, startdate, enddate, pricetype):
    # Get the stock quote
    df = web.DataReader(stock, data_source='yahoo', start=startdate, end=enddate)
    # df = pd.read_csv('StockData/TATA.csv')
    # Create a new dataframe with only the price type chosen
    data = df.filter([pricetype])
    dataset = data.values  # convert dataset into a numpy array
    # Use 80% of the dataset to train the LSTM model (rounded up with math.ceil)
    training_data_len = math.ceil(len(dataset) * 0.80)
    # Scale the data (normalizing the input data helps the model)
    scaler = MinMaxScaler(feature_range=(0, 1))  # scaled_data: all values between 0 and 1
    # Computes min and max values for scaling and transforms the data based on them
    scaled_data = scaler.fit_transform(dataset)

    # Create the scaled training data set
    train_data = scaled_data[0:training_data_len, :]
    # Split the data into x_train and y_train
    x_train, y_train = [], []  # x_train: independent training features, y_train: dependent
    for i in range(60, len(train_data)):
        x_train.append(train_data[i-60:i, 0])  # values of the 60 previous periods
        y_train.append(train_data[i, 0])  # the 61st value, which the model should predict
    # Convert x_train and y_train to numpy arrays
    x_train, y_train = np.array(x_train), np.array(y_train)
    # Reshape the data: the LSTM expects 3D input (samples, timesteps, features);
    # x_train is 2D at this point, with shape[0] rows and shape[1] columns
    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

    # Build the LSTM model
    model = Sequential()
    model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
    model.add(LSTM(50, return_sequences=False))
    model.add(Dense(25))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')  # loss function and optimizer
    # Train the model; an epoch is one full forward and backward pass over the dataset
    model.fit(x_train, y_train, batch_size=1, epochs=1)

    # Create the testing data set from the scaled values, starting 60 rows
    # before the end of the training range
    test_data = scaled_data[training_data_len - 60:, :]
    # Create the datasets x_test and y_test
    x_test = []
    y_test = dataset[training_data_len:, :]
    for i in range(60, len(test_data)):
        x_test.append(test_data[i-60:i, 0])
    # Convert the data into a numpy array
    x_test = np.array(x_test)
    # Reshape the data (same explanation as for x_train above)
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

    # Get the model's predicted price values;
    # predictions based on x_test should match the values in y_test
    predictions = model.predict(x_test)
    predictions = scaler.inverse_transform(predictions)  # unscale the values
    # Get the RMSE (to test the model): square the errors first, then average, then take the root
    rmse = np.sqrt(np.mean((predictions - y_test) ** 2))

    # Plot the data
    train = data[:training_data_len]
    valid = data[training_data_len:].copy()  # copy so the new column is not set on a slice
    valid['Predictions'] = predictions
    print('The RMSE for the training model =', rmse)

    new_df = df.filter([pricetype])
    # Get the last 60 days
    last_60_days = new_df[-60:].values
    last_60_days_scaled = scaler.transform(last_60_days)
    # Create an empty list
    X_test = []
    # Append the past 60 days to the list
    X_test.append(last_60_days)
    # Convert X_test to a numpy array
    X_test = np.array(X_test)
    # Reshape to 3D
    X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
    # Get the predicted scaled price
    pred_price = model.predict(X_test)
    # Undo the scaling
    pred_price = scaler.inverse_transform(pred_price)
    print('Predicted price for the next day is :', pred_price)
    return pred_price

allprices = []
for i in range(10):
    pred_price = StockPredictor()
    allprices.append(pred_price)
average_pred_price = sum(allprices) / len(allprices)
You are using a min-max scaler between 0 and 1, where the fitted minimum and maximum are historical. Your LSTM model can predict a new high, and when you inverse_transform that prediction, the result will likely be higher than the maximum that was fitted to the scaler.
Therefore the scaler is the likely culprit behind your predictions being a factor of 2 higher. Using a standard scaler might help, or not scaling at all.
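The arithmetic behind this can be sketched in plain Python. The MinMax class below is a hand-rolled stand-in for a 0-1 min-max scaler (an assumption for illustration, not scikit-learn's implementation), and the prices are made up. It shows that a scaled output above 1 maps back to a price above the fitted historical high. The hypothetical looks_scaled helper also checks that a window fed to the model is on the same 0-1 scale as the training data. That is worth verifying in the question's code, where last_60_days_scaled is computed but the final window is built from last_60_days.

```python
class MinMax:
    """Hand-rolled stand-in for a 0-1 min-max scaler (illustrative, not sklearn)."""

    def fit(self, values):
        # Remember the historical low and high seen at fit time.
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = self.hi - self.lo
        return [(v - self.lo) / span for v in values]

    def inverse_transform(self, scaled):
        span = self.hi - self.lo
        return [s * span + self.lo for s in scaled]


def looks_scaled(window, tolerance=0.5):
    """Hypothetical check: True if every value lies roughly inside [0, 1]."""
    return all(-tolerance <= v <= 1 + tolerance for v in window)


history = [50.0, 60.0, 80.0, 100.0]  # made-up prices; the fitted range is 50..100
scaler = MinMax().fit(history)

# A scaled output of 1.0 maps back to the historical high;
# any scaled output above 1.0 maps back to a price above that high.
at_high, above_high = scaler.inverse_transform([1.0, 1.2])
print("scaled 1.0 ->", at_high, "| scaled 1.2 ->", above_high)

raw_window = history[-3:]                     # raw prices
scaled_window = scaler.transform(raw_window)  # what the model was trained on
print("scaled window passes check:", looks_scaled(scaled_window))
print("raw window passes check:", looks_scaled(raw_window))
```

If the check fails for the window passed to model.predict, the network is being given raw prices although it was trained on values between 0 and 1, which on its own could inflate the output after inverse_transform.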
An LSTM that predicts stock prices purely from past price data will not work.
What is likely to happen is that your LSTM model will predict prices with a T+1 lag, that is, it will reproduce the actual price one day late.
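One way to test for this lag is to compare the model against a naive persistence forecast (tomorrow's price equals today's) and to look for the shift at which the predictions line up best with the actual series. The sketch below uses synthetic random-walk prices and hypothetical helper names (rmse, persistence_forecast, best_lag). It is an illustration under those assumptions, not part of the original code.

```python
import math
import random


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def persistence_forecast(prices):
    """Naive baseline: the forecast for each day is the previous day's price."""
    return prices[:-1]


def best_lag(predicted, actual, max_lag=5):
    """Return the shift k (0..max_lag) at which predicted[t] best matches actual[t-k]."""
    errors = {}
    for k in range(max_lag + 1):
        # Compare predicted[t] with actual[t-k] over the overlapping range.
        errors[k] = rmse(predicted[k:], actual[:len(actual) - k])
    return min(errors, key=errors.get)


# Synthetic random-walk prices (illustrative data, not real quotes).
random.seed(0)
prices = [100.0]
for _ in range(299):
    prices.append(prices[-1] + random.gauss(0, 1))

actual = prices[1:]                   # the values a next-day model should predict
naive = persistence_forecast(prices)  # yesterday's price used as the forecast

baseline_rmse = rmse(naive, actual)
lag = best_lag(naive, actual)
print("persistence RMSE:", round(baseline_rmse, 3))
print("best-matching lag of the naive forecast:", lag)
```

Running best_lag on the LSTM's predictions in place of naive gives a quick diagnostic: a result of 1, together with an RMSE no better than baseline_rmse, would suggest the model is mostly echoing the previous day's price.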
Price data inherently contains noise, brought about by retail traders and, especially now, by social-sentiment trading. An LSTM is likely to overfit to historical noise, and a model fitted that way is unrepresentative of future noise.
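The effect of fitting noise can be illustrated without an LSTM. The sketch below is a deliberately simple stand-in: a 1-nearest-neighbour predictor that memorizes its training windows, applied to pure-noise returns where nothing can be learned by construction. The model, the window length, and the data are all assumptions made for illustration. Comparing error on a chronological split shows the pattern to look for: near-perfect in-sample error and a much worse out-of-sample error.

```python
import math
import random

WINDOW = 3  # number of past returns used as features (arbitrary choice)


def make_samples(series, window=WINDOW):
    """Turn a series into (window_of_past_values, next_value) pairs."""
    return [(series[i - window:i], series[i]) for i in range(window, len(series))]


def predict_1nn(train_samples, features):
    """Return the target of the training window closest to `features`."""
    def sq_dist(sample):
        return sum((a - b) ** 2 for a, b in zip(sample[0], features))
    return min(train_samples, key=sq_dist)[1]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Pure-noise "returns": by construction there is no signal to learn.
random.seed(1)
returns = [random.gauss(0, 1) for _ in range(300)]

split = 200  # chronological split: train on the past, test on the future
train = make_samples(returns[:split])
test = make_samples(returns[split - WINDOW:])  # keep WINDOW values of context

in_sample = rmse([predict_1nn(train, x) for x, _ in train], [y for _, y in train])
out_of_sample = rmse([predict_1nn(train, x) for x, _ in test], [y for _, y in test])
print("in-sample RMSE:", in_sample)
print("out-of-sample RMSE:", round(out_of_sample, 3))
```

The same comparison can be made for the LSTM by computing the error on the training range and on the held-out range separately. A large gap between the two is a sign that the network has fitted noise.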
For more information on the problem of noise, see https://www.investopedia.com/articles/trading/06/marketnoise.asp