Stock-price-predicting LSTM model gives significantly higher outcomes than expected

I have built an LSTM model to estimate the next day's stock price, using TensorFlow and Keras.

However, I do not understand why my model's predicted price is almost always a factor of 2 or 3 higher than the current stock price. Does anybody know what I am doing wrong?

The code is shown below:

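import math
import numpy as np
import pandas_datareader as web  #assumed: `web` refers to pandas_datareader
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM

def StockPredictor(stock, startdate, enddate, pricetype):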
    
    #Get the stock quote
    df = web.DataReader(stock, data_source = 'yahoo', start=startdate, end=enddate)
    #df = pd.read_csv('StockData/TATA.csv')
    
    #Create a new dataframe with only the price type chosen
    data = df.filter([pricetype])
    dataset = data.values  #convert dataset into a numpy array
    training_data_len = math.ceil(len(dataset) * 0.80) #use 80% of the dataset to train the LSTM model (rounded up with math.ceil)
    
    #Scale the data (normalizing input data) (helps the model)
    scaler = MinMaxScaler(feature_range=(0,1))  #scaled_data values all lie between 0 and 1
    scaled_data = scaler.fit_transform(dataset)  #computes min and max values for scaling and transforms data based on these values
    
    #Create the training data set
    #Create the scaled training data set
    train_data = scaled_data[0:training_data_len , :]
    #split data into x_train and y_train datasets
    x_train, y_train = [], []  #x_train independent training feature, y dependent
    for i in range(60, len(train_data)):
        x_train.append(train_data[i-60:i,0])  #contains the values of the 60 previous periods
        y_train.append(train_data[i, 0])    #contains the 61st value, which we want the model to predict
    
    #Convert x_train and y_train to numpy arrays
    x_train, y_train = np.array(x_train), np.array(y_train)
    
    #reshape data (LSTM expects data to be 3D in form of no. of samples, no. of timesteps and no. of features) (x_train is now 2D)
    x_train = np.reshape(x_train, (x_train.shape[0],x_train.shape[1], 1)) #reshape to 3D; x_train.shape[0] = no. of rows in 2D x_train, x_train.shape[1] = no. of columns in 2D x_train
    
    #Build the LSTM model
    model=Sequential()
    model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
    model.add(LSTM(50, return_sequences= False))
    model.add(Dense(25))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error') #model has loss function and optimizer
    
    #training the model with the fit function
    model.fit(x_train, y_train, batch_size=1, epochs=1) #an epoch is one pass of the whole dataset forward and backward through the neural network
    
    #Create the testing data set
    #Create a new array containing scaled values from index training_data_len - 60 onward
    test_data = scaled_data[training_data_len - 60: , :]
    #create datasets x_test and y_test
    x_test = []
    y_test = dataset[training_data_len:, :]
    for i in range(60, len(test_data)):
        x_test.append(test_data[i-60:i,0])
        
    #convert data into numpy array
    x_test = np.array(x_test)
    
    #Reshape data (same explanation as for x_train above)
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
    
    #Get models predicted price values
    #predictions, which depend on x_test, should get the same values as y_test
    predictions = model.predict(x_test) #want predictions to contain same values as y_test
    predictions = scaler.inverse_transform(predictions) #unscale the values
    
    #Get the RMSE (to test the model)
    rmse = np.sqrt(np.mean((predictions - y_test)**2))  #square the errors before averaging
    
    #Plot the data
    train = data[:training_data_len]
    valid = data[training_data_len:].copy()  #copy so that adding a column does not trigger SettingWithCopyWarning
    valid['Predictions'] = predictions
        
    print('The RMSE for the training model =', rmse)
    
    new_df = df.filter([pricetype])
    #get the last 60 days
    last_60_days = new_df[-60:].values
    last_60_days_scaled = scaler.transform(last_60_days)
    #create empty list
    X_test = []
    #append past 60 days to list
    X_test.append(last_60_days)
    #Convert X_test to numpy array
    X_test = np.array(X_test)
    #reshape to 3D
    X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1],1))
    #Get predicted scaled price
    pred_price = model.predict(X_test)
    #undo scaling
    pred_price = scaler.inverse_transform(pred_price)
    print('Predicted price for the next day is :',pred_price)
    
    return pred_price 

allprices = []
for i in range(10):
    pred_price = StockPredictor()
    allprices.append(pred_price)
    
average_pred_price = sum(allprices) / len(allprices)

You are using a min-max scaler with a range of 0 to 1, where the fitted minimum and maximum are historical. Your LSTM model will likely predict a new high, and when you inverse_transform the prediction, the result can be higher than the maximum that was fitted to the scaler.

Therefore the scaler is the likely culprit behind your predictions being a factor of 2 higher. Using a standard scaler might help, as might not scaling at all.
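As a rough illustration of the point above, the sketch below fits a MinMaxScaler on a handful of made-up prices and shows that a model output above 1 maps back to a price above the historical maximum. The `history` values and the outputs 0.5 and 1.3 are assumptions chosen for the example, not taken from the question's data. The last lines show how a StandardScaler would be swapped in.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical closing prices, used only for illustration.
history = np.array([[100.0], [110.0], [120.0], [150.0], [200.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(history)  # learns min=100 and max=200 from the history

# A model output inside [0, 1] maps back into the historical range...
inside = scaler.inverse_transform(np.array([[0.5]]))[0, 0]
# ...but an output above 1 maps to a price above the historical maximum.
above = scaler.inverse_transform(np.array([[1.3]]))[0, 0]

print("output 0.5 ->", inside)
print("output 1.3 ->", above)

# Swapping in a standard scaler: it centers and rescales by the mean and
# standard deviation instead of pinning the data to a fixed [0, 1] range.
std_scaler = StandardScaler().fit(history)
roundtrip = std_scaler.inverse_transform(std_scaler.transform(history))
```

It may also be worth confirming that the window passed to model.predict has gone through scaler.transform. The question's code computes last_60_days_scaled but appends last_60_days, and a model trained on values between 0 and 1 could produce inflated outputs when fed raw prices.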

Side Note

Using an LSTM to predict stock prices from price data alone will not work.

What is likely to happen is that your LSTM model will predict prices with a T+1 lag, that is, its prediction for a given day will closely follow the previous day's price.
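One hedged way to check for this lag is to compare the model against a naive persistence baseline that simply repeats the previous day's price. The sketch below uses made-up numbers: `actual` and `predictions` are hypothetical stand-ins for real prices and model output, and `rmse` is a helper defined here for the example.

```python
import numpy as np

def rmse(pred, actual):
    """Root mean squared error between two 1-D arrays."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Hypothetical actual prices for six consecutive days (illustration only).
actual = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0])

targets = actual[1:]   # prices the model is asked to predict (days 1..5)
naive = actual[:-1]    # persistence baseline: tomorrow's price = today's

# Stand-in for model output, deliberately built as yesterday's price plus
# a small error to mimic a lagging model.
predictions = naive + np.array([0.3, -0.2, 0.1, -0.3, 0.2])

model_rmse = rmse(predictions, targets)  # error against the true prices
naive_rmse = rmse(naive, targets)        # error of the persistence baseline
lag_rmse = rmse(predictions, naive)      # distance from yesterday's price

print("model vs actual:   ", model_rmse)
print("naive vs actual:   ", naive_rmse)
print("model vs yesterday:", lag_rmse)
```

If the predictions sit much closer to yesterday's price than to the price being predicted, and the model's error is no better than the baseline's, the model is effectively reproducing the lag described above.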

Price data inherently contains noise, brought about by retail traders and, especially now, by social sentiment trading. An LSTM is likely to overfit on historical noise, which is unrepresentative of future noise.

For more information on the problem of noise, check out this link: https://www.investopedia.com/articles/trading/06/marketnoise.asp
