
python LSTM too many indices for array

The purpose of this code is to predict future forex market values for the major currencies.

At first, I built a unified data frame that combines 11 forex market data sets for the most widely traded currencies with a group of 900 economic indicators.

After combining these 911 data sets into the unified data frame, which tested without problems of any sort, I began work on the LSTM neural network. I had also tested the network with just a single data set, and it works great.

The problem began when I combined the unified data frame with the LSTM neural network.

Here is the code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
import os

os.chdir(r"E:\Business\Stocks")
path = os.listdir(r"E:\Business\Stocks")
for file in path:
    name, ext = os.path.splitext(str(file))
    column_names = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume']
    df1 = pd.read_csv(file, names=column_names, parse_dates={'DateTime': ['Date', 'Time']}, index_col=[0])
    df1 = df1.rename(columns={'Open': name + ' ' + 'Open', 'High': name + ' ' + 'High',
                              'Low': name + ' ' + 'Low', 'Close': name + ' ' + 'Close',
                              'Volume': name + ' ' + 'Volume'})


os.chdir(r"E:\Business\Economic Indicators")
path = os.listdir(r"E:\Business\Economic Indicators")
for file in path:
    df2 = pd.read_csv(file, index_col=[0], parse_dates=[0])
    name, ext1 = os.path.splitext(file)
    df2 = df2.rename(columns={'Actual': name + ' ' + 'Actual', 'Consensus': name + ' ' + 'Consensus',
                              'Previous': name + ' ' + 'Previous', 'Revised': name + ' ' + 'Revised'})


dfs = [df1 ,df2]
df = pd.concat(dfs, axis=1, join='inner').sort_index(ascending=False)
df.fillna(method='ffill', inplace=True)

sequence_length = 120
n_features = len(df.columns)
val_ratio = 0.1
n_epochs = 3000
batch_size = 500

data = df.as_matrix()
data_processed = []
for index in range(len(data) - sequence_length):
    data_processed.append(data[index: index + sequence_length])
data_processed = np.array(data_processed)

val_split = round((1 - val_ratio) * data_processed.shape[0])
train = data_processed[: int(val_split), :]
val = data_processed[int(val_split):, :]

print('Training data: {}'.format(train.shape))
print('Validation data: {}'.format(val.shape))

train_samples, train_nx, train_ny = train.shape
val_samples, val_nx, val_ny = val.shape

train = train.reshape((train_samples, train_nx * train_ny))
val = val.reshape((val_samples, val_nx * val_ny))

preprocessor = MinMaxScaler().fit(train)
train = preprocessor.transform(train)
val = preprocessor.transform(val)

train = train.reshape((train_samples, train_nx, train_ny))
val = val.reshape((val_samples, val_nx, val_ny))

X_train = train[:, : -1]
y_train = train[:, -1][:, -1]
X_val = val[:, : -1]
y_val = val[:, -1][:, -1]

X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], n_features))
X_val = np.reshape(X_val, (X_val.shape[0], X_val.shape[1], n_features))

model = Sequential()
model.add(LSTM(input_shape=(X_train.shape[1:]), units=100, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(100, return_sequences=False))
model.add(Dropout(0.25))
model.add(Dense(units=1))
model.add(Activation("relu"))

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse', 'accuracy'])

history = model.fit(
    X_train,
    y_train,
    batch_size=batch_size,
    epochs=n_epochs,
    verbose=2)

preds_val = model.predict(X_val)
diff = []
for i in range(len(y_val)):
    pred = preds_val[i][0]
    diff.append(y_val[i] - pred)

real_min = preprocessor.data_min_[104]
real_max = preprocessor.data_max_[104]
print(preprocessor.data_min_[:1])
print(preprocessor.data_max_[:1])

preds_real = preds_val * (real_max - real_min) + real_min
y_val_real = y_val * (real_max - real_min) + real_min

plt.plot(preds_real, label='Predictions')
plt.plot(y_val_real, label='Actual values')
plt.xlabel('test')
plt.legend(loc=0)
plt.show()
print(model.summary())

Here is the error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "E:/Tutorial/new.py", line 47, in <module>
    train = data_processed[: int(val_split), :]
IndexError: too many indices for array

You didn't slice data_processed properly. I think that you have a redundant space between int(val_split) and : . I'm not sure what your intentions were, but this answer should cover the probable scenarios.
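Whitespace inside a slice does not change how NumPy indexes an array, so another scenario worth ruling out is that data_processed is not 3-D when it is sliced. The sketch below is an assumption about the cause, not a confirmed diagnosis: it reproduces the windowing loop from the question on made-up zero arrays (make_windows, long_data and short_data are hypothetical names) and shows that when the data has fewer rows than sequence_length, for example after an inner join that keeps few rows, the result is an empty 1-D array, and indexing it with two indices raises IndexError.

```python
import numpy as np

sequence_length = 120


def make_windows(data, sequence_length):
    """Stack overlapping windows of `sequence_length` rows, like the question's loop."""
    windows = [data[i: i + sequence_length]
               for i in range(len(data) - sequence_length)]
    return np.array(windows)


# Enough rows: the result is 3-D (windows, sequence_length, features).
long_data = np.zeros((130, 4))
ok = make_windows(long_data, sequence_length)
print(ok.shape)  # (10, 120, 4)

# Spacing inside the slice makes no difference to the result.
print(ok[: 5, :].shape == ok[:5, :].shape)  # True

# Too few rows (hypothetical: the joined frame keeps only 50 rows).
short_data = np.zeros((50, 4))
empty = make_windows(short_data, sequence_length)
print(empty.shape)  # (0,)

# A 1-D array cannot take two indices, so this raises IndexError.
try:
    empty[: 5, :]
except IndexError as exc:
    print("IndexError:", exc)
```

Printing data.shape and data_processed.shape just before the failing line would show whether this is what happens with the real data.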


 