Keras LSTM Input/Output Dimension
I am constructing an LSTM predictor with Keras. My input array is historical price data. I segment the data into window_size blocks in order to predict prediction_length blocks ahead. My data is a list of 4246 floating point numbers. I separate my data into 4055 arrays, each of length 168, in order to predict 24 units ahead.

This gives me an x_train set with dimension (4055, 168). I then scale my data and try to fit it, but I run into a dimension error.
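The block count above follows directly from the window arithmetic; a quick sanity check of those numbers:

```python
# Sliding-window arithmetic for the setup described above.
series_len = 4246     # total number of price points
window_size = 7 * 24  # 168 input steps per block
horizon = 24          # steps predicted ahead

# Each block needs window_size inputs plus horizon targets after it,
# so valid start indices run from 0 to series_len - window_size - horizon.
num_pred_blocks = series_len - window_size - horizon + 1
print(num_pred_blocks)  # 4055
```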
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, TimeDistributed

df = pd.DataFrame(data)
print(f"Len of df: {len(df)}")
min_max_scaler = MinMaxScaler()

H = 24
window_size = 7 * H
num_pred_blocks = len(df) - window_size - H + 1

x_train = []
y_train = []
for i in range(num_pred_blocks):
    x_train_block = df['C'][i:(i + window_size)]
    x_train.append(x_train_block)
    y_train_block = df['C'][(i + window_size):(i + window_size + H)]
    y_train.append(y_train_block)

LEN = int(len(x_train) * window_size)
x_train = min_max_scaler.fit_transform(x_train)

batch_size = 1
def build_model():
    model = Sequential()
    model.add(LSTM(input_shape=(window_size, batch_size),
                   return_sequences=True,
                   units=num_pred_blocks))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model

num_epochs = epochs
model = build_model()
model.fit(x_train, y_train, batch_size=batch_size, epochs=50)
The error being returned is as follows.
ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 4055 arrays: [array([[0.00630006],
Am I not segmenting correctly? Loading correctly? Should the number of units be different than the number of prediction blocks? I appreciate any help. Thanks.
The suggestions to convert the lists to Numpy arrays are correct, but MinMaxScaler() already returns a numpy array. I reshaped the arrays into the proper dimensions, but now my computer is having a CUDA memory error. I consider the problem solved. Thank you.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, TimeDistributed

df = pd.DataFrame(data)
min_max_scaler = MinMaxScaler()

H = prediction_length
window_size = 7 * H
num_pred_blocks = len(df) - window_size - H + 1

x_train = []
y_train = []
for i in range(num_pred_blocks):
    x_train_block = df['C'][i:(i + window_size)].values
    x_train.append(x_train_block)
    y_train_block = df['C'][(i + window_size):(i + window_size + H)].values
    y_train.append(y_train_block)

x_train = min_max_scaler.fit_transform(x_train)
y_train = min_max_scaler.fit_transform(y_train)
x_train = np.reshape(x_train, (len(x_train), 1, window_size))
y_train = np.reshape(y_train, (len(y_train), 1, H))

batch_size = 1
def build_model():
    model = Sequential()
    model.add(LSTM(batch_input_shape=(batch_size, 1, window_size),
                   return_sequences=True,
                   units=100))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model

num_epochs = epochs
model = build_model()
model.fit(x_train, y_train, batch_size=batch_size, epochs=50)
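As a side note on the scaling step: MinMaxScaler.fit_transform stacks the input into a 2-D array and rescales each column independently, returning a numpy array. A numpy-only sketch of that column-wise transformation (illustrative values, not the question's data):

```python
import numpy as np

# Minimal numpy equivalent of sklearn's MinMaxScaler on a 2-D array:
# each column is rescaled to [0, 1] independently.
x = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 20.0]])

col_min = x.min(axis=0)
col_max = x.max(axis=0)
scaled = (x - col_min) / (col_max - col_min)
print(scaled)
# column 0 -> [0.0, 0.5, 1.0], column 1 -> [0.0, 1.0, 0.5]
```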
I don't think you passed the batch size in the model. input_shape=(window_size, batch_size) is the data dimension, which is correct, but you should use input_shape=(window_size, 1).

If you want to use batches, you have to add another dimension, like this: LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2])) (cited from the Keras documentation).

In your case:

def build_model():
    model = Sequential()
    model.add(LSTM(batch_input_shape=(batch_size, 1, window_size),
                   return_sequences=True,
                   units=num_pred_blocks))
    model.add(TimeDistributed(Dense(H)))
    model.add(Activation("linear"))
    model.compile(loss="mse", optimizer="rmsprop")
    return model

You also need to use np.reshape to change the dimensions of your data; they should be (batch_dim, data_dim_1, data_dim_2). I use numpy, so numpy.reshape() will work.

First, your data should be row-wise, so each row should have a shape of (1, 168); then add the batch dimension, and it will be (batch_n, 1, 168).

Hope this helps.
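The reshape described above can be sketched with plain numpy, assuming the 4055 windows of length 168 from the question:

```python
import numpy as np

# 4055 windows, each 168 steps long, as a 2-D array (one window per row).
x_train = np.zeros((4055, 168))

# Insert the extra axis so each sample becomes one timestep of 168
# features: (batch_n, 1, 168), matching batch_input_shape=(batch, 1, 168).
x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
print(x_train.shape)  # (4055, 1, 168)
```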
That's probably because x_train and y_train were not updated to numpy arrays. Take a closer look at this issue on GitHub.
model = build_model()
x_train, y_train = np.array(x_train), np.array(y_train)
model.fit(x_train, y_train, batch_size = batch_size, epochs = 50)
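The difference this conversion makes can be seen without Keras at all: a Python list of per-window arrays has no single shape, while np.array stacks equal-length windows into one 2-D array of the kind model.fit expects (toy sizes below for illustration):

```python
import numpy as np

# A list of 1-D arrays, like what the loop in the question builds ...
x_train = [np.arange(4, dtype=float) for _ in range(3)]
print(type(x_train))   # <class 'list'> -- no .shape attribute

# ... versus a single stacked array after conversion.
x_train = np.array(x_train)
print(x_train.shape)   # (3, 4)
```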