Irrelevant results of seq-to-seq LSTM

I am trying to predict a sequence of integers from an input number.

Each input is a single 10-digit integer:

 array([[2021001001], [2021001002], ..., [2021335249]], dtype=int64)

Each output is an array of 7 integers:

 array([[23, 26, 17, ..., 21, 16, 4], [13, 24, 2, ..., 27, 10, 28], ..., [ 5, 16, 28, ..., 12, 27, 26]], dtype=int64)

This means that the sequence number (input) [2021001001] should return the sequence (output) [23, 26, 17, ..., 21, 16, 4].

I tried training an LSTM on these inputs and outputs to predict the output sequence for a given sequence number. I'm using about 60K historical samples to do this. Here is what I have so far:

model = tf.keras.Sequential()
model.add(layers.LSTM(256, activation='relu', input_shape=(10, 1), recurrent_dropout=0.2))
model.add(layers.Dense(7))
model.compile(optimizer=tf.keras.optimizers.Adam(0.00001),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['accuracy'])
model.fit(inputs, output, epochs=10, verbose=1, validation_split=0.2, batch_size=256)

When I test the model after fitting, I get odd results like the following:

predictNextNumber = model.predict(tests_[0], verbose=1)
print(predictNextNumber)

1/1 [==============================] - 0s 253ms/step
[[[14.475913]
  [14.757163]
  [14.874351]
  [14.702476]
  [14.639976]
  [14.624351]
  [14.655601]]]

The expected output, however, is an array of integers: [24, 12, 3, 5, 11, 8, 4].

I'm having trouble figuring out what the problem is. Keras complained a lot about the shapes at first, but even after I fixed that I kept getting bad results. Any help would be appreciated.

The description of your problem is a bit vague. It would be useful to have some actual data so that we can try this on our own. It's also unclear what this data represents, so we can't tell you whether what you're doing even has a chance of success. It's not clear whether x can predict y at all.

However, it is very likely that the inputs and outputs are too large in magnitude for your network. Networks (usually) work better with numbers roughly in [-1, 1], so what you should probably do is use something like a StandardScaler. You don't have to install sklearn for this. You can just compute the mean and standard deviation of your data and scale everything according to

x_scaled = (x - m) / d

and

x = x_scaled * d + m 

for the inverse operation, where m is the mean and d the standard deviation of your data x.

Since your inputs and outputs appear to come from different distributions, you'd have to do this twice, once for each.
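The two formulas above can be sketched without sklearn, using only the Python standard library. This is a minimal illustration on flat lists, not the asker's actual pipeline: the helper names (`fit_scaler`, `scale`, `unscale`) and the sample values are made up for the example, and real data would be scaled per column.

```python
from statistics import fmean, pstdev

def fit_scaler(values):
    """Return (m, d): the mean and population standard deviation of values."""
    m = fmean(values)
    d = pstdev(values)
    # Guard against constant data, where the standard deviation is zero.
    return m, (d if d > 0 else 1.0)

def scale(values, m, d):
    """x_scaled = (x - m) / d"""
    return [(v - m) / d for v in values]

def unscale(scaled, m, d):
    """x = x_scaled * d + m"""
    return [s * d + m for s in scaled]

# Hypothetical samples in the same format as the question's data.
x = [2021001001, 2021001002, 2021001003, 2021001004]
y = [23, 26, 17, 21, 16, 4]

# Inputs and outputs get their own statistics.
x_m, x_d = fit_scaler(x)
y_m, y_d = fit_scaler(y)

x_scaled = scale(x, x_m, x_d)
y_scaled = scale(y, y_m, y_d)

# Predictions made in the scaled space are mapped back with the y statistics.
y_restored = unscale(y_scaled, y_m, y_d)
print(x_scaled)
print(y_restored)
```

The statistics should be computed on the training split only and then reused unchanged for validation and test data.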

Assuming you use sklearn's StandardScaler, you'd do something like this:

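import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Fit one scaler per distribution: inputs and targets are scaled separately.
x_scaler = StandardScaler().fit(x_train)
y_scaler = StandardScaler().fit(y_train)
scalers = dict(x=x_scaler, y=y_scaler)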

# get_dataset stands for your own data-loading code; use scaler.transform(x) inside it
train_data = get_dataset(scalers, mode="train")
valid_data = get_dataset(scalers, mode="dev")
test_data = get_dataset(scalers, mode="test")

model.fit(train_data, validation_data=valid_data)

# Look at some test data by using `scaler.inverse_transform(data)`

df = pd.DataFrame([], columns=["target", "prediction"])
for x, y in test_data:
    y_pred = model(x)
    y_pred = y_scaler.inverse_transform(y_pred)
    data = np.concatenate([y, y_pred], axis=-1)
    df = pd.concat([df, pd.DataFrame(data, columns=["target", "prediction"])])

df.target = df.target.astype(int)
df.prediction = df.prediction.round(2)
print(df)

The input numbers are very large, so add a normalization layer:

normalization_layer = tf.keras.layers.Normalization()
normalization_layer.adapt(inputs)

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(10, 1)))
model.add(normalization_layer)
model.add(layers.LSTM(256, activation='relu', recurrent_dropout=0.2))
...

You might need to train for many more epochs.

The learning_rate of the optimizer seems a little low; maybe try the default value first.
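A rough back-of-the-envelope check shows why both points matter together. It uses the numbers from the question (about 60K samples, validation_split=0.2, batch_size=256, epochs=10) and assumes, as a heuristic rather than a guarantee, that Adam changes any single weight by roughly the learning rate per step at most. The default Adam learning rate in Keras is 0.001.

```python
import math

n_samples = 60_000        # "about 60K" from the question
validation_split = 0.2
batch_size = 256
epochs = 10

# Samples left for training after the validation split.
train_samples = round(n_samples * (1 - validation_split))
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

travel = {}
for lr in (1e-5, 1e-3):
    # Heuristic upper bound on how far one weight can move in total.
    travel[lr] = total_steps * lr
    print(f"lr={lr:g}: {total_steps} steps, rough max change per weight {travel[lr]:.4f}")
```

Under this assumption, the question's setting lets each weight move about a hundred times less than the default would over the same 10 epochs.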

Since you are predicting continuous values, your metric should not be accuracy, but mse, mae, or similar.
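To see why, here is a small sketch that scores the prediction and the expected output quoted in the question. The helper functions are written for this example only. The exact-match rate is a simple stand-in for an accuracy-style metric and is not necessarily what Keras computes for `accuracy` on this model.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def exact_match_rate(y_true, y_pred):
    # Fraction of positions where the prediction equals the target exactly.
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Values copied from the question.
y_true = [24, 12, 3, 5, 11, 8, 4]
y_pred = [14.475913, 14.757163, 14.874351, 14.702476,
          14.639976, 14.624351, 14.655601]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
# Even after rounding to integers, an exact-match score says nothing
# about how far off each prediction is.
hits = exact_match_rate(y_true, [round(p) for p in y_pred])

print(f"MAE: {mae:.3f}  MSE: {mse:.3f}  exact matches: {hits:.0%}")
```

An error metric such as MAE shrinks as predictions get closer to the targets, so it can show progress during training even when no prediction matches exactly.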
