Keras LSTM - feed sequence data with Tensorflow dataset API from the generator

I am trying to figure out how to feed data to my LSTM model for training. (I will simplify the problem in the example below.) I have the following data format in the csv files in my dataset:

Timestep    Feature1    Feature2    Feature3    Feature4    Output
1           1           2           3           4           a
2           5           6           7           8           b
3           9           10          11          12          c 
4           13          14          15          16          d
5           17          18          19          20          e
6           21          22          23          24          f
7           25          26          27          28          g
8           29          30          31          32          h
9           33          34          35          36          i
10          37          38          39          40          j

The task is to estimate the Output of any future timestep based on the data from the last 3 timesteps. Some input-output examples are as follows:

Example 1: Input:

Timestep    Feature1    Feature2    Feature3    Feature4    
1           1           2           3           4           
2           5           6           7           8           
3           9           10          11          12           

Output: c

Example 2: Input:

Timestep    Feature1    Feature2    Feature3    Feature4    
2           5           6           7           8           
3           9           10          11          12           
4           13          14          15          16          

Output: d

Example 3: Input:

Timestep    Feature1    Feature2    Feature3    Feature4   
3           9           10          11          12          
4           13          14          15          16         
5           17          18          19          20         

Output: e

When feeding the data to the model, I would like to shuffle it so that I do not feed consecutive sequences during training. In other words, I would ideally like to feed, say, timesteps 3,4,5 in one step, maybe timesteps 5,6,7 in the next step, and maybe 2,3,4 in the step after that, and so on. I would prefer not to feed the data strictly as 1,2,3 first, then 2,3,4, then 3,4,5, and so on...

When training my LSTM network, I am using Keras with the Tensorflow backend. I would like to use a generator when feeding my data to the fit_generator(...) function.

My intention is to use Tensorflow's dataset API to fetch the data from the csv files, but I could not figure out how to make the generator return what I need. If I simply shuffle the data with Tensorflow's dataset API, it destroys the order of the timesteps. The generator should also return batches that contain multiple sequence examples. For instance, if the batch size is 2, it may need to return 2 sequences such as timesteps 2,3,4 and timesteps 6,7,8.

Hoping that I could explain my problem... Is it possible to use Tensorflow's dataset API in a generator function for such a sequence problem, so that I can feed batches of sequences as explained above? (The generator needs to return data with the shape [batch_size, length_of_each_sequence, nr_of_inputs_in_each_timestep], where length_of_each_sequence=3 and nr_of_inputs_in_each_timestep=4 in my example.) Or is the best way to write a generator in Python only, maybe using Pandas?
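For reference, a minimal pure-Python/Pandas generator along these lines might look like the sketch below. The file name, the column names and the shuffling strategy are assumptions for illustration, not taken from the original question; each window is labelled with the Output of its last timestep, as in the examples above.

import numpy as np
import pandas as pd

SEQ_LEN = 3       # length_of_each_sequence
N_FEATURES = 4    # nr_of_inputs_in_each_timestep

def window_generator(csv_path, batch_size):
    # Yields shuffled batches of shape (batch_size, SEQ_LEN, N_FEATURES).
    df = pd.read_csv(csv_path)
    features = df[["Feature1", "Feature2", "Feature3", "Feature4"]].values
    outputs = df["Output"].values

    # Window i covers timesteps i..i+SEQ_LEN-1 and is labelled with the
    # output of its last timestep (e.g. timesteps 1,2,3 -> output c).
    starts = np.arange(len(df) - SEQ_LEN + 1)
    while True:
        np.random.shuffle(starts)                      # non-consecutive order
        for b in range(0, len(starts), batch_size):
            idx = starts[b:b + batch_size]
            x = np.stack([features[i:i + SEQ_LEN] for i in idx])
            y = outputs[idx + SEQ_LEN - 1]
            yield x, y

Such a generator could be passed directly to fit_generator(window_generator("data.csv", batch_size=2), steps_per_epoch=...), although the string labels would of course need to be encoded numerically first.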

ADDENDUM 1:

I have done the following experiment after seeing the answer from @kvish.

import tensorflow as tf
import numpy as np
from tensorflow.contrib.data.python.ops import sliding

sequence = np.array([ [[1]], [[2]], [[3]], [[4]], [[5]], [[6]], [[7]], [[8]], [[9]] ])
labels = [1,0,1,0,1,0,1,0,1]

# create TensorFlow Dataset object
data = tf.data.Dataset.from_tensor_slices((sequence, labels))

# sliding window batch
window_size = 3
window_shift = 1
data = data.apply(sliding.sliding_window_batch(window_size=window_size, window_shift=window_shift))
data = data.shuffle(1000, reshuffle_each_iteration=False)
data = data.batch(3)

#iter = data.make_initializable_iterator()
iter = tf.data.Iterator.from_structure(data.output_types, data.output_shapes)
el = iter.get_next()

# create initialization ops 
init_op = iter.make_initializer(data)

NR_EPOCHS = 2
with tf.Session() as sess:
    for e in range(NR_EPOCHS):
        print("\nepoch: ", e, "\n")
        sess.run(init_op)
        print("1  ", sess.run(el))
        print("2  ", sess.run(el))
        print("3  ", sess.run(el))

And here is the output:

epoch:  0 

1   (array([[[[6]],[[7]],[[8]]],  [[[1]],[[2]],[[3]]],  [[[2]],[[3]],[[4]]]]), 
     array([[0, 1, 0],  [1, 0, 1],  [0, 1, 0]], dtype=int32))

2   (array([[[[7]],[[8]],[[9]]],  [[[3]],[[4]],[[5]]],  [[[4]],[[5]],[[6]]]]), 
     array([[1, 0, 1],  [1, 0, 1],  [0, 1, 0]], dtype=int32))

3   (array([[[[5]],[[6]],[[7]]]]), array([[1, 0, 1]], dtype=int32))

epoch:  1 

1   (array([[[[2]],[[3]],[[4]]],  [[[7]],[[8]],[[9]]],  [[[1]],[[2]],[[3]]]]), 
     array([[0, 1, 0],  [1, 0, 1],  [1, 0, 1]], dtype=int32))

2   (array([[[[5]],[[6]],[[7]]],  [[[3]],[[4]],[[5]]],  [[[4]],[[5]],[[6]]]]), 
     array([[1, 0, 1],  [1, 0, 1],  [0, 1, 0]], dtype=int32))

3   (array([[[[6]],[[7]],[[8]]]]), 
     array([[0, 1, 0]], dtype=int32))

I could not try it with csv file reading yet, but I think this approach should work quite well!
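For the csv part, something along the following lines might work (a sketch only: the file name, the column layout and the use of tf.data.experimental.CsvDataset are assumptions; on older TF 1.x versions the class lives under tf.contrib.data.CsvDataset instead):

import tensorflow as tf
from tensorflow.contrib.data.python.ops import sliding

# Assumed csv layout: Timestep, Feature1..Feature4, Output, with a header row.
record_defaults = [tf.int32] + [tf.float32] * 4 + [tf.string]
csv_data = tf.data.experimental.CsvDataset("data.csv", record_defaults, header=True)

# Pack the four feature columns into one feature vector per timestep.
def pack(timestep, f1, f2, f3, f4, label):
    return tf.stack([f1, f2, f3, f4]), label

data = csv_data.map(pack)
data = data.apply(sliding.sliding_window_batch(window_size=3, window_shift=1))
data = data.shuffle(1000)
data = data.batch(2)   # features: [2, 3, 4], labels: [2, 3]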

But as far as I can see, the reshuffle_each_iteration parameter makes no difference. Is it really needed? The results are not necessarily identical whether it is set to True or False. What is the reshuffle_each_iteration parameter supposed to do here?

I think this answer might be close to what you are looking for!

You create batches by sliding over windows, and then shuffle the input in your case. The shuffle function of the dataset API has a reshuffle_each_iteration parameter, which you will probably want to set to False if you want to experiment with setting a random seed and looking at the order of the shuffled outputs.
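As a rough sketch of that experiment (the seed value is arbitrary, and it assumes the dataset is repeated with .repeat(NR_EPOCHS) instead of re-initializing the iterator for each epoch, which is the usual way to make the effect of the flag visible):

# With a fixed seed and reshuffle_each_iteration=False, every repeated epoch
# sees the windows in the same shuffled order; with True (the default), each
# epoch is reshuffled.
data = data.shuffle(1000, seed=42, reshuffle_each_iteration=False)
data = data.batch(3).repeat(NR_EPOCHS)

iter = data.make_one_shot_iterator()
el = iter.get_next()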
