How do I create a timeseries sliding window tensorflow dataset where some features have different batch sizes than others?
Currently I am able to create a timeseries sliding-window batched dataset that contains ordered 'feature sets' such as 'inputs', 'targets', 'benchmarks', etc. Originally I developed my model and dataset so that the targets had the same batch size as all the other inputs. That has proven detrimental to tuning the input batch size, and it also won't help when it comes time to run this on live data, where I only care to produce a single-sample output of shape (1, horizon, targets), or perhaps just (horizon, targets), given an input dataset of (samples, horizon, features).
As an overview, I want to take N historical samples of horizon-length features at time T, run them through the model, and output a single sample of horizon-length targets; repeat until the dataset has been run through in its entirety.
Assuming a pandas DataFrame of length Z, all resulting Datasets should have a length of Z - horizon. The 'targets' Dataset should have a batch size of 1, and the 'inputs' Dataset should have a batch size of batch_size.
Here's a stripped-down snippet of what I currently use in order to generate a standard batch size for all feature sets:
import tensorflow as tf
import pandas as pd

horizon = 5
batch_size = 10

# Column groups: each entry becomes one component of the dataset tuple.
columns = {
    "inputs": ["input_1", "input_2"],
    "targets": ["target_1"],
}
batch_options = {
    "drop_remainder": True,
    "deterministic": True,
}

d = range(100)
df = pd.DataFrame(data={'input_1': d, 'input_2': d, 'target_1': d})
slices = tuple(df[x].astype("float32") for x in columns.values())

data = (
    tf.data.Dataset.from_tensor_slices(slices)
    # Sliding window of `horizon` timesteps, advancing one step at a time.
    .window(horizon, shift=1, drop_remainder=True)
    # Each window is a tuple of sub-datasets; collapse each back into a tensor.
    .flat_map(
        lambda *c: tf.data.Dataset.zip(
            tuple(
                col.batch(horizon, **batch_options)
                for col in c
            )
        )
    )
    # The same batch size is applied to every feature set.
    .batch(
        batch_size,
        **batch_options,
    )
)
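For a sanity check, here is a self-contained sketch of the same pipeline that prints the per-element shapes. It shows the limitation described above: every feature set comes out with the same leading batch dimension.

```python
import pandas as pd
import tensorflow as tf

horizon = 5
batch_size = 10
columns = {"inputs": ["input_1", "input_2"], "targets": ["target_1"]}
batch_options = {"drop_remainder": True, "deterministic": True}

d = range(100)
df = pd.DataFrame(data={'input_1': d, 'input_2': d, 'target_1': d})
slices = tuple(df[x].astype("float32") for x in columns.values())

data = (
    tf.data.Dataset.from_tensor_slices(slices)
    .window(horizon, shift=1, drop_remainder=True)
    .flat_map(
        lambda *c: tf.data.Dataset.zip(
            tuple(col.batch(horizon, **batch_options) for col in c)
        )
    )
    .batch(batch_size, **batch_options)
)

x, y = next(iter(data))
print(x.shape, y.shape)  # (10, 5, 2) (10, 5, 1) -- same batch size for both
```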
We can create two sliding-window datasets and zip them:
inputs = df[['input_1', 'input_2']].to_numpy()
labels = df['target_1'].to_numpy()
window_size = 10
stride = 1
data1 = tf.data.Dataset.from_tensor_slices(inputs).window(window_size, shift=stride, drop_remainder=True).flat_map(lambda x: x.batch(window_size))
data2 = tf.data.Dataset.from_tensor_slices(labels).window(1, shift=stride, drop_remainder=True).flat_map(lambda x: x.batch(1))
data = tf.data.Dataset.zip((data1, data2))
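Putting the pieces together, a minimal runnable sketch (the DataFrame setup is assumed from the question) to verify that each element pairs a full input window with a single-timestep label:

```python
import pandas as pd
import tensorflow as tf

d = range(100)
df = pd.DataFrame(data={'input_1': d, 'input_2': d, 'target_1': d})
inputs = df[['input_1', 'input_2']].to_numpy()
labels = df['target_1'].to_numpy()

window_size = 10
stride = 1

# Sliding windows over the inputs: one full window per element.
data1 = (
    tf.data.Dataset.from_tensor_slices(inputs)
    .window(window_size, shift=stride, drop_remainder=True)
    .flat_map(lambda x: x.batch(window_size))
)
# Windows of length 1 over the labels: a single sample per element.
data2 = (
    tf.data.Dataset.from_tensor_slices(labels)
    .window(1, shift=stride, drop_remainder=True)
    .flat_map(lambda x: x.batch(1))
)
# zip stops at the shorter dataset, so the two stay aligned.
data = tf.data.Dataset.zip((data1, data2))

x, y = next(iter(data))
print(x.shape, y.shape)  # (10, 2) (1,)
```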